Christians Tagebuch: sql

The latest posts in full-text for feed readers.


Matomo: Extract download numbers for a URL with SQL

I piped six years of logs from 20 Apache domains into a Matomo 3 instance, creating a database several gigabytes in size. The Matomo UI is unusable because it is so slow, and some reports that I would like to have are impossible to get.

One of the things I wanted to know was how often a certain URL had been accessed over the last years, grouped by month.

It turns out that this information is split across two database tables:

  • log_action lists all URLs
  • log_link_visit_action tells us when URLs were accessed.

First we need to find the ID of the URL we are looking for in log_action:

SELECT * FROM `log_action` WHERE `name` LIKE '%the-file-i-am-looking-for.apk'

This gives us the IDs in the idaction column. Those IDs can then be searched for in the log_link_visit_action table:

SELECT SUBSTRING(server_time, 1, 7) as date, COUNT(*) as count
FROM `log_link_visit_action`
WHERE `idaction_url` IN (1078305, 1103532, 1431512, 1432793)
GROUP BY date
ORDER BY date

And now we have the results:

date      count
2021-12      12
2022-01      21
2022-02      26
2022-03       8

Before running the second SQL query, I created an index on the log_link_visit_action.idaction_url column - the query would have been too slow otherwise.
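The index itself is a one-liner; the index name below is arbitrary:

CREATE INDEX idx_idaction_url ON log_link_visit_action (idaction_url);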

Published on 2023-05-06


MariaDB on Debian: Restore root user

The root user on mariadb-server 10.4+ no longer has a password because it authenticates via a Unix socket instead of a TCP network connection.

On our new server, someone accidentally dropped that user, and when running mysql as root on the shell, we had to enter the username and password. This is not nice when migrating databases.

The Debian README for mariadb-server says:

You may never ever delete the mysql user "root".

The solution is to restore that user:

GRANT ALL PRIVILEGES ON *.* TO root@localhost IDENTIFIED VIA unix_socket WITH GRANT OPTION;
FLUSH PRIVILEGES;
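A quick way to verify that the account is back and authenticates via the socket plugin again is to list its grants, which should mention unix_socket:

SHOW GRANTS FOR 'root'@'localhost';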

Published on 2023-03-01


Listing mysql.user table records in a terminal

When I'm setting up a MariaDB or MySQL server, there is no GUI - at least not at first. But that is exactly the time to add new users and to check why the login for a certain user fails.

The mysql.user table is the right place to look, but with its 46 columns, the mysql CLI client's output is unreadable.

But there is a trick: MySQL's command line client has a special "ego" command:

ego, \G
Send the current statement to the server to be executed and display the result using vertical format.

Let's try it:

$ mysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 50
Server version: 10.1.37-MariaDB-3 Debian buildd-unstable

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> SELECT * FROM mysql.user LIMIT 1\G
*************************** 1. row ***************************
                  Host: localhost
                  User: root
              Password: 
           Select_priv: Y
           Insert_priv: Y
           Update_priv: Y
           Delete_priv: Y
           Create_priv: Y
             Drop_priv: Y
           Reload_priv: Y
         Shutdown_priv: Y
          Process_priv: Y
             File_priv: Y
            Grant_priv: Y
       References_priv: Y
            Index_priv: Y
            Alter_priv: Y
          Show_db_priv: Y
            Super_priv: Y
 Create_tmp_table_priv: Y
      Lock_tables_priv: Y
          Execute_priv: Y
       Repl_slave_priv: Y
      Repl_client_priv: Y
      Create_view_priv: Y
        Show_view_priv: Y
   Create_routine_priv: Y
    Alter_routine_priv: Y
      Create_user_priv: Y
            Event_priv: Y
          Trigger_priv: Y
Create_tablespace_priv: Y
              ssl_type: 
            ssl_cipher: 
           x509_issuer: 
          x509_subject: 
         max_questions: 0
           max_updates: 0
       max_connections: 0
  max_user_connections: 0
                plugin: unix_socket
 authentication_string: 
      password_expired: N
               is_role: N
          default_role: 
    max_statement_time: 0.000000
1 row in set (0.00 sec)

MariaDB [(none)]>

Published on 2019-01-10


Docker: Wait until MySQL is available

At work we're using Docker extensively to package web applications. After starting a container stack (at least a web and a database container), database tasks like migrations or cache clearing need to be done.

Unfortunately, Docker reports a MySQL/MariaDB database container as available when MySQL itself is not yet ready to accept client connections. This means that calling docker-compose up and running the migration right afterwards will fail because the application cannot connect to the database.

MySQL is not available immediately after starting the container, because at the beginning it either has to run initialization .sql scripts or load and check the existing database files, which takes time.

Up to now the solution was to sleep 30 and hope that this is long enough for MySQL to finish startup. But this also means wasted time when MySQL only takes 5 seconds.

These are the things I tried:

depends_on

In docker-compose.yml, depends_on lets you say that the web container may only be started after the database container has been started.

As described above, the container itself is running but MySQL is not - so from Docker's perspective, depends_on is fulfilled as soon as the container has started, which does not help us.

mysqladmin ping

Some people suggest using mysqladmin ping to check whether the database is available.

I found this to be unreliable; on my machines the ping succeeded, but the clients still could not connect.

Client connection test

A couple of projects try to make a real MySQL client connection to check whether the database is up - see WordPress and Bonita, as well as the description in docker-library.

I ended up doing the same by adding a script to the MySQL container that can be started from outside:

/root/waitForMySQL.sh
#!/bin/sh
# Wait until the MySQL server in the container accepts client connections.
# Give up after $maxcounter seconds (adjust as needed).
# Adjust the credentials to whatever the container is configured with.
maxcounter=45
counter=1
while ! mysql -uroot -proot -e "SHOW DATABASES;" > /dev/null 2>&1; do
    sleep 1
    counter=`expr $counter + 1`
    if [ $counter -gt $maxcounter ]; then
        >&2 echo "We have been waiting for MySQL too long already; failing."
        exit 1
    fi;
done

This script is added to the default MySQL container via the Dockerfile:

Dockerfile
FROM mysql:5.7

ADD waitForMySQL.sh /root/
RUN chmod +x /root/waitForMySQL.sh

Now we can run it after running docker-compose up:

$ docker-compose -p projectprefix up
$ docker exec projectprefix_mysql /root/waitForMySQL.sh
$ docker exec projectprefix_web\
    su -s/bin/sh -c 'cd /var/www && ./artisan migrate'\
    www-data

Update 2018-11

It's better to use --protocol TCP to be sure that not only the socket connection is ready.

Published on 2018-03-06


Crate: Eventual consistency

Some projects at work use a Crate database as a kind of caching layer for aggregated, unnormalized data that helps reduce requests from the frontend servers. Because of its eventual consistency model, filling data into it has some issues.

Eventual consistency means that at some point in the future - not now - the data will be consistent. You write into Crate, but it is not guaranteed that the next read will return the data written earlier, as described in its documentation.

The following PHP code uses Laravel's Eloquent with the Crate adapter, and is pretty standard in the sloppy and inefficient world of nice-to-look-at ORMs:

$obj = Model::firstOrCreate(['id' => $id]);
$obj->fill($transformer->transform($data));
$obj->save();

That code handles both fresh inserts and updates in only three lines, which is nice. But it makes three SQL queries for new data:

  1. SELECT for checking if the row exists
  2. INSERT to create the row (saving the ID only)
  3. UPDATE to actually write the new data

With Crate, the UPDATE will fail because the row inserted in step 2 is not available in step 3.

Eloquent's firstOrCreate method fortunately supports a second parameter that may contain additional data to be inserted:

$newdata = $transformer->transform($data);
$obj = Model::firstOrCreate(['id' => $id], $newdata);
$obj->fill($newdata);
$obj->save();

This code works with Crate, even if it is terribly inefficient - when inserting a new row, it sends the same data again in the UPDATE.

Eventual consistency will also bite you when using TRUNCATE and then immediately inserting data. Crate has a REFRESH TABLE command that ought to help in such cases, but in Crate 1.0.1 we had issues with it and had to resort to a sleep(1) call in our import script :/
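For illustration, this is the pattern REFRESH TABLE is meant for - making earlier writes visible to the statements that follow (table and column names are made up):

INSERT INTO cachetable (id, payload) VALUES (1, 'aggregated data');
REFRESH TABLE cachetable;
SELECT payload FROM cachetable WHERE id = 1;

Without the refresh, the SELECT may not see the freshly inserted row yet.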

Published on 2017-01-18


PHP-SQLlint: CLI syntax checker for SQL files

I wanted our CI system to syntax-check SQL files, but failed to find an SQL syntax checker for the command line. There are many online tools into which you can copy & paste your SQL statements, but nobody wants that.

Apart from the 2005 SQLLint prototype written in Prolog, I did not find any CLI tools for this task.

Instead I found the SQL parser library used by phpMyAdmin and made a wrapper around it: php-sqllint.

It's a standalone .phar file that you can use to check the syntax of SQL files:

$ php-sqllint tests/files/create-missingcomma.sql
Checking SQL syntax of tests/files/create-missingcomma.sql
 Line 3, col 5 at "pid": A comma or a closing bracket was expected.
 Line 3, col 13 at "11": Unexpected beginning of statement.
 Line 3, col 17 at "DEFAULT": Unrecognized statement type.

It has an Emacs output mode, and it can also highlight SQL files for your shell (ANSI colors) or as HTML.

$ echo 'SELECT title FROM tools WHERE domain="SQL" LIMIT 1'\
 | php-sqllint --format --highlight html -
SELECT
  title
FROM
  tools
WHERE
  domain = "SQL"
LIMIT 1

Download the .phar or have a look at the sources (github mirror).

Published on 2015-12-21


Fixing XML in databases with CLI tools

Recently I had to edit XML stored in columns of a MySQL database table. Instead of hacking together a small PHP script, I chose to use a command line XML editing tool to get the job done.

This article was originally published on my employer's blog:
Fixing XML in databases with CLI tools @ netresearch.

The problem

In one of our TYPO3 projects we use Flux to add custom configuration options to page records. Those dynamic settings are stored, as usual in TYPO3, in an XML format called “FlexForms” which is then put into a column of the database table:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<T3FlexForms>
  <data>
    <!-- structure abbreviated, field names are illustrative -->
    <sheet index="sDEF">
      <language index="lDEF">
        <field index="settings.someOption">
          <value index="vDEF">0</value>
        </field>
      </language>
    </sheet>
  </data>
</T3FlexForms>

Now, due to some update of either TYPO3 itself or the Flux extension, the options were no longer stored in the sDEF sheet but in a new sheet called options:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<T3FlexForms>
  <data>
    <!-- structure abbreviated, field names are illustrative -->
    <sheet index="sDEF">
      <language index="lDEF">
        <field index="settings.someOption">
          <value index="vDEF">0</value>
        </field>
      </language>
    </sheet>
    <sheet index="options">
      <language index="lDEF">
        <field index="settings.icon">
          <value index="vDEF">info</value>
        </field>
        <field index="settings.someOption">
          <value index="vDEF">0</value>
        </field>
      </language>
    </sheet>
  </data>
</T3FlexForms>

TYPO3 does not remove old data when saving FlexForm fields, so both the old sDEF sheet and the new options sheet were present in the XML. Unfortunately, the TYPO3 API has a preference for sDEF - when that sheet is set, the values from that sheet are used.

This led to the situation that, although we changed the settings, they were not used by TYPO3 at all. The only way to fix it was to remove the sDEF sheet from the XML in the database columns tx_fed_page_flexform and tx_fed_page_flexform_sub of the pages table in the TYPO3 MySQL database.

Solution #1: A PHP script

A solution would have been to write a PHP script that connects to the TYPO3 database, fetches all records from the pages table, loads the flexform column data into a SimpleXML object, runs XPath on it, removes the node, re-serializes the XML and updates the database records.

This sounded like too much effort, given that I know that editing XML on the command line is a breeze with xmlstarlet.

Solution #2: mysqlfuse + xmlstarlet

XMLStarlet is a set of command line tools to edit and query XML files.

Removing the sDEF sheet node from an XML file is as simple as a single xmlstarlet call along these lines (the exact XPath depends on the FlexForm structure):

$ xmlstarlet ed -d '//data/sheet[@index="sDEF"]' flexform.xml

The only question left was how to access the MySQL pages table with XMLStarlet.

FUSE and mysqlfuse

Linux has a mechanism called FUSE, the Filesystem in Userspace. With it, it's possible to write user-space file system drivers that can expose just about anything as a file system. FTPfs and SSHfs are examples, as well as WikipediaFS, which allows you to read and edit Wikipedia articles with a normal text editor.

There is also mysqlfuse, which is able to expose complete MySQL databases as a directory tree. Each record in a table is a directory, and each column is a file – exactly what I needed for my task.

Mounting the database

Mounting the MySQL database as file system was easy:

  1. Install python-fuse and python-mysqldb
  2. Download mysqlfuse:

    $ git clone https://github.com/clsn/mysqlfuse.git
  3. Mount your database:

Now I could list the tables:

And the pages table:

Every primary key is turned into a directory; uid is the only one in the pages table. Inside that directory, all records are listed by their uid:

And each record directory exposes all columns as files:

Examining the contents of a column is as easy as reading it with cat:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<T3FlexForms>
  <data>
    <sheet index="sDEF">
[...]

Fixing the XML

With the mount in place, running XMLStarlet was simple. The loop looked roughly like this:

$ for i in pages/uid/*/tx_fed_page_flexform; do\
   [ -s $i ] && (\
    xmlstarlet ed -d '//data/sheet[@index="sDEF"]' $i > ~/tmp-flexdata;\
    cat ~/tmp-flexdata > $i\
   ); done

The shell command loops over all records that have a tx_fed_page_flexform, checks whether there is actual content in them (some records have no flexform options saved), edits the XML and saves the result into a temporary file. The contents of the temp file are then written back to the column file.

I did the same for the tx_fed_page_flexform_sub column and was done.

A tiny bug

Examining the database, I noticed that the XML in the flexform columns had not been modified at all.

Debugging the issue with Wireshark revealed a bug: the python-mysqldb library had changed since mysqlfuse was written and now automatically disables MySQL's autocommit feature. Since mysqlfuse only executes the UPDATE SQL queries but never calls commit, the database never writes the changes back to disk.
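In SQL terms, the UPDATEs stayed inside an open transaction; with autocommit disabled, a change only persists once a commit is issued. A minimal illustration (uid and value are placeholders):

SET autocommit = 0;
UPDATE pages SET tx_fed_page_flexform = '...' WHERE uid = 42;
COMMIT;

Without the COMMIT, the change is rolled back when the connection closes - which is what effectively happened through mysqlfuse.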

A bugfix was quickly written, and now the columns were properly updated in the database.

Final words

Apart from the mysqlfuse bug I had to fix, the whole task was a breeze and much quicker than writing a script in PHP or another language.

I’ll keep mysqlfuse in my toolbox for more tasks that can be solved with some unix command line tools and a bit of ingenuity.

Published on 2015-01-19


SPARQL Engines Benchmark Results

Some time ago I published the results of benchmarks I did for my diploma thesis. Since then, Virtuoso has been added to my list of competitors; furthermore, I added three new queries that are not as artificial as the previous ones, as they try to resemble queries used in the wild.

The competitors compared are:

  • RAP's old SparqlEngine
  • RAP's new SparqlEngineDb I wrote as part of my thesis
  • ARC, another PHP implementation made for performance (2006-10-24)
  • Jena SDB, a semantic web framework written in Java (beta 1)
  • Redland, a C implementation. (1.0.6)
  • Virtuoso Open Source Edition 5.0.1

The tests have been run on an Athlon XP 1700+ with 1 GiB of RAM. Both PHP and Java have been assigned 768 MiB of RAM. MySQL 5.0.38 on a current Gentoo Linux has been used, together with PHP 5.2.3 and Java 1.5.0_11. All the libraries except Virtuoso support MySQL as storage, so this was used as the backend.

I used the data generator of the Lehigh University Benchmark to get 200,000 RDF triples. Those triples were imported into a fresh MySQL database using the libraries' native tools.

To cut out the time for loading classes or parsing the PHP files, I created a script that included all the necessary files first and then executed the queries ten times in a row against the lib. Measuring the time between the start and the return of the query function in milliseconds, I executed all queries against different database sizes: from 200,000, 100,000, 50,000, ... down to 5. All result data have been put into some nice diagrams.

Library notes

Jena needed some special care since the first run was always slow, probably because the JVM had to load all the classes during that run. So Jena got a dry run first, and the times were taken from the ten runs afterwards.

ARC didn't like the ?o2 != ?o1 part and threw an error. The complex queries resulted in a "false" value returned after some milliseconds. I assume that something failed internally.

Redland has been used through its PHP bindings. While this probably makes it slower, I found that it seems to have a bug in librdf_free_query_results() that causes delays of up to 10 seconds depending on the dataset size. In my benchmark script, I did not call this method in order to give the lib some chance against the others. If I had freed the results after each query, librdf would have come in second to last.

Since I did not get the ODBC drivers working correctly, I used the isql program delivered with Virtuoso to benchmark the server. Virtuoso also had a bug in regex handling, so I have no timings for those queries.

Results

The first set of SPARQL queries was chosen to be data independent, each concentrating on a single SPARQL feature. Three additional queries have been created to see how the engines handle complex queries found in the real world. The y axis is a logarithmically scaled time axis in seconds; the x axis displays the number of records in the database.

SELECT

Cross joining all triples

Regular expressions

ORDER BY

Complex queries

  • 7 connected triples
    
    PREFIX test: 
    SELECT
     ?univ ?dpt ?prof ?assProf ?publ ?mailProf ?mailAssProf
    WHERE {
     ?dpt test:subOrganizationOf ?univ.
     ?prof test:worksFor ?dpt.
     ?prof rdf:type 
  • OPTIONAL
    
    PREFIX test: 
    SELECT ?publ ?p1 ?p1mail ?p2 ?p2mail ?p3 ?p3mail
    WHERE {
     ?publ test:publicationAuthor ?p1.
     ?p1 test:emailAddress ?p1mail
     OPTIONAL {
      ?publ test:publicationAuthor ?p2.
      ?p2 test:emailAddress ?p2mail.
      FILTER(?p1 != ?p2)
     }
     OPTIONAL {
      ?publ test:publicationAuthor ?p3.
      ?p3 test:emailAddress ?p3mail.
      FILTER(?p1 != ?p3 && ?p2 != ?p3)
     }
    }
    LIMIT 10
  • UNION
    
    PREFIX test: 
    SELECT ?prof ?email ?telephone
    WHERE {
     {
      ?prof rdf:type test:FullProfessor.
      ?prof test:emailAddress ?email.
      ?prof test:telephone ?telephone
     }
     UNION
     {
      ?prof rdf:type test:Lecturer.
      ?prof test:telephone ?telephone
     }
     UNION
     {
      ?prof rdf:type test:AssistantProfessor.
      ?prof test:emailAddress ?email
     }
    }
    LIMIT 10

Average timings

Conclusion

Jena was the only engine besides SparqlEngineDb that executed all queries. ARC is not as fast as expected and failed on nearly half of the queries. Redland is reasonably fast, although I expected more from it given that it is written in plain C. Virtuoso, the only commercially developed product, is the fastest of all engines. But here and there other engines were faster, which is nice to see :) And my SparqlEngineDb - I think it's pretty good, although the benchmark has shown enough points at which it can be made better and faster.

Published on 2007-10-05


SPARQLer's best choice: SparqlEngineDb

For my diploma thesis I ran some benchmarks to compare my SparqlEngineDb implementation to some other implementations. The competitors were:

  • RAP's old SparqlEngine
  • ARC, another PHP implementation made for performance (2006-10-24)
  • Jena SDB, a semantic web framework written in Java (alpha 2)
  • Redland, a C implementation. (1.0.6)

The tests have been run on an Athlon XP 1700+ with 1 GiB of RAM. Both PHP and Java have been assigned 768 MiB of RAM. MySQL 5.0.38 on a current Gentoo Linux has been used, together with PHP 5.2.2 and Java 1.5.0_11. All the libraries support MySQL as storage, so this was used as the backend.

Without going into the same details as I did for my thesis, some more information:

I used the data generator of the Lehigh University Benchmark to get 200,000 RDF triples. Those triples were imported into a fresh MySQL database using the libraries' native tools.

To cut out the time for loading classes or parsing the PHP files, I created a script that included all the necessary files first and then executed the queries ten times in a row against the lib. Measuring the time between the start and the return of the query function in milliseconds, I executed all queries against different database sizes: from 200,000, 100,000, 50,000, ... down to 5. All result data have been put into some nice diagrams.

Library notes

Jena needed some special care since the first run was always slow, probably because the JVM had to load all the classes during that run. So Jena got a dry run first, and the times were taken from the ten runs afterwards.

ARC didn't like the ?o2 != ?o1 part and threw an error.

Redland has been used through its PHP bindings. While this probably makes it slower, I found that it seems to have a bug in librdf_free_query_results() that causes delays of up to 10 seconds depending on the dataset size. In my benchmark script, I did not call this method in order to give the lib some chance against the others. If I had freed the results after each query, librdf would have come in second to last.

Results

Testing only RAP's old engine and Jena at first, I was surprised to see that my engine is on average 10 times faster than Jena and 14 times faster than RAP's SparqlEngine. Seeing that I could take on the competition, I ran tests against ARC and Redland - and was surprised again. ARC says of itself that it is made for speed, using PHP arrays instead of objects. Reading this, I took it for granted that my engine could not be faster. Next, Redland is completely written in C, which makes it extremely fast - no chance for my lib to win against it. All the more surprised I was to get a speedup of 7.7 against ARC and 3.3 against Redland!

The SPARQL queries were chosen to be data independent. The y axis is a logarithmically scaled time axis in seconds; the x axis displays the number of records in the database.

Here are the diagrams: a legend and one chart per query.

As always with benchmarks, take them with a grain of salt. Don't believe any benchmarks you didn't fake yourself. Different queries might well produce different results. You can see that Jena is even a bit better at the simplest query when my engine needs to instantiate 1000 results - creating objects in PHP is slow, so this is the point where a benchmark can make SparqlEngineDb look slow.

Published on 2007-07-02


With SPARQLing eyes

In mid-November 2006 I finally found the topic for my diploma thesis: taking RAP (RDF API for PHP) and writing a better SPARQL engine that scales well on big models and operates directly on the database, instead of filtering and joining millions of triples in memory.

Slow beginnings

I began working on RAP in November, fixing small bugs that prevented RAP from working on case-sensitive file systems and with "short open tags" set to off, as well as some other outstanding bugs.

By mid-December, I had a first basic version of my SparqlEngineDb that could handle basic SELECT statements with OPTIONAL clauses as well as LIMIT and OFFSET parts. I had nearly no time in the second half of December and the beginning of January 2007, since the exams were casting their shadows.

On the 18th of January, I got the existing unit tests for the memory-based SparqlEngine running unmodified against my DB engine. The first 10 or 15 of 140 unit tests passed - the most basic ones.

Specs & Order

Four days later, I had a crisis when trying to implement ORDER BY support that adheres fully to the specs. In SPARQL, result variables may consist of values of different categories and datatypes: literals, resources and blank nodes, and strings, dateTime values, booleans and whatnot else. The standard explicitly tells you that blank nodes are to be sorted before IRIs, which come before RDF literals. The different data types also have a specific order and, if that was not enough, need to be cast depending on their RDF data type to get them sorted correctly in SQL (e.g. a "09" is greater than a "1", but you need to cast the value - which is stored as a blob - so that MySQL recognizes it as a number). While this is easy for integers, you also have doubles, booleans and dateTime values. For each of them you need a different casting function - which made it necessary to split the query into multiple queries that each retrieve only values of a certain datatype:

   SELECT t0.object FROM statements as t0 ORDER BY t0.object
  

needs to be split:

   SELECT t0.object FROM statements as t0
   WHERE t0.l_datatype = "http://www.w3.org/2001/XMLSchema#integer"
   ORDER BY CAST(t0.object as INT)
   
   SELECT t0.object FROM statements as t0
   WHERE t0.l_datatype = "http://www.w3.org/2001/XMLSchema#boolean"
   ORDER BY CAST(t0.object as BOOL)
   
   ... not to speak of datetime values
  

The most natural thing to do now would be to create one huge UNION query and get all results at once. Wrong. UNION is a set operation, which means that the order of the results is undefined! I can order the individual parts as nicely as I want; the combined result stays unordered unless coincidence left the database's memory in a state that returns ordered results. So my only option was to create separate SQL queries, send them one after another to the server and join the results on the client side - not the best option performance-wise, but nobody I spoke with about it had a better idea. (It is possible to use ORDER BY outside the UNION clauses and sort by parts of the union, but that would require me to generate, as I called them in a CVS commit message, "queries of death".)

Now, having multiple queries returning data forced me to write workaround code for another part: OFFSET and LIMIT. While transforming SPARQL OFFSET and LIMIT clauses into SQL is trivial, it isn't anymore once your data are distributed over multiple result sets. Another class saw the light of my hard disk: the offset engine.

Preparing to be fast

Since my SparqlEngine is to be used in Powl (as the base for OntoWiki), we ran the first tests converting the Powl API to use SPARQL instead of direct SQL calls - this allows switching data backends easily. One problem was performance: while speed increased greatly with my new database-driven SPARQL engine, we were still way too slow to actually use OntoWiki properly - the Powl API generates and executes up to several hundred SPARQL queries to build a single page, and parsing all those queries took quite some time.

Prepared statements are the way to go in such a case, and that is the way I went. Currently, the SPARQL recommendation does not define anything in this direction, so I had to come up with a solution myself. Within a week, I had Prepared Statements for SPARQL implemented and working well.

The performance boost is dramatic: a simple query repeated 1000 times takes 3 seconds instead of 12 when using prepared statements, and this is without native prepared statements at the database driver level. ADOdb's mysqli driver currently does not support native prepared statements - with them, we will get another performance boost.

Filter

After the DAWG, sort and limit test cases passed, it was time to move on to the filter code, one of the big features still missing. Examining the filter code of the memory-based SparqlEngine, I found out that it extracts the whole FILTER clause from a SPARQL query, applies some regexes to it and uses evil eval() to execute it as plain PHP code. After five minutes I had a working exploit that deletes all files on the web server a RAP SparqlEngine runs on - there were no checks for the validity or sanity of the regexed PHP code; it was just executed in the hope that nothing would go wrong.

This approach may work for PHP, but not for SQL - and I didn't want to open another barn-door-wide hole by providing help for SQL injection attacks. So I sat down and extended SparqlParser to fully parse FILTER clauses and put them into a nice tree. I tried multiple ways of getting the filter done: using a parser generator, writing it by hand by iterating over all characters, ... In the end, my do-it-yourself approach went in the same direction as the existing parser implementation, and I finally understood how it worked and why it had been written that way.

In the following weeks I implemented FILTER support and had it nearly fully working when I stumbled across UNIONs in the unit tests' SPARQL queries. I had almost forgotten about them, but now I needed to implement them. I pondered whether to implement a poor man's solution that would work for the most obvious cases, or a full-fledged version that would require changes in half of my code. After seeing that Powl needs to generate queries that would not work the cheap way, I did the full work.

UNIONited in pleasure

Today, the 28th of April 2007, I got the following line when running the unit tests for SparqlEngineDb:

   Test cases run: 1/1, Passes: 140, Failures: 0, Exceptions: 0
  

After two months of working on it now and then, and three months of working nearly full-time on the engine, my SPARQL engine now passes all tests and fully implements the current specs. Yay!


My next and last task is to implement some extensions to SPARQL, such as aggregation support. After that, I'll write everything down and will (hopefully) be done with my diploma thesis.

Published on 2007-04-28