For the demo installation of bdrem (a birthday reminder tool), I needed a list of birth dates, preferably publicly available ones.
Finding a source
Wikipedia is the largest free source of data about people, so I looked there for a list of persons. Besides lists, Wikipedia also has a List of lists. Its people category was what I was looking for; after drilling down three levels I found the list of notable German scientists.
Nearly every linked scientist's article has a "Born" field in the infobox on the right. Now the question was how to extract that data without manually parsing all the HTML or MediaWiki markup.
DBpedia
I remembered DBpedia from the time I wrote my diploma thesis: it is a database containing the structured metadata extracted from Wikipedia, updated in near real time.
DBpedia is a SPARQL database (triple store) and has a public SPARQL query page.
But what query should I use? Which fields are available?
Finding properties
The datasets page, linked from the main page, points to some example resource pages, e.g. dbpedia.org/resource/Berlin.
I simply replaced Berlin in the URL with the name of one of the scientists, Alexander_von_Humboldt, and got his resource page. There I saw the properties I was interested in:
dcterms:subject      category:German_scientists
dbpprop:dateOfBirth  1769-09-14
dbpprop:name         Alexander von Humboldt
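Building such a resource URL can also be done programmatically. A minimal Python sketch, assuming DBpedia resource names are the English Wikipedia article titles with spaces replaced by underscores; the helper name resource_url is made up for this example:

```python
from urllib.parse import quote

# Base of DBpedia resource URLs, as in dbpedia.org/resource/Berlin
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def resource_url(article_title: str) -> str:
    """Return the DBpedia resource URL for an English Wikipedia article title.

    Assumption: the resource name is the article title with spaces replaced
    by underscores; non-ASCII characters are percent-encoded (UTF-8).
    """
    name = article_title.strip().replace(" ", "_")
    return DBPEDIA_RESOURCE + quote(name, safe="_(),'")


print(resource_url("Alexander von Humboldt"))
# http://dbpedia.org/resource/Alexander_von_Humboldt
```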
SPARQL
SPARQL looks a bit like SQL (SELECT, WHERE, LIMIT), but the actual conditions are written as sentences of the form subject predicate object. Knowing this and the properties above, I came up with the following query:
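PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX category: <http://dbpedia.org/resource/Category:>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT DISTINCT ?Name ?BirthDate WHERE {
  ?Scientist dcterms:subject category:German_scientists .
  ?Scientist dbpprop:birthDate ?BirthDate .
  ?Scientist dbpprop:name ?Name .
} LIMIT 100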
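To see why the shared ?Scientist variable ties the three conditions together, here is a toy evaluator in Python that joins triple patterns with nested loops. The sample triples and function names are made up for illustration, DISTINCT and LIMIT are left out, and real triple stores like the one behind DBpedia use indexes instead:

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hand-written sample data (not fetched from DBpedia); "dbr:" abbreviates
# http://dbpedia.org/resource/
TRIPLES: List[Triple] = [
    ("dbr:Alexander_von_Humboldt", "dcterms:subject", "category:German_scientists"),
    ("dbr:Alexander_von_Humboldt", "dbpprop:birthDate", "1769-09-14"),
    ("dbr:Alexander_von_Humboldt", "dbpprop:name", "Alexander von Humboldt"),
    ("dbr:Berlin", "dbpprop:name", "Berlin"),
]

# The WHERE clause of the query as three subject-predicate-object patterns
PATTERNS: List[Triple] = [
    ("?Scientist", "dcterms:subject", "category:German_scientists"),
    ("?Scientist", "dbpprop:birthDate", "?BirthDate"),
    ("?Scientist", "dbpprop:name", "?Name"),
]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern equals triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.get(term, value) != value:
                return None  # variable is already bound to something else
            result[term] = value
        elif term != value:
            return None  # constant does not match
    return result


def evaluate(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Join the patterns one after another using nested loops."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                new = match(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return bindings


for row in evaluate(PATTERNS, TRIPLES):
    print(row["?Name"], row["?BirthDate"])
# Alexander von Humboldt 1769-09-14
```

Berlin has a name but no dcterms:subject triple, so it never survives the first pattern and drops out of the result.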
And voilà: I had a list of scientists with their names and birth dates. The DBpedia SPARQL page also offers CSV as an output format, which gave me:
"Name","BirthDate" "Burkhard Rost",1961-07-11 "Rost, Burkhard",1961-07-11 "Victor Gustav Bloede",1849-03-14 "Bloede, Victor G",1849-03-14 "Wilhelm Körner",1839-04-20 ...
bdrem
bdrem gained support for CSV files in version 0.6. The following configuration makes it display the scientists' birth dates:
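$source = array(
    'Csv',
    array(
        'filename' => 'german-scientists.csv',
        'columns' => array(
            'name'  => 0,     // first CSV column
            'event' => false, // no event column in the file
            'date'  => 1      // second CSV column
        ),
        // event name used for every row, since there is no event column
        'defaultEvent' => 'Birthday',
    )
);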
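One way to read the column mapping: 0 and 1 are zero-based CSV column indexes, and 'event' => false means the file has no event column, so every row gets the defaultEvent. The following Python sketch models that interpretation; it is not bdrem's PHP code, and read_events as well as its skip_header parameter are made up for this example:

```python
import csv
import io
from datetime import date

# Python mirror of the bdrem configuration above. This is a hypothetical model
# of how the column mapping is interpreted, not bdrem's PHP implementation.
CONFIG = {
    "filename": "german-scientists.csv",  # kept for symmetry; unused below
    "columns": {"name": 0, "event": False, "date": 1},
    "defaultEvent": "Birthday",
}


def read_events(text: str, config: dict, skip_header: bool = True) -> list:
    """Turn CSV text into (name, event, date) tuples using zero-based column indexes.

    skip_header is an assumption of this sketch: the DBpedia export starts
    with a "Name","BirthDate" line.
    """
    columns = config["columns"]
    reader = csv.reader(io.StringIO(text))
    if skip_header:
        next(reader, None)
    events = []
    for row in reader:
        if not row:
            continue  # skip empty lines
        # Compare with "is False": column index 0 is falsy but valid.
        if columns["event"] is False:
            event = config["defaultEvent"]
        else:
            event = row[columns["event"]]
        birth = date.fromisoformat(row[columns["date"]])
        events.append((row[columns["name"]], event, birth))
    return events


sample = '"Name","BirthDate"\n"Burkhard Rost",1961-07-11\n"Rost, Burkhard",1961-07-11\n'
for name, event, birth in read_events(sample, CONFIG):
    print(name, event, birth)
# Burkhard Rost Birthday 1961-07-11
# Rost, Burkhard Birthday 1961-07-11
```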
bdrem now renders the following output on the shell:
$ ./bin/bdrem.php
--------------------------------------------------------
Days Age Name                 Event     Date       Day
--------------------------------------------------------
  -2 103 Braun, Wernher von   Birthday  23.03.1912 Mo
  10 144 Arthur Wehnelt       Birthday  04.04.1871 Sa
  10 144 Wehnelt, Arthur      Birthday  04.04.1871 Sa
  13  76 Bernd Brinkmann      Birthday  07.04.1939 Di
  13  76 Brinkmann, Bernd     Birthday  07.04.1939 Di
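Judging from the numbers, Days is the signed distance to the birthday (negative means it already passed), Age is the age reached on that birthday, and Day is its weekday in German. A Python sketch reproducing these rows, assuming the command ran on 25 March 2015 (inferred from the output, not stated) and treating 29 February birthdays as 28 February in non-leap years; row_for is a hypothetical name, not bdrem's implementation:

```python
from datetime import date

# German weekday abbreviations as printed by bdrem above, Monday first
WEEKDAYS_DE = ["Mo", "Di", "Mi", "Do", "Fr", "Sa", "So"]


def anniversary(birth: date, year: int) -> date:
    """Birthday in the given year; 29 February becomes 28 February (assumption)."""
    try:
        return birth.replace(year=year)
    except ValueError:
        return date(year, 2, 28)


def row_for(birth: date, today: date) -> tuple:
    """Return (days, age, weekday) for the birthday occurrence closest to today."""
    candidates = [anniversary(birth, y) for y in (today.year - 1, today.year, today.year + 1)]
    nearest = min(candidates, key=lambda d: abs((d - today).days))
    return (nearest - today).days, nearest.year - birth.year, WEEKDAYS_DE[nearest.weekday()]


today = date(2015, 3, 25)  # inferred from the output above, not stated in the post
print(row_for(date(1912, 3, 23), today))  # (-2, 103, 'Mo')
print(row_for(date(1871, 4, 4), today))   # (10, 144, 'Sa')
print(row_for(date(1939, 4, 7), today))   # (13, 76, 'Di')
```

Picking the nearest of last year's, this year's and next year's occurrence keeps the sign right around New Year: a 30 December birthday seen on 2 January yields -3 instead of a large positive number.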