OpenDocument 0.2.0 released

Part of my voluntary PEAR duties is writing documentation for undocumented packages, since documentation is one of PEAR's weak points. It is time-consuming, but thanks to PhD, testing is faster than ever. The big plus - what makes writing docs really worthwhile - is that one learns about the package. You get to understand it, find the deeply hidden features and tell the world about them in one, two or five tutorials.

At the beginning of June I started with our ~~under~~not documented OpenDocument package. The first and hardest part is to understand how the library itself works and what you can do with it. It had three little examples, and that basically was it. Oh yes, and a dozen nearly undocumented source code files!

The examples created and read text documents, but I saw not the slightest hint that one could create spreadsheet files - which would let OpenDocument serve as an alternative to Spreadsheet_Excel_Writer, the bug-ridden and unmaintained beast. Digging deep into the sources revealed that there is no such thing as "multiple document types" - even though the OASIS OpenDocument specification defines a bunch of them: Text, Spreadsheet, Presentation, Drawing, Chart, and Image.

Writing a tutorial about generating text documents also revealed something I had feared: apart from adding paragraphs, headings, links and span elements, one can't do anything. A bunch of styles are supported, but one can't even align a paragraph!

With the package in a messy state, something had to be done about it - I took it over and have maintained it since. My first step was to become deeply familiar with the code - an undertaking that took me two weeks. Understanding the styling concept was especially hard: it cost me two days because a docblock suggested that the style variable was an array, while in reality it is an object. The second task was to get into the OASIS OpenDocument specification, the 1300-page beast. I spent the better part of three weeks reading it while riding the train to and from work, as well as some of my spare time at home.

So after announcing that I'd lead the package now, nothing really happened until about the 21st of June, when the first open bug was patched and the unit test for it got added.

New functionality for a new release

Apart from fixing some open bugs, I had two new features planned for my first OpenDocument release:

A new storage engine capable of loading from and saving to different file formats - the specs actually define two: the well-known zip files containing multiple XML files plus all the embedded pictures, and all of that combined in a single XML file. The latter will come in handy when debugging and testing, since one doesn't always have to unzip the document before inspection.
Restructuring of the API to allow support for multiple document types (spreadsheets!), even if I won't be implementing any new type yet.
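
To illustrate the zip form of the format, here is a minimal sketch in Python (not PHP, and not the package's own storage engine). It builds a tiny package in memory and reads the paragraphs back out of `content.xml`. The helper names `build_package` and `paragraphs` are hypothetical, and the package is deliberately incomplete: a real document also carries `META-INF/manifest.xml`, `styles.xml` and more, so do not expect an office suite to open it.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"

# A bare-bones content.xml holding a single paragraph.
CONTENT_XML = (
    f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
    "<office:body><office:text>"
    "<text:p>Hello OpenDocument</text:p>"
    "</office:text></office:body>"
    "</office:document-content>"
)


def build_package(content_xml: str) -> bytes:
    """Return the bytes of a (simplified) zip package (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry comes first and is stored uncompressed.
        zf.writestr("mimetype", ODT_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", content_xml)
    return buf.getvalue()


def paragraphs(package: bytes) -> list[str]:
    """Read content.xml straight from the zip and list the paragraph texts."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return [p.text or "" for p in root.iter(f"{{{TEXT_NS}}}p")]


if __name__ == "__main__":
    print(paragraphs(build_package(CONTENT_XML)))
```

The single-file form puts the same kind of markup into one XML document, so it can be opened in a text editor or fed to an XML parser with no unzip step at all.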

After about three weeks of nearly daily work on it, the planned changes were made. The Single XML file storage is not completely done yet, but that wasn't a priority for the release.

Now that we were back from our honeymoon and I had to go to work again, the train rides gave me the opportunity to actually pack up the release, with the longest release notes I ever wrote.

OpenDocument 0.2.0 was released in PEAR on 2009-07-28, two years after the last release. It broke backward compatibility, and the next releases will break more of it - but in the end, it will be worth all the hassle.

Planned changes

Apart from the usual bug fixes, implementation of new features (like lists, images and new styles), and new document types (spreadsheets!), there are three areas that need attention:

The way new elements are added is suboptimal and requires too much duplicated code
The API currently has two object layers for the document: DOM for XML storage, and a layer of custom objects on top of that. I'd really like to see those two merged.
Style handling is currently missing some basic features and feels a bit ... jerky.

Other implementations

Apart from hacking on the package, I also looked for other PHP OpenDocument implementations. There are some, but many of them are dead, or implement only a small part of the specs or a single document type.

ods-php to create spreadsheets (seems dead)
OdtPHP to create text documents (also dead)
OpenDocument PHP supports spreadsheets and text documents, but is also dead.
Dio seems not so dead and has support for text, spreadsheets and charts!
PHP DocWriter generates .sxw (OpenOffice 1.x) files only.
TinyButStrong OOo is a templating system generating OpenDocument text files.
PHP OpenOffice.Org Template uses OpenDocument template files to generate text documents.

Some PHP applications have OpenDocument spreadsheet generation support for the export of files:

Tine 2.0 OpenDocument code
phpMyAdmin also has export code (ods.php and odt.php)

Those export implementations are designed for speed and mostly spit out hand-crafted XML. PEAR's OpenDocument package strives to read and write files, and to be able to change documents on the fly, so we'll never be as fast as those export implementations. That said, I already have ideas for alternative XMLWriter-backed document classes that would not be able to read documents, but would write them out really quickly - and would not hog memory.
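
A rough sketch of what such a write-only, streaming approach could look like, in Python with `xml.sax.saxutils.XMLGenerator` standing in for PHP's XMLWriter. The function name `write_spreadsheet_content` is hypothetical and not part of the PEAR package, and the markup is simplified (no styles, no column definitions), so treat it as an illustration of the technique rather than a complete document writer.

```python
import io
from xml.sax.saxutils import XMLGenerator

# Namespace URIs from the OpenDocument specification.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def write_spreadsheet_content(out, rows, sheet_name="Sheet1"):
    """Stream rows (an iterable of iterables) to `out` as simplified content.xml.

    Hypothetical helper: each cell is written the moment it is seen,
    so no DOM is ever held in memory.
    """
    xml = XMLGenerator(out, encoding="utf-8")
    xml.startDocument()
    root_attrs = {f"xmlns:{prefix}": uri for prefix, uri in NS.items()}
    xml.startElement("office:document-content", root_attrs)
    xml.startElement("office:body", {})
    xml.startElement("office:spreadsheet", {})
    xml.startElement("table:table", {"table:name": sheet_name})
    for row in rows:
        xml.startElement("table:table-row", {})
        for value in row:
            xml.startElement("table:table-cell", {"office:value-type": "string"})
            xml.startElement("text:p", {})
            xml.characters(str(value))  # special characters are escaped here
            xml.endElement("text:p")
            xml.endElement("table:table-cell")
        xml.endElement("table:table-row")
    xml.endElement("table:table")
    xml.endElement("office:spreadsheet")
    xml.endElement("office:body")
    xml.endElement("office:document-content")
    xml.endDocument()


if __name__ == "__main__":
    buf = io.StringIO()
    # A generator works as well as a list: rows are consumed one at a time.
    write_spreadsheet_content(buf, ([f"r{i}c1", f"r{i}c2"] for i in range(3)))
    print(buf.getvalue())
```

Because the rows are consumed lazily and written immediately, memory use stays flat no matter how many rows are exported. The price is exactly the trade-off described above: nothing can be read back or changed after it has been written.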

Interesting times ahead - and I'd be glad if someone interested in the project could help; I'm only doing this in my spare time, totally unrelated to work.
