Generating CHM files with Wine on a Linux server, and a bit about PEAR's PhD transition

Last week we released PhD version 0.4.5, only three days after pushing out 0.4.4, because of a serious bug (caused by a misconception of mine) in PhD's MediaManager, which is responsible for handling images and other external files in XHTML themes. Not to forget the pluggable highlighters: you can use GeSHi instead of PHP's internal syntax highlighter if you prefer it or need more options to customize how your programlistings are displayed.

Now that we have been using PhD to render the PEAR manual on our live server for three weeks, people beyond the core developers are using peardoc and PhD and finding bugs. Laurent ported the previously external TDGs (The Definitive Guides) of his PEAR packages into the PEAR documentation, which was quite easy because every single package now has its own full <book> tag.

With Laurent's package docs came another feature request that peardoc could not handle until now: screenshots, and image support in general. This has been implemented cleanly, at least for the XHTML themes; the PDF and man page themes are still waiting for someone to take care of them.

A little bit of history

The first mention of PhD on the peardoc mailing list was a mail by Greg Beaver at the end of November 2007, nearly one and a half years ago, following a discussion on IRC.

The tasks were laid out quickly:

  1. Convert the XML from DocBook 4 to DocBook 5, a huge transition
  2. Fix the build magic in configure
  3. Get PhD to render pear themes
  4. Install and run everything on the server

After the initial mails, months of silence followed, broken only by sporadic mails from Hannes or Rudy about PhD enhancements.

peardoc conversion to DocBook 5

Finally, on 2008-10-03, the real transition began. At first we worked in our own branch, to avoid breaking the existing manual build process. Brett wrote a conversion helper script that escaped CDATA sections, which we would otherwise have lost with the official DocBook4-to-DocBook5 conversion script (pure XSLT). That official script did not fully work for us either, so I had to fix and extend it until the resulting files finally passed the DTD test with xmllint. The PEAR wiki contains information about our conversion process and the remaining TODOs, if you are interested.
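For readers who want to try a similar route, here is a minimal shell sketch of the per-file step: run an XSLT upgrade stylesheet with xsltproc, then check the result with xmllint. It is neither Brett's helper nor our actual script. The function name is hypothetical, the stylesheet and schema are parameters you have to supply (for example the db4-upgrade.xsl stylesheet distributed with DocBook 5 and a DocBook 5 RELAX NG schema), and CDATA escaping is assumed to have happened beforehand.

```shell
# Hypothetical helper: upgrade one DocBook 4 file and validate the result.
upgrade_and_validate() {
    local src="$1"     # DocBook 4 input file
    local dst="$2"     # DocBook 5 output file
    local xsl="$3"     # upgrade stylesheet, e.g. db4-upgrade.xsl (assumed)
    local schema="$4"  # DocBook 5 RELAX NG schema file (assumed)

    # Apply the pure-XSLT upgrade; give up if the transformation fails.
    xsltproc --output "$dst" "$xsl" "$src" || return 1

    # --noout suppresses printing the document; the exit status reports
    # whether it is valid. Use --dtdvalid instead for a DTD check.
    xmllint --noout --relaxng "$schema" "$dst"
}
```

Usage would be something like `upgrade_and_validate guide.xml guide.db5.xml db4-upgrade.xsl docbook.rng`, looped over all files of a translation.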

Charsets were another issue during the conversion: until then, nearly every translation had its own charset. The English one used ISO-8859-1, the German one ISO-8859-15, and the French and Japanese ones yet others. All those files were converted to UTF-8, which broke things here and there (some individual files already were UTF-8), but it simplified file handling considerably.
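The per-file conversion can be sketched with iconv. This is a minimal illustration, not the script we actually used: the function name `to_utf8` is hypothetical, the source charset has to be passed in per translation (ISO-8859-1 for English, ISO-8859-15 for German, and so on), and files that already decode as UTF-8 are skipped, which covers the ones that were UTF-8 to begin with.

```shell
# Hypothetical helper: re-encode one file to UTF-8 in place.
# The source charset must be given; nothing is auto-detected.
to_utf8() {
    local file="$1" from="$2" tmp

    # Files whose bytes already decode as UTF-8 are left alone.
    if iconv -f UTF-8 -t UTF-8 "$file" >/dev/null 2>&1; then
        return 0
    fi

    tmp=$(mktemp) || return 1
    if ! iconv -f "$from" -t UTF-8 "$file" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi

    # Fix the XML declaration on line 1 so it matches the new bytes.
    sed '1s/encoding="[^"]*"/encoding="utf-8"/' "$tmp" > "$file"
    rm -f "$tmp"
}
```

For a German file, the call would be `to_utf8 some-file.xml ISO-8859-15`.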

Since we still use entity includes, we needed a configuration script that generates chapters.ent with all entity-to-file mappings. That was hacked together quickly, and then we were finally able to validate the manual as a whole. This led to weird crashes all over the place: mixed charsets in some files caused glibc to crash, and the (largely outdated) Polish translation even triggered a bug that could be used for a DoS attack on any server using libxml2, by putting the parser into an endless loop. After far too many hours of debugging I had reduced the problem to a tiny script; libxml2 has since been patched and is no longer vulnerable.
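A chapters.ent generator along these lines could look like the following sketch. The function name, the entity naming scheme (file path with slashes turned into dots and the .xml suffix dropped) and the relative paths in the SYSTEM identifiers are assumptions for illustration; peardoc's real configure script may name its entities differently.

```shell
# Hypothetical generator: one external entity declaration per XML file
# found below the given directory, sorted for stable output.
generate_entities() {
    local root="$1"
    (
        cd "$root" || exit 1
        find . -name '*.xml' -type f | LC_ALL=C sort | while IFS= read -r f; do
            f=${f#./}                                    # strip leading ./
            name=$(printf '%s' "${f%.xml}" | tr '/' '.') # a/b.xml -> a.b
            printf '<!ENTITY %s SYSTEM "%s">\n' "$name" "$root/$f"
        done
    )
}
```

Redirect the output into the file, e.g. `generate_entities en > chapters.ent`.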

A week later, on 2008-10-09, I could announce that peardoc's CVS HEAD had been filled with the new files and the old ones had been dropped. The manual no longer built, since the build was still entirely set up for the old way of doing things, and it would take nearly three more months until we had regular builds again.

PhD improvements

Now that peardoc was in good shape to be built by PhD, I had to make sure that PhD was up to the task. Exactly eleven days later, on October 20th, Hannes and I released PhD 0.4.0 with brand-new PEAR XHTML themes and a big bunch of bugfixes.

The following weeks were spent fixing thousands of XML validation errors in the generated files, caused by PhD's blindness when it comes to element nesting: it simply does not know what comes ahead, and it remembers only a small part of what it has already done with the XML file. This approach is what makes generating XHTML, PDF and man pages lightning fast, and I had to make sure not to sacrifice that speed for correctness. Apart from validation issues, we had some serious navigation problems with the generated Prev/Next links. PhD 0.4.1 was released on November 8th, 2008.
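Hunting those errors down boils down to validating every generated page again and again. A minimal shell sketch, assuming the output is a flat directory of .html files whose DOCTYPE xmllint can resolve (ideally through a local XML catalog, so the DTD is not fetched from the network); `validate_output` is a hypothetical name.

```shell
# Hypothetical helper: validate all generated pages, list the failures,
# and return non-zero if any page is invalid.
validate_output() {
    local dir="$1" f failed=0
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue          # glob matched nothing
        if ! xmllint --noout --valid "$f" 2>/dev/null; then
            echo "invalid: $f"
            failed=$((failed + 1))
        fi
    done
    echo "$failed file(s) failed validation"
    [ "$failed" -eq 0 ]
}
```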

Now other people began using PhD for peardoc (Laurent was among the first at the frontier!), and soon bugs we had never noticed before began to trickle in. I had always wanted to get Laurent's Definitive Guides integrated into peardoc, so I touched all manual XML files once more to finally give every package its own <book> tag. This gives documentation authors maximum freedom, and Laurent was able to integrate the TDGs into peardoc in only a few days.

Version 0.4.2 followed in December, January brought 0.4.3, and February a broken 0.4.4 and a fixed 0.4.5. We switched to our new server at the end of January, which is also when we got the first fresh manual builds! In the meantime, I even managed to get CHM compilation working directly on our Linux box. That is what this post was originally meant to be about, and it is described in the next section.

Apart from the peardoc improvements, PhD got its own manual and is becoming usable for rendering DocBook files other than phpdoc and peardoc. The PHP-GTK project is also beginning to convert its docs to DocBook 5 and will then use PhD, too. A great future lies ahead for PhD!

Installing Microsoft's HTML Help Compiler on a Linux server

The usual way to generate .chm files is Microsoft's HTML Help Compiler, hhc.exe, from the HTML Help Workshop. It can be downloaded free of charge from their website. While the workshop tool is a GUI application, hhc.exe can run on a server without any windowing environment, which makes it ideal for a headless web server. I already had it running on my Linux desktop at home using Wine, so I chose to go the same way on the server.
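A thin wrapper for calling the compiler through Wine might look like this. It assumes hhc.exe sits in the default install location inside the Wine prefix (the same path as in the sudo command at the end of this post) and that `wine` is on the PATH; `compile_chm` is a hypothetical name, and the exit status of hhc.exe is passed through unchanged.

```shell
# Hypothetical wrapper: compile an HTML Help project (.hhp) with hhc.exe
# running under Wine. WINEPREFIX defaults to ~/.wine, as in Wine itself.
compile_chm() {
    local project="$1"
    local prefix="${WINEPREFIX:-$HOME/.wine}"
    local hhc="$prefix/drive_c/Program Files/HTML Help Workshop/hhc.exe"

    if [ ! -f "$project" ]; then
        echo "no such project: $project" >&2
        return 2
    fi
    # hhc.exe is a command line tool; it needs no window to compile.
    wine "$hhc" "$project"
}
```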

The downside is that the installer displays two windows: a dialog to accept the license, and the "copying files" dialog. So the first thing to do is to ssh to the server with -X or -Y to enable X forwarding. On a stock web server, this fails silently and your $DISPLAY variable stays empty. With ssh -v -Y, you will see a complaint about X authentication failing. This is because xauth is missing; installing it fixes the problem, and you will then be able to start xclock or xterm on the remote machine.
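Before starting the installer, a quick check inside the ssh session can save some head-scratching. This sketch (hypothetical function name) only tests the two things that were missing here, an empty $DISPLAY and a missing xauth binary; passing it does not prove that X authentication will succeed.

```shell
# Hypothetical check, run on the server after "ssh -Y": reports the two
# usual culprits and returns non-zero if either is present.
check_x_forwarding() {
    local ok=0
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is empty: X forwarding is not active"
        ok=1
    fi
    if ! command -v xauth >/dev/null 2>&1; then
        echo "xauth not found: install it (package name depends on the distribution)"
        ok=1
    fi
    [ "$ok" -eq 0 ] && echo "X forwarding looks usable (try xclock)"
    return "$ok"
}
```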

With this hurdle (for me, the biggest one) cleared, the rest was a piece of cake. Install HTML Help Workshop on the machine; then, due to bug #7517, you need to install two native DLL files: itircl.dll and itss.dll. I had to use winecfg to register the DLLs; using regsvr32 as described on the HTML Help Workshop 4 Wine page did not work for me.
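If you prefer a scriptable route over clicking through winecfg, a similar effect can probably be achieved by copying the DLLs into the Wine prefix and forcing the native versions with Wine's WINEDLLOVERRIDES environment variable. This is an untested alternative, not the winecfg procedure described above. It assumes a 32-bit prefix (a 64-bit prefix keeps 32-bit DLLs in syswow64), that you already have both DLL files in some directory, and the function name is hypothetical.

```shell
# Hypothetical helper: copy the two native DLLs into the prefix's system32.
install_native_dlls() {
    local src="$1"
    local sys32="${WINEPREFIX:-$HOME/.wine}/drive_c/windows/system32"
    local dll
    mkdir -p "$sys32" || return 1
    for dll in itircl.dll itss.dll; do
        cp "$src/$dll" "$sys32/" || return 1
    done
}

# "n" means: load only the native DLL. Affects Wine processes started
# from this environment, e.g. a later "wine hhc.exe ..." call.
export WINEDLLOVERRIDES="itircl,itss=n"
```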

Securing the machine

I don't trust Microsoft, so I can't trust their software, and hhc.exe is no exception. I created a dedicated user (chmdude) and set out to run the CHM compilation only as this user. At first it did not work with sudo: it worked in one shell, but not in another. Two hours later (thanks, Stefan!) I found the reason: the shell in which hhc.exe failed still had $DISPLAY set, but chmdude was not allowed to access the X authority (xauth) data. From a fresh shell with sudo -i, it now works without problems. I had a fun time learning about all those small sudo parameters, but in the end, compilation works. Wine still complains about the missing X server, but the compilation runs nonetheless.

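# $p: directory containing the generated HTML Help project
# $hhp: the .hhp project file to compile (hypothetical variable name)
sudo -u chmdude -i -- -c \
 "cd \"$p\" && \
 /usr/bin/wine '/home/chmdude/.wine/drive_c/Program Files/HTML Help Workshop/hhc.exe' \"$hhp\""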

Written by Christian Weiske.

Comments? Please send an e-mail.