Last week I released version 0.2.0 of phancap, the self-hosted website screenshot service.
phancap uses cutycapt to render website screenshots on a headless server, and it works fine most of the time. The problem is the "most of the time" part: cutycapt has a nasty bug that makes it hang on some websites and use 100% CPU, forever. This is not something that you want to run unattended on your server.
Process control
Fortunately, PHP has the pcntl extension that allows PHP scripts to fork and control processes. My idea was to let cutycapt run for at most 30 seconds and kill it if it did not exit in that time. This way, screenshot generation would fail for some URLs, but the server would not collect hundreds of never-ending processes.
The implementation wasn't that hard and worked fine on the command line: it forked a child process that hung forever and killed it after 3 seconds.
After integrating it into phancap I noticed that it did not work. I was sure I had the pcntl extension installed and enabled, and it worked fine on the CLI, just not when run under mod_php.
A user note in the PHP manual finally gave a hint why it failed:
"Process Control should not be enabled within a web server environment and unexpected results may happen if any Process Control functions are used within a web server environment."
At least for PHP 5.3.8 which I am using, and who knows how far back, it's not a matter of "should not", it's "can not". Even though I have compiled in PCNTL with --enable-pcntl, it turns out that it only compiles in to the CLI version of PHP, not the Apache module. As a result, I spent many hours trying to track down why function_exists('pcntl_fork') was returning false even though it compiled correctly. It turns out it returns true just fine from the CLI, and only returns false for HTTP requests. The same is true of ALL of the pcntl_*() functions.
To summarize: PHP's pcntl extension does not work on Apache with mod_php.
timeout to the rescue
After some searching I found the timeout command on my Linux system; it is part of the GNU core utilities.
timeout lets you specify how long to wait for a process to finish. If the process is still running when the time is up, timeout sends it a signal of your choice (SIGTERM by default).
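To make that concrete, here is a minimal sketch of how a command can be wrapped with timeout. It assumes GNU coreutils' timeout; run_limited is a hypothetical helper name, and sleep stands in for a hanging cutycapt call. It is not phancap's actual code.

```shell
# run_limited is a hypothetical helper, not part of phancap.
# $1 = time limit in seconds, remaining arguments = command to run.
run_limited() {
    limit=$1
    shift
    # Send SIGTERM after $limit seconds; if the process ignores it,
    # follow up with SIGKILL 2 seconds later (-k 2).
    timeout -k 2 "$limit" "$@"
}

# "sleep 10" plays the role of a renderer that never returns.
status=0
run_limited 1 sleep 10 || status=$?

# GNU timeout exits with status 124 when it had to stop the command.
if [ "$status" -eq 124 ]; then
    echo "timed out"
else
    echo "finished with status $status"
fi
```

The -k option is a safety net for processes that do not react to SIGTERM. If the follow-up SIGKILL is what ends the process, the exit status is 137 instead of 124.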
phancap now uses timeout when running cutycapt. With that I'm confident enough to let phancap run on my public server and use it on the SemanticScuttle demo site to provide screenshots for bookmarked websites.