During development, restarting the queue worker after every code change is tedious. Better to separate the worker from the actual functionality by calling external scripts.
My search engine's crawler puts URLs that should be crawled into a Gearman task queue. A worker script runs in the background, listens for new URLs in the queue and starts the crawler whenever one arrives.
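Enqueueing a URL could look like this, a minimal sketch assuming the pecl/gearman extension; the queue name `phinde_crawl` is made up for illustration:

```php
<?php
// Sketch: the crawler enqueues a URL as a background Gearman job.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('phinde_crawl', 'https://example.org/');
```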
The classic worker process has two jobs: listening to the queue and running the tasks.
During development, crawler and indexer code changes all the time. A classic worker needs to be restarted after every code change, which is tedious and annoying.
For phinde I separated the queue listener from the task processing:
- The worker listens to the Gearman queue. Whenever a task arrives, it starts an external script that does the task processing instead of doing it itself (see the sketch after this list).
- The task processor is a command line script that does crawling and indexing.
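A minimal sketch of such a delegating worker, again assuming pecl/gearman; the script name `process.php` is hypothetical:

```php
<?php
// worker.php: listens to the queue and delegates each task
// to an external script instead of processing it in-process.
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('phinde_crawl', function (GearmanJob $job) {
    $url = $job->workload();
    // Run the task processor as a separate process; a crash there
    // cannot take the queue listener down.
    exec('php process.php ' . escapeshellarg($url), $output, $exitCode);
    if ($exitCode !== 0) {
        $job->sendFail();
    }
});
while ($worker->work()) {
    // handle one job per iteration, forever
}
```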
Separating those two gave me several benefits:
- The worker has to be restarted neither during development nor after deployment, because it almost never changes.
- Task processing can crash hard without taking the whole queue processing down.
- Tasks can be started from the command line, which makes development very enjoyable (see the sketch below).
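The task processor itself can be a plain command line script that takes the URL as its first argument. A hypothetical `process.php` might start like this:

```php
<?php
// process.php: hypothetical standalone task processor.
// Usage: php process.php https://example.org/
if ($argc < 2) {
    fwrite(STDERR, "Usage: php process.php <url>\n");
    exit(1);
}
$url = $argv[1];
// Setup (configuration, database connection etc.) happens fresh
// on every invocation, then the actual crawling and indexing runs.
echo 'Crawling and indexing ' . $url . "\n";
```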
You have to decide whether the payload passed to the worker is small enough to be transmitted as command line arguments; operating systems limit the total command line length, so larger payloads need another channel.
Another issue is the overhead of starting new processes and their setup (database connection etc.), which may be too high if you have many tiny tasks.