Unsupported: 406 Not Acceptable

While implementing the crawler for my own search engine phinde, I tried to minimize the amount of data transferred between web servers and the crawler.

The crawler can only extract links from HTML, XHTML and Atom feeds, so it sends an HTTP Accept header stating that:

Accept: application/atom+xml, application/xhtml+xml, text/html

Unfortunately, my Apache server still sends the full content of large .bz2 files, which my crawler then has to throw away.
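
As long as the server ignores the header, the crawler has to decide on its own whether a response is worth reading. The following is a minimal sketch in Python, not phinde's actual code; the names `ACCEPTED`, `build_accept_header` and `is_wanted` are made up for illustration. It builds the Accept header shown above and checks the Content-Type of a response against the same list:

```python
# Media types the crawler can extract links from (same list as in the
# Accept header above). The names in this sketch are hypothetical.
ACCEPTED = ("application/atom+xml", "application/xhtml+xml", "text/html")


def build_accept_header(types=ACCEPTED):
    """Join the media types into the value of an Accept request header."""
    return ", ".join(types)


def is_wanted(content_type, types=ACCEPTED):
    """Return True if a Content-Type response header names an accepted type.

    Parameters such as "; charset=utf-8" are ignored and the comparison
    is case-insensitive. A missing header counts as not wanted.
    """
    if not content_type:
        return False
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in types
```

A crawler could call `is_wanted()` on the response headers and close the connection without reading the body when it returns False. That only limits the damage: the server has already started sending the file.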

Specification

The HTTP/1.1 RFC 2616 states in section 10.4.7:

Note: HTTP/1.1 servers are allowed to return responses which are not acceptable according to the accept headers sent in the request.

I think this note was added to make HTTP/1.1 easier to implement.
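
What a strict server would have to do can be sketched in a few lines. The Python below only illustrates the matching logic under simplifying assumptions: it handles `q` values and the `*/*` and `type/*` wildcards, but ignores other media type parameters and the precedence of more specific ranges. `parse_accept`, `matches` and `status_for` are hypothetical names, not the API of any web server:

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) tuples."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def matches(media_range, media_type):
    """Check a media type against a range such as text/html, text/* or */*."""
    if media_range == "*/*":
        return True
    range_type, _, range_sub = media_range.partition("/")
    type_, _, sub = media_type.partition("/")
    return range_type == type_ and range_sub in ("*", sub)


def status_for(accept_header, available_type):
    """Return 200 if the available type is acceptable, otherwise 406."""
    if not accept_header:
        return 200  # no Accept header means every type is acceptable
    available = available_type.split(";", 1)[0].strip().lower()
    for media_range, q in parse_accept(accept_header):
        if q > 0 and matches(media_range, available):
            return 200
    return 406
```

With the crawler's Accept header and a resource of type `application/x-bzip2`, `status_for()` returns 406, so no body would have to be sent.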

Server support

Unfortunately, none of the three big web servers can be configured to send a 406 status code when the Accept condition cannot be fulfilled. I've opened a feature request for Apache: Option to send "406 Not Acceptable" when mime type in "Accept" header cannot be fulfilled

The standard configuration does not support it by any means:

Apache

$ curl -IH 'Accept: image/png' http://httpd.apache.org/
HTTP/1.1 200 OK
[...]
Server: Apache/2.4.7 (Ubuntu)
[...]
Content-Type: text/html

Lighttpd

$ curl -IH 'Accept: image/png' http://www.lighttpd.net/
HTTP/1.1 200 OK
[...]
Content-Type: text/html
[...]
Server: lighttpd/2.0.0

nginx

$ curl -IH 'Accept: image/png' http://nginx.org/
HTTP/1.1 200 OK
Server: nginx/1.9.8
Date: Wed, 10 Feb 2016 20:11:30 GMT
Content-Type: text/html; charset=utf-8
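
The same check can be repeated against any server with a small script. This is a sketch using Python's standard `urllib`; the function names `build_probe` and `probe` are my own choice, and `image/png` is just a type that an HTML page cannot satisfy:

```python
import urllib.error
import urllib.request


def build_probe(url, accept="image/png"):
    """Build a HEAD request asking for a type the page is unlikely to offer."""
    return urllib.request.Request(url, headers={"Accept": accept}, method="HEAD")


def probe(url, accept="image/png", timeout=10):
    """Send the request and return (status code, Content-Type header)."""
    try:
        with urllib.request.urlopen(build_probe(url, accept), timeout=timeout) as resp:
            return resp.status, resp.headers.get("Content-Type")
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; a 406 would end up here.
        return err.code, err.headers.get("Content-Type")
```

Calling `probe("http://nginx.org/")` returns the status code and content type that the curl calls above print. A server that enforced the Accept header would answer with 406 instead of 200.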

Written by Christian Weiske.
