The crawler can only extract links from HTML, XHTML and Atom feeds, so it sends an HTTP Accept header stating that:
Accept: application/atom+xml, application/xhtml+xml, text/html
Unfortunately, my Apache server still sends out the content of large .bz2 files, which my crawler then has to throw away.
Note: HTTP/1.1 servers are allowed to return responses which are not acceptable according to the accept headers sent in the request.
I think this note was added to make HTTP/1.1 easier to implement.
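As long as servers ignore the header, the crawler has to protect itself. The following is a minimal sketch of how that could look on the client side, not the code my crawler actually uses: the names `ACCEPTED_TYPES`, `is_acceptable` and `fetch_if_acceptable` are made up for illustration. It sends the Accept header shown above, looks at the Content-Type of the response and gives up before reading the body when the type is not one of the three wanted ones.

```python
import urllib.request

# The three media types the crawler can extract links from (taken from the text).
ACCEPTED_TYPES = ("application/atom+xml", "application/xhtml+xml", "text/html")


def is_acceptable(content_type_header, accepted=ACCEPTED_TYPES):
    """Return True if the Content-Type header value names an accepted media type."""
    if not content_type_header:
        return False
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in accepted


def fetch_if_acceptable(url, timeout=10):
    """Return the body as bytes, or None if the server sent an unwanted type.

    Hypothetical helper: urlopen() returns once the response headers have
    arrived, so the body is only read when response.read() is called.
    """
    request = urllib.request.Request(
        url, headers={"Accept": ", ".join(ACCEPTED_TYPES)}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        if not is_acceptable(response.headers.get("Content-Type")):
            # Leaving the with-block closes the connection without reading the body.
            return None
        return response.read()
```

The server may already have pushed part of the file into the socket before the connection is closed, so this only limits the waste. A real 406 response would avoid it altogether; with `urllib` it would surface as an `urllib.error.HTTPError`.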
Unfortunately, none of the three big web servers makes it possible to send a 406 status code when the Accept condition cannot be fulfilled. I've opened a feature request for Apache: Option to send "406 Not Acceptable" when mime type in "Accept" header cannot be fulfilled
The standard configurations do not support it at all:
$ curl -IH 'Accept: image/png' http://httpd.apache.org/
HTTP/1.1 200 OK
[...]
Server: Apache/2.4.7 (Ubuntu)
[...]
Content-Type: text/html

$ curl -IH 'Accept: image/png' http://www.lighttpd.net/
HTTP/1.1 200 OK
[...]
Content-Type: text/html
[...]
Server: lighttpd/2.0.0

$ curl -IH 'Accept: image/png' http://nginx.org/
HTTP/1.1 200 OK
Server: nginx/1.9.8
Date: Wed, 10 Feb 2016 20:11:30 GMT
Content-Type: text/html; charset=utf-8
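
For comparison, here is a rough sketch of the decision a server would have to make to answer such requests with a 406. It is not how Apache, lighttpd or nginx work internally; the function names (`parse_accept`, `range_matches`, `status_for`) are invented for this example, and the matching is simplified: it handles wildcards and `q` values but ignores all other media type parameters.

```python
def parse_accept(accept_header):
    """Parse an Accept header into a list of (media_range, q) tuples."""
    ranges = []
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q value as "not wanted"
        ranges.append((media_range, q))
    return ranges


def range_matches(media_range, content_type):
    """Return True if a range such as "text/*" covers a type such as "text/html"."""
    range_type, _, range_subtype = media_range.partition("/")
    ctype, _, csubtype = content_type.partition("/")
    return range_type in ("*", ctype) and range_subtype in ("*", csubtype)


def status_for(accept_header, content_type):
    """Return 200 if content_type satisfies the Accept header, else 406."""
    if not accept_header:
        return 200  # no Accept header: every media type is acceptable
    content_type = content_type.split(";", 1)[0].strip().lower()
    matches = [
        (media_range, q)
        for media_range, q in parse_accept(accept_header)
        if range_matches(media_range, content_type)
    ]
    if not matches:
        return 406
    # The most specific matching range decides: "text/html" beats "text/*" beats "*/*".
    _, q = max(matches, key=lambda match: 2 - match[0].count("*"))
    return 200 if q > 0 else 406
```

With this sketch, `status_for("image/png", "text/html")` gives 406 for the three requests above, and the crawler's own Accept header gives 406 for a file served as `application/x-bzip2`.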