XHTML breakages

The HTML pages on my blog are served with the MIME content type application/xhtml+xml. This forces browsers to use an XML parser instead of a lenient HTML parser, and they will bail out with an error message if the XML is not well-formed.
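The difference between the two parser families can be sketched with Python's standard library. This is only an illustration, not what browsers run internally: `WELL_FORMED`, `BROKEN`, `is_well_formed` and `lenient_tags` are made-up names, and `html.parser` merely stands in for a forgiving HTML parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample documents; BROKEN never closes its <p> element.
WELL_FORMED = "<html xmlns='http://www.w3.org/1999/xhtml'><body><p>Hello</p></body></html>"
BROKEN = "<html xmlns='http://www.w3.org/1999/xhtml'><body><p>Hello</body></html>"


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient tokenizer: records start tags and never rejects the input."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_tags(markup):
    """Return the start tags a forgiving HTML parser sees in the markup."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags
```

The strict parser rejects `BROKEN` outright, while the lenient one still reports the `html`, `body` and `p` start tags and carries on. That second behaviour is what visitors get from a page served as text/html.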

Yesterday someone complained by e-mail that he could not read my blog because Firefox showed an XML parsing error. On top of that, the archive.org version of my blog also showed nothing but an XML parsing error.

Internet Archive

The Internet Archive version is broken because their software injects an additional navigation header into the content, and that header is not well-formed at all:

Example: Goodbye, CAcert.org @2017-06-06.

Screenshot: Chromium 60 displaying an error for my blog's page on archive.org
Screenshot: Firefox 55 displaying an error for my blog's page on archive.org

I opened a bug report for this issue: internetarchive/wayback: #156 xhtml pages broken

Firefox

The person who contacted me, however, saw an XML parsing error in his own browser on the live site:

XML Parsing Error: not well-formed
Location: http://cweiske.de/tagebuch/
Line Number 42, Column 328:
function cleanCSS2277284469133491(d) { if (typeof d != 'string') return d; var fc = fontCache2277284469133491; var p = /font(\-family)?([\s]*:[\s]*)(((["'][\w\d\s\.\,\-@]*["'])|([\w\d\s\.\,\-@]))+)/gi; function r(m, pa, p0, p1, o, s) { var p1o = p1; p1 = p1.replace(/(^\s+)|(\s+$)/gi, '').replace(/\s+/gi, ' '); if (p1.length < 2) { p1o = ''; } else if (fc.indexOf(p1) == -1) { if (fc.length < fontCacheMax2277284469133491) { fc.push(p1); } else { p1o = fc[0]; } } return 'font' + pa + p0 + p1o; } fontCache2277284469133491 = fc; return d.replace(p, r); } 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------^

It turned out that he had the firegloves extension installed, which also injects markup that is not well-formed into the page: #2: Breaks XHTML pages delivered as application/xhtml+xml.
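A likely culprit, though this is an assumption and not something the bug report confirms, is the raw `<` in `p1.length < 2` inside the injected script. In XML, a literal `<` in element content has to be escaped or wrapped in a CDATA section. The sketch below uses a shortened, hypothetical snippet rather than the extension's real output, and `parse_error` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-ins for an injected inline script.
RAW_SCRIPT = "<head><script>if (p1.length < 2) { p1o = ''; }</script></head>"
CDATA_SCRIPT = (
    "<head><script>//<![CDATA[\n"
    "if (p1.length < 2) { p1o = ''; }\n"
    "//]]></script></head>"
)


def parse_error(markup):
    """Return the XML parser's error message, or None if the markup is well-formed."""
    try:
        ET.fromstring(markup)
        return None
    except ET.ParseError as exc:
        return str(exc)
```

`parse_error(RAW_SCRIPT)` returns an error message, while `parse_error(CDATA_SCRIPT)` returns `None`. The `//` in front of the CDATA markers keeps the script valid JavaScript when the same markup is read by an HTML parser.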

Why XHTML?

My blog is static, hand-written HTML, and I have a couple of scripts that help me write articles: an image gallery creator, a TOC creator, an ID attribute adder and so on. Using an XML parser for those tools is so much easier than using an HTML5-compliant parser.
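To give an idea of how little code such a helper needs when the input is guaranteed to be well-formed XML, here is a minimal sketch of an ID attribute adder. It is not the actual script used for this blog: the function names, the choice of `h2` and `h3`, and the slug rule are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements without an "ns0:" prefix.
ET.register_namespace("", XHTML_NS)


def slugify(text):
    """Turn heading text into a lowercase, dash-separated identifier."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def add_heading_ids(xhtml):
    """Give every h2/h3 without an id attribute one derived from its text."""
    root = ET.fromstring(xhtml)
    for level in ("h2", "h3"):
        for heading in root.iter(f"{{{XHTML_NS}}}{level}"):
            if heading.get("id") is None:
                slug = slugify("".join(heading.itertext()))
                if slug:  # skip headings that contain no usable text
                    heading.set("id", slug)
    return ET.tostring(root, encoding="unicode")
```

Run on a page containing `<h2>Why XHTML?</h2>`, this produces `<h2 id="why-xhtml">Why XHTML?</h2>` and leaves existing IDs untouched. The sketch does not handle duplicate slugs, and it does not preserve the doctype or the XML declaration, which a real tool would need to do.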

Moving from my old lightbox gallery script to Photoswipe was only possible because I could automatically transform the XHTML code with XML command line tools.

Written by Christian Weiske.

Comments? Please send an e-mail.