Entities in modular XHTML

Special characters in HTML can be written as "named entities" like &larr; for the "left arrow" symbol, ←. One very commonly used named entity is the non-breaking space, &nbsp;.

When you try to use those entities in an unprepared XML document, your XML validator will give you a big fat error:

Undefined entity &nbsp;

or

general entity "nbsp" not defined and no default entity

The same will happen when you use such a named entity in an XHTML document that is served with the application/xhtml+xml MIME type.
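The error is easy to reproduce outside a browser. Here is a minimal sketch, assuming Python's standard xml.etree.ElementTree (an expat-based, non-validating parser) stands in for the XML validator; the helper name parse_error is made up for this illustration, and the exact wording of the message depends on the parser:

```python
import xml.etree.ElementTree as ET


def parse_error(markup):
    """Return the parser's error message, or None if the markup parses."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None


# A named entity without any declaration is rejected by the XML parser.
print(parse_error("<p>Hello&nbsp;World</p>"))

# The same text without the entity parses fine.
print(parse_error("<p>Hello World</p>"))
```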

The cause

In HTML4, those named entities were part of the specification and the HTML4 DTD. Since XHTML 1.0, only five base named entities are included in the DTD: &lt;, &gt;, &amp;, &quot; and &apos;.

All other named entities have been moved to their own DTDs or entity files; see the W3C XHTML Modularization specification.
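A quick check that the entities predefined by XML itself need no DTD at all. This is only a sketch, again assuming Python's standard xml.etree.ElementTree as the parser; the name PREDEFINED is mine:

```python
import xml.etree.ElementTree as ET

# The five entities every XML parser knows without any DTD.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

# Build <p>&lt; &gt; &amp; &quot; &apos;</p> and parse it.
markup = "<p>" + " ".join(f"&{name};" for name in PREDEFINED) + "</p>"
element = ET.fromstring(markup)

# The parser hands back the literal characters.
print(element.text)
```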

Solutions

Use numeric character references

Instead of relying on the name, you can always use the numeric character reference:

Hello&#160;

You will find the numeric aliases in the XHTML module definitions.
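If you do not want to look the numbers up by hand, a scripting language can do it. The sketch below assumes Python, whose html.entities module ships the HTML 4 name-to-code-point table; the helper numeric_reference is a name I made up:

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint


def numeric_reference(name):
    """Build the decimal character reference for an HTML 4 entity name."""
    return f"&#{name2codepoint[name]};"


nbsp = numeric_reference("nbsp")
larr = numeric_reference("larr")
print(nbsp, larr)

# Numeric references need no DTD, so this parses without an error.
element = ET.fromstring(f"<p>Hello{nbsp}World {larr}</p>")
print(repr(element.text))
```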

Include single entity declarations

It is possible to use the "normal" XHTML DTD and extend it with the definitions of those entities that are used in the document:



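<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [
  <!ENTITY nbsp "&#160;">
]>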
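To see that a declaration in the internal subset is enough for a parser that never fetches the external DTD, here is a small sketch. It assumes Python's standard xml.etree.ElementTree and, purely as an example, the XHTML 1.0 Strict doctype; the sample document is made up:

```python
import xml.etree.ElementTree as ET

# Assumed sample document: XHTML 1.0 Strict doctype plus one entity
# declared in the internal subset.
DOCUMENT = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [
  <!ENTITY nbsp "&#160;">
]>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Entity test</title></head>
  <body><p>Hello&nbsp;World</p></body>
</html>
"""

root = ET.fromstring(DOCUMENT)
paragraph = root.find(".//{http://www.w3.org/1999/xhtml}p")

# The entity has been expanded to the real non-breaking space, U+00A0.
print(repr(paragraph.text))
```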

Include entity definition URLs

Most of the time you will not know in advance which entities will be used in the document. In such cases it is easier to include the whole entity definition files. Example:



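<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [
  <!ENTITY % xhtml-symbol PUBLIC
    "-//W3C//ENTITIES Symbols for XHTML//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">
  %xhtml-symbol;
]>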

The interesting fact here is that Firefox does not load such external entity definition files.
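Browsers are not the only ones that skip external entity files: a non-validating XML parser is not required to read them. As an illustration, and only as a sketch, the following assumes Python's expat-based xml.etree.ElementTree, which does not fetch external entities by default, and assumes the XHTML 1.0 symbol entity file as the referenced URL:

```python
import xml.etree.ElementTree as ET

# Assumed sample document: the symbol entity set is referenced through an
# external parameter entity instead of being declared inline.
DOCUMENT = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [
  <!ENTITY % xhtml-symbol PUBLIC
    "-//W3C//ENTITIES Symbols for XHTML//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">
  %xhtml-symbol;
]>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Entity test</title></head>
  <body><p>&larr; back</p></body>
</html>
"""

try:
    ET.fromstring(DOCUMENT)
    outcome = "parsed"
except ET.ParseError as exc:
    # The entity file was never downloaded, so &larr; stays undefined.
    outcome = f"rejected: {exc}"

print(outcome)
```

No network request is made here; the parser simply ignores the reference to the entity file and then stumbles over the first named entity in the body.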

This brings us to the question of whether entity definitions belong on the web and whether they should be used at all. Please read "DTDs Don't Work on the Web".

I can follow Henri Sivonen's reasoning and have come to the conclusion that named entities are a thing of the past. Today we're (or should be) using UTF-8 everywhere, and it's much easier to use the character itself instead of the named entity.

Do not use entities

Since we are all using UTF-8 as the character set, every character may be written directly in the file itself. Do that and do not worry about entities. This is the best solution.

If you encounter named entities that you do not want to replace with their character (because you cannot distinguish a non-breaking space from a normal space), then use their numeric counterpart.
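Cleaning up existing files by hand is tedious, so here is a rough sketch of how the replacement could be automated. It assumes Python and its html.entities table of HTML 4 names; the function name replace_named_entities and the KEEP_NUMERIC selection are my own choices and not part of any specification:

```python
import re
from html.entities import name2codepoint

# Entities predefined by XML itself; these must stay escaped.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}

# Assumed selection of characters that are hard to tell apart when
# written literally; they become numeric references instead.
KEEP_NUMERIC = {"nbsp", "ensp", "emsp", "thinsp", "shy", "zwj", "zwnj"}

ENTITY_PATTERN = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_named_entities(text):
    """Replace HTML 4 named entities with characters or numeric references."""

    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined and unknown names alone
        codepoint = name2codepoint[name]
        if name in KEEP_NUMERIC:
            return f"&#{codepoint};"
        return chr(codepoint)

    return ENTITY_PATTERN.sub(substitute, text)


print(replace_named_entities("Tom&nbsp;&amp;&nbsp;Jerry &larr; &eacute;"))
```

Remember to save the result as UTF-8, otherwise the literal characters will be garbled again.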

Written by Christian Weiske.