HTML Dog

Declarations

This page deals with how to define a valid XHTML document.

Document type declarations

At the very top of your web pages, you need a document declaration. That’s right, you need it.

Without a doctype, your HTML just isn’t valid HTML, and most browsers viewing it will switch to ‘quirks mode’, which means they will think that you don’t know what the hell you’re doing and make up their own mind on what to do with your code. You can be the greatest HTML ninja ever to have walked the earth. Your HTML can be flawless and your CSS simply perfect, but without a document declaration, or with the wrong one, your web pages can look like they were put together by a short-sighted, one-eyed infant gibbon with learning difficulties.

The document declaration for XHTML 1.0 Strict looks like this:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

The following is the document declaration for XHTML 1.1, which may seem preferable, being the latest version of XHTML, but there are a few problems, which will be explained in just a minute…


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

If you just can’t let go of HTML 4 or if you’ve got some kind of Netscape 4 fetish, you can use XHTML 1.0 Transitional:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

The only reason you should use this is if you have an unusual need to accommodate older, rarely used browsers. Transitional XHTML 1.0 allows old HTML 4 presentational elements that may result in better presentation in browsers such as Netscape 4, but using such elements will be detrimental to the efficiency, and possibly the accessibility, of your web pages.

Finally, if you’re one of those wacky people who use frames, the XHTML 1.0 Frameset document type declaration looks like this:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

Note that the DOCTYPE tag is a bit of a rebel and demands to be written in upper case and adorned with an exclamation mark. It also breaks the rules in that it is the only tag that doesn’t need closing.
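
To make the four declarations above easier to tell apart, here is a minimal Python sketch that reports which XHTML flavour a page declares. The function name `detect_doctype` and the lookup table are made up for illustration and are not part of any library. The sketch assumes the declaration is the very first thing in the file and does a plain substring match on the public identifier rather than any real parsing.

```python
# Public identifiers from the declarations above, mapped to friendly names.
# The table and the function name are hypothetical, for illustration only.
KNOWN_DOCTYPES = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD XHTML 1.1//EN": "XHTML 1.1",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML 1.0 Transitional",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "XHTML 1.0 Frameset",
}


def detect_doctype(page):
    """Return the declared XHTML flavour, or None if no known doctype leads the page."""
    head = page.lstrip()
    if not head.startswith("<!DOCTYPE"):
        return None  # no doctype at the top of the page
    # Keep only the declaration itself, up to and including the first ">".
    declaration = head[: head.find(">") + 1]
    for public_id, name in KNOWN_DOCTYPES.items():
        if public_id in declaration:
            return name
    return None
```

A page that starts with the Strict declaration gives "XHTML 1.0 Strict", while a page that launches straight into the html tag gives None.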

Language declarations

You should identify the primary language of a document either through an HTTP header or with the xml:lang attribute inside the opening html tag. Although this is not necessary to produce a valid XHTML document, it is an accessibility consideration. The value is an abbreviation, such as ‘en’ (English), ‘fr’ (French), ‘de’ (German) or ‘mg’ (Malagasy).

The declaration for a document with primarily English content, for example, would look like this:


<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

After declaring a primary language, if you use languages other than that in your content, you should further use the xml:lang attribute in-line (such as <span xml:lang="de">HTML Hund</span>).

Content types

The media type and character set of an HTML document also need to be specified, and this is done with an HTTP header such as:


Content-Type: text/html; charset=UTF-8

The first part (in this example, the text/html bit) is the MIME type of the file, and this lets the browser know what media type a file is and therefore what to do with it. All files have some kind of MIME type. A JPEG image is image/jpeg, a CSS file is text/css and the type generally used for HTML is text/html.

The second part of the HTTP header (in this example, the UTF-8 bit) is the character set.
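
As a rough illustration of how the two parts of the header separate, the Python sketch below splits a Content-Type value into its MIME type and character set using the standard library's `email.message.Message` class. The helper name `split_content_type` is an assumption made for this example.

```python
from email.message import Message


def split_content_type(header_value):
    """Split a Content-Type value into (MIME type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() returns the lower-cased "type/subtype" part;
    # get_param() returns the named parameter, or None if it is absent.
    return msg.get_content_type(), msg.get_param("charset")
```

Passing in the header value from the example above gives back `text/html` and `UTF-8`. A value with no charset, such as `image/jpeg`, gives None for the second part.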

Perhaps the easiest way to set an HTTP header (or mimic it) is to use an ‘HTTP-equivalent’ meta tag in the HTML, which would look something like this:


<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Slightly more complicated, but preferable (due to it being a proper HTTP header AND cutting down on HTML), is to send the header by using a server-side scripting language. With PHP, you might use something like this:


<?php header("Content-Type: text/html; charset=UTF-8"); ?>

If you don’t want to (or can’t) use a server-side scripting language, you might be able to go straight to the server with an ‘.htaccess’ file. Most Apache-compatible servers can read a small text file with the file name ‘.htaccess’ that sits in the root directory. With the following line in it, you can associate all files with the extension ‘.html’ with a MIME type and character set:


AddType text/html;charset=UTF-8 html

Character sets include ‘ISO-8859-1’ for many Western, Latin-based languages, ‘SHIFT_JIS’ for Japanese and ‘UTF-8’, a version of Unicode Transformation Format, which provides a wide range of unique characters used in most languages. Basically, you should use a character set that you know will be recognised by your audience. Unless you are using a Latin-based language (including English), where ISO-8859-1 can be used and is almost universally understood, you should use UTF-8. It can display most characters from most languages and is the safest choice because it will work on most people’s computers.
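
To see why the choice of character set matters, the following Python sketch tries to encode a few sample strings in different character sets. The helper `can_encode` and the sample strings are assumptions for illustration; the codec names are the ones Python's standard library uses.

```python
def can_encode(text, charset):
    """Return True if every character in text exists in the given character set."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample strings, one per language.
samples = {"English": "hello", "French": "café", "Japanese": "日本語"}

for charset in ("iso-8859-1", "shift_jis", "utf-8"):
    supported = [name for name, text in samples.items() if can_encode(text, charset)]
    print(charset, "can represent:", ", ".join(supported))
```

ISO-8859-1 copes with the French sample but not the Japanese one, whereas UTF-8 handles all three. This is the practical reason for preferring UTF-8 once your content strays beyond Western European languages.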

You can read more about character sets elsewhere on the web.

XHTML should be served by the MIME type application/xhtml+xml. That’s what it is - an XML application. Unfortunately, most browsers don’t have the first clue what this is. So it is generally accepted that it’s ok to use the MIME type text/html. According to the W3C, and further highlighted by the Web Standards Project, flavours of XHTML 1.0 may be served as text/html, but XHTML 1.1 should not, which is why the examples across this site are XHTML 1.0 Strict, assuming a text/html MIME type. But you can (and perhaps should) serve the correct MIME type to those browsers that accept it with a bit of server-side fiddling.

This site uses PHP to serve XHTML 1.1 with an application/xhtml+xml MIME type to those browsers that understand and render the type (such as Mozilla) and XHTML 1.0 Strict with the text/html type to other browsers (such as IE). The script, placed at the very top of every page, looks a little something like this:


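<?php
// Some clients send no Accept header at all, so fall back to an empty string.
$accept = isset($_SERVER["HTTP_ACCEPT"]) ? $_SERVER["HTTP_ACCEPT"] : "";
// Case-insensitive search for the XHTML MIME type in the Accept header.
if (stristr($accept, "application/xhtml+xml")) {
    header("Content-Type: application/xhtml+xml; charset=UTF-8");
    echo '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">';
} else {
    header("Content-Type: text/html; charset=UTF-8");
    echo '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">';
}
?>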

This checks to see if the browser accepts the application/xhtml+xml MIME type and if it does, that MIME type is sent and the XHTML 1.1 document type is written to the HTML. If the MIME type isn’t recognised then the text/html MIME type is sent and the XHTML 1.0 Strict document type is written in the HTML.
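
The same decision can be sketched outside PHP. The Python function below mirrors the logic of the script above: it takes the value of the Accept header and returns the Content-Type to send along with the matching doctype. The name `negotiate` and the two constants are assumptions for illustration. Like the PHP version, it is a simple substring check and ignores any quality values (such as `q=0`) a browser might attach to the type.

```python
XHTML_11 = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
            '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">')
XHTML_10_STRICT = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
                   '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')


def negotiate(accept_header):
    """Return (Content-Type header value, doctype) for a given Accept header."""
    # Treat a missing header as empty and compare case-insensitively.
    accepted = (accept_header or "").lower()
    if "application/xhtml+xml" in accepted:
        return "application/xhtml+xml; charset=UTF-8", XHTML_11
    return "text/html; charset=UTF-8", XHTML_10_STRICT
```

A browser that lists application/xhtml+xml in its Accept header gets the XHTML 1.1 pair. Anything else, including a request with no Accept header at all, falls through to text/html and XHTML 1.0 Strict.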

Other than the peace of mind of knowing you’re doing the right thing and preparing yourself for the way things will be done in the future, the immediate benefit of using this method is that Mozilla will treat your files as XML applications and simply won’t work if your XHTML isn’t up to scratch, i.e. isn’t well formed. You can then debug without having to run the document through a validator.
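
If you want a similar well-formedness check away from the browser, a rough Python sketch using the standard library's XML parser might look like the following. The function name `is_well_formed` is an assumption for this example. It only tests whether the markup is well formed XML, not whether it is valid against an XHTML DTD, so it is no replacement for a validator.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as well formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

Properly nested markup such as `<p>Hello <em>world</em></p>` passes, while overlapping tags such as `<p>Hello <em>world</p></em>` or an unclosed `<br>` inside a paragraph fail, much as they would in a browser that parses the page as XML.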