The shortcomings of HTML5
Introduction
As professional Web designers and developers, we can use and support HTML5, but at the same time we need to be able to discuss the shortcomings and limitations of the technology. In this way, we can avoid pitfalls and provide invaluable feedback to the HTML5 team, to help improve HTML.
While considering the shortcomings of HTML5 in this article, you will probably wonder why a specific feature was implemented in a given way. To answer this, it's important to understand that features that go into HTML are rarely designed and evaluated purely on merit. More often than not, features that make it into the spec are negotiated by different implementers (mostly browser vendors) in a kind of standards bazaar.
But it is ultimately Web developers and designers like you and me who will work with the technology on a daily basis, and who will have the final say about the use and longevity of any HTML feature.
HTML5 does not fix HTML
Perhaps the biggest misconception about HTML5 is that it fixes the problems with HTML. The most serious problem is that HTML/JavaScript applications are inherently insecure and are vulnerable to mischief, attacks and data theft. One reason is that HTML permits data to be intermixed with executable code (JavaScript). This makes HTML/JavaScript applications susceptible to cross-site scripting attacks. HTML5 adds new and more powerful DOM APIs that give JavaScript code greater access to network communication and cached data, but giving more power to executable code without fixing security design flaws is disturbing.
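To make the data-versus-code problem concrete, here is a minimal Python sketch of the usual server-side mitigation: escaping untrusted text before it is placed into markup. The function name render_comment and the comment markup are assumptions made for illustration; only html.escape comes from the Python standard library.

```python
from html import escape


def render_comment(author: str, text: str) -> str:
    """Build a comment fragment from untrusted input (hypothetical helper).

    escape() turns &, <, > and quotes into character references, so the
    browser treats the input as text rather than as markup or script.
    """
    return f"<p><b>{escape(author)}</b>: {escape(text)}</p>"


if __name__ == "__main__":
    print(render_comment("Eve", "<script>alert(1)</script>"))
```

Escaping has to be applied by every application at every point where data enters markup, which is why it mitigates the design flaw rather than removing it.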
Making simple concepts complex
The HTML5 team has a way of making simple concepts complex. Take, for example, the idea of alternate text used in the img element. Alternate text stands in place of images when images cannot be seen. It's a simple concept, right? Yet, in addition to the 15 pages that describe the img element in the specification (meant for technical users), there is an accompanying 40-page document that describes how to author alternate text (meant for non-technical users). Are content authors going to read 40 pages on alternate text? Making HTML unnecessarily complex can deter users from using features correctly.
Headings have also been made more complex. Headings (h1 to h6) in HTML were ill-conceived, but at least it was easy enough to train someone to use them correctly. Headings are important because they organize and group content, as well as providing navigation for users of assistive technologies. The following screen shot shows how IBM's aiBrowser enables navigation using headings, because the heading levels reflect the physical role of the headings in the structure of the document. For example, "Header Level 1" is represented by the h1 element, "Header Level 2" by the h2 element, and so on.

Now let's look at how HTML5 redefines the h1 to h6 elements. HTML5 ranks each heading according to where in the document it is used. In the following example, in the document outline, the first h3 element represents the highest ranking heading, the second h3 element represents a lower ranking heading (yet has the same rank as the h1 element), and the third h3 element has no ranking at all. Try to explain that to a non-technical user!
<body><h3>Movies</h3><section><h1>Romance</h1></section><section><h3>Action</h3></section><section><h1>Science Fiction</h1><hgroup><h2>Star Trek</h2><h3>The Wrath of Khan</h3></hgroup></section></body>
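The ranking described above can be checked mechanically. The following Python sketch is a simplified reading of the outline rules in the HTML5 draft, not the specification's full algorithm. It assumes well-formed markup, handles only section, article, nav and aside as sectioning elements, and ignores sectioning roots such as blockquote. The names OutlineDepths and outline_depths are invented for this illustration.

```python
from html.parser import HTMLParser

SECTIONING = {"section", "article", "nav", "aside"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineDepths(HTMLParser):
    """Assign an outline depth (1 = top level) to each heading."""

    def __init__(self):
        super().__init__()
        # One scope per sectioning element; "open" holds (rank, depth)
        # pairs for the heading-implied sections that are still open.
        self.scopes = [{"base": 1, "open": []}]
        self.results = []    # [tag, text, depth or None]
        self.group = None    # result indexes of headings in the open hgroup
        self.current = None  # result index of the heading being read

    def _place(self, rank):
        scope = self.scopes[-1]
        open_sections = scope["open"]
        # A heading closes every open section of equal or lower rank.
        while open_sections and open_sections[-1][0] >= rank:
            open_sections.pop()
        depth = open_sections[-1][1] + 1 if open_sections else scope["base"]
        open_sections.append((rank, depth))
        return depth

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            parent = self.scopes[-1]
            parent_depth = parent["open"][-1][1] if parent["open"] else parent["base"]
            self.scopes.append({"base": parent_depth + 1, "open": []})
        elif tag == "hgroup":
            self.group = []
        elif tag in HEADINGS:
            self.results.append([tag, "", None])
            self.current = len(self.results) - 1
            if self.group is not None:
                self.group.append(self.current)  # ranked when hgroup closes
            else:
                self.results[self.current][2] = self._place(int(tag[1]))

    def handle_data(self, data):
        if self.current is not None:
            self.results[self.current][1] += data

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self.current = None
        elif tag == "hgroup" and self.group is not None:
            if self.group:
                # Only the highest-ranked heading in an hgroup is outlined.
                top = min(self.group, key=lambda i: int(self.results[i][0][1]))
                self.results[top][2] = self._place(int(self.results[top][0][1]))
            self.group = None
        elif tag in SECTIONING and len(self.scopes) > 1:
            self.scopes.pop()


def outline_depths(html):
    parser = OutlineDepths()
    parser.feed(html)
    parser.close()
    return [tuple(item) for item in parser.results]


EXAMPLE = (
    "<body><h3>Movies</h3><section><h1>Romance</h1></section>"
    "<section><h3>Action</h3></section><section><h1>Science Fiction</h1>"
    "<hgroup><h2>Star Trek</h2><h3>The Wrath of Khan</h3></hgroup>"
    "</section></body>"
)

if __name__ == "__main__":
    for tag, text, depth in outline_depths(EXAMPLE):
        print(tag, text, depth)
```

Under these assumptions the sketch puts the first h3 at depth 1, both h1 elements and the second h3 at depth 2, the h2 inside the hgroup at depth 3, and gives the final h3 no depth at all.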
The time element introduced in HTML5 is another example of an element that changes its meaning depending on where it is used. First, if it is used without the pubdate attribute, it provides a machine-readable date. For example:
<time datetime="2010-11-24">November 24, 2010</time>
Alternatively, if used with the pubdate attribute that is not inside an article element, it indicates the publication date of the entire document. For example:
<body>...<time datetime="2010-11-24" pubdate>November 24, 2010</time>...</body>
Or, if used with the pubdate attribute that is inside an article element, it indicates the publication date only of the article. For example:
<article>...<time datetime="2010-11-24" pubdate>November 24, 2010</time>...</article>
In addition, the pubdate attribute is always optional, whereas the datetime attribute is sometimes optional and at other times it is required:
The datetime attribute is required when the pubdate attribute is used and when the element does not contain a string in the valid date time format. For example:
<time datetime="2010-11-24" pubdate>Wednesday</time>
But the datetime attribute is optional when the time element does contain a valid date time format. For example:
<time>2009-11-16</time><time pubdate>2009-11-16</time>
The time element can also be empty. For example:
<time datetime="2009-11-16"></time><time datetime="2009-11-16" pubdate></time>
A further rule states that no more than one time element with the pubdate attribute is permitted directly within an article element, and no more than one time element with the pubdate attribute is permitted outside any article element.
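The datetime rules above can be expressed as a short check. The Python sketch below is an approximation under stated assumptions: it accepts only plain YYYY-MM-DD dates, whereas the draft allows more formats, and it ignores the pubdate attribute. The function names is_valid_date and time_element_problem are hypothetical.

```python
import re
from datetime import date
from typing import Optional


def is_valid_date(value: str) -> bool:
    """True for a real calendar date written as YYYY-MM-DD (assumed format)."""
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", value)
    if match is None:
        return False
    try:
        date(*(int(part) for part in match.groups()))
    except ValueError:
        return False  # for example, month 13 or day 31 in a 30-day month
    return True


def time_element_problem(datetime_attr: Optional[str], text: str) -> Optional[str]:
    """Return a description of the problem, or None if the element passes."""
    if datetime_attr is not None:
        if is_valid_date(datetime_attr):
            return None  # content may be anything, even empty
        return "the datetime attribute is not a valid date"
    if is_valid_date(text):
        return None  # the content itself is machine-readable
    return "datetime is required because the content is not a valid date"


if __name__ == "__main__":
    print(time_element_problem("2010-11-24", "Wednesday"))
    print(time_element_problem(None, "2009-11-16"))
    print(time_element_problem(None, "Wednesday"))
```

Even this reduced version needs three branches for one element, which gives a sense of how much a content author is expected to keep in mind.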
Incompatibility issues
HTML5 is also supposed to be compatible with existing browsers and tools. In reality, this is not the case. There is nothing wrong with breaking compatibility if it is intentional and all users are aware of and prepared for the technology shift. The problem is that much of the breakage is unintentional and, once the error has been made, it is hushed up.
Let's look at some incompatibility issues:
HTML Tidy is a tool that is used to fix HTML. Some use HTML Tidy directly before publishing a document. Some use HTML Tidy indirectly and don't even know they are using it, because a CMS or a WYSIWYG editor hides its use. Let's take the following valid HTML5 document as an example:
<!DOCTYPE html><title>Greetings</title><a href="http://localhost"><div>Hello World!</div></a>
HTML Tidy will distort the content by changing the DOCTYPE and creating a useless hyperlink:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head><meta name="generator" content="HTML Tidy for Windows (vers 25 March 2009), see www.w3.org"><title>Greetings</title></head><body><a href="http://localhost"></a><div>Hello World!</div></body></html>
The following document uses a new HTML5 element:
<!DOCTYPE html><title>Greetings</title><header>Hello World!</header>
When HTML Tidy encounters an unknown element, it will throw the following error and will not be able to proceed:
- line 3 column 1 - Error: <header> is not recognized!
- line 3 column 1 - Warning: discarding unexpected <header>
- line 3 column 9 - Warning: plain text isn't allowed in <head> elements
- line 2 column 1 - Info: <head> previously mentioned
- line 3 column 9 - Warning: inserting implicit <body>
- line 3 column 21 - Warning: discarding unexpected </header>
- Info: Document content looks like HTML 3.2
- 4 warnings, 1 error were found!
- This document has errors that must be fixed before using HTML Tidy to generate a tidied up version.
Other authoring tools have incompatibility issues with HTML5 as well. Let's start with the following markup:
<a id="abc"><div>Hello World!</div></a>
TinyMCE running in IE (all versions) will generate the following invalid HTML:
<p><a id="abc"><div>Hello World!</div></a></p>
XStandard will remove the div and enclose the a element in a paragraph:
<p><a id="abc">Hello World!</a></p>
If you start with the following markup:
<p>XML<wbr />Document</p>
CKEditor will create this:
<p>XML<wbr>Document</wbr></p>
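The wbr case comes down to a serializer not knowing that wbr is a void element, that is, one that never takes an end tag. As an illustration only, and not a description of how any of the editors above work, here is a minimal Python serializer that consults a list of void elements. The tuple-based tree format and the name serialize are assumptions, and the list covers common void elements rather than being exhaustive.

```python
from html import escape

# Elements that never have an end tag (partial list for illustration).
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


def serialize(node):
    """Serialize a text string or a (tag, children) tuple to HTML."""
    if isinstance(node, str):
        return escape(node, quote=False)
    tag, children = node
    if tag in VOID_ELEMENTS:
        return f"<{tag}>"  # no content and no end tag
    inner = "".join(serialize(child) for child in children)
    return f"<{tag}>{inner}</{tag}>"


if __name__ == "__main__":
    print(serialize(("p", ["XML", ("wbr", []), "Document"])))
```

A tool built before wbr was standardized has no such entry for it, so it falls through to the generic branch and wraps the following text in the element.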
Many new HTML5 features cannot be implemented
It is impossible or impractical to build authoring tools (such as WYSIWYG editors) to support some of the new HTML5 features. Let's take headings (h1 to h6) as an example. As discussed, in HTML5 these elements change their ranking and/or semantic meaning depending on where in the document they are used. But authoring tools have a fixed label for each heading like this:

It would be confusing to users if authoring tools displayed "Heading 2" for an h3 element simply because it is used in a part of the document that gives it a ranking of 2.
Another feature that is impractical to implement is the new hyperlink behavior that permits hyperlinks to contain block-level elements such as paragraphs. If a user selects all the content in a paragraph and presses the hyperlink button in an authoring tool, how is the authoring tool to know if the user wants this:
<p><a href="page.htm">Some text.</a></p>
or this:
<a href="page.htm"><p>Some text.</p></a>
Making a user interface that will support an optional alt attribute for the img element is also impractical. The img element without an alt attribute is permitted if one of the following conditions is met:
- The title attribute is present and has a non-empty value.
- The img element is in a figure element that contains a figcaption element that contains content other than inter-element whitespace, and, ignoring the figcaption element and its descendants, the figure element has no text node descendants other than inter-element whitespace, and no embedded content descendant other than the img element.
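Written out as code, the two conditions look like this. The Python sketch below is a hedged paraphrase of the rules as quoted above, not a conformance checker. FigureContext and alt_may_be_omitted are hypothetical names, and the figure is summarized by three precomputed values instead of being read from a real DOM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FigureContext:
    """Summary of the figure element that contains the img (assumed shape)."""
    figcaption_text: str         # text inside the figcaption element
    other_text: str              # text in the figure outside the figcaption
    other_embedded_content: int  # embedded elements other than the img


def alt_may_be_omitted(title: Optional[str], figure: Optional[FigureContext]) -> bool:
    # Condition 1: the title attribute is present and non-empty.
    if title:
        return True
    # Condition 2: the img is the only content of a captioned figure.
    if figure is None:
        return False
    return (
        figure.figcaption_text.strip() != ""   # caption has real content
        and figure.other_text.strip() == ""    # only whitespace elsewhere
        and figure.other_embedded_content == 0
    )


if __name__ == "__main__":
    captioned = FigureContext("Sales by quarter", "  ", 0)
    print(alt_may_be_omitted(None, captioned))
    print(alt_may_be_omitted(None, None))
```

An authoring tool would have to evaluate something like this every time the user edits the image, its title, or anything else inside the surrounding figure.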
To support the three states of the alt attribute permitted by HTML5, an authoring interface might need to look something like this:

Lack of versioning
What does the 5 in HTML5 stand for? You might be surprised to know that it does not stand for version 5. I guess the best way to describe it is that 5 represents the fifth major effort of work on HTML. But the problem is not so much with how we label HTML. Instead, the problem is that there is no way for Web page creators to mark Web pages in a way that says that their document conforms to HTML5. There is a new DOCTYPE that is associated with HTML5, but it has no version number, and it will be the same DOCTYPE when HTML6 comes out. Lack of versioning is not a problem for Web browsers because they treat everything as tag soup. But Web page creators who use validators and markup correction tools like HTML Tidy may miss versioning, because these tools will not be able to distinguish between HTML5 and HTML6 when processing documents.
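The difference is easy to see from a tool's point of view. The Python sketch below reads the DOCTYPE with the standard library's html.parser and extracts the version label from a W3C public identifier when there is one. The names DoctypeReader and declared_version are invented for this example, and the regular expression assumes the "-//W3C//DTD ...//EN" pattern used by the HTML 4 and XHTML 1 DOCTYPEs.

```python
import re
from html.parser import HTMLParser
from typing import Optional


class DoctypeReader(HTMLParser):
    """Remember the first DOCTYPE declaration in a document."""

    def __init__(self):
        super().__init__()
        self.decl = None

    def handle_decl(self, decl):
        if self.decl is None and decl.lower().startswith("doctype"):
            self.decl = decl


def declared_version(html: str) -> Optional[str]:
    """Return the version named in the DOCTYPE, or None if it names none."""
    reader = DoctypeReader()
    reader.feed(html)
    reader.close()
    if reader.decl is None:
        return None
    match = re.search(r'"-//W3C//DTD ([^"/]+)//', reader.decl)
    return match.group(1) if match else None


if __name__ == "__main__":
    old = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><title>x</title>'
    new = "<!DOCTYPE html><title>x</title>"
    print(declared_version(old))
    print(declared_version(new))
```

For the HTML 4.01 document the function can report which rules to validate against. For the HTML5 DOCTYPE it has nothing to go on, and that will still be true of any later revision that keeps the same DOCTYPE.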
Lack of extensibility
HTML5 defines new semantic elements such as header, footer, nav, aside, article and section. More semantics are good, but the semantics offered by these elements are not enough. Also, these elements function in the same way as the generic div element. If an element is a derivative of a generic element such as div, then semantics should be added via an attribute like this:
<div type="article">...</div>
And semantics could be grouped into classifications. For example:
<div type="article.news.international">...</div>
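To show how a tool might consume such dotted classifications, here is a small Python sketch. The type attribute and its dotted syntax are the proposal made above, not part of HTML5, and the function names classification_ancestors and matches are assumptions made for illustration.

```python
from typing import List


def classification_ancestors(value: str) -> List[str]:
    """Expand "a.b.c" into ["a", "a.b", "a.b.c"], most general first."""
    parts = value.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def matches(value: str, wanted: str) -> bool:
    """True if value is the wanted classification or a refinement of it."""
    return wanted in classification_ancestors(value)


if __name__ == "__main__":
    print(classification_ancestors("article.news.international"))
    print(matches("article.news.international", "article.news"))
```

A tool that only understands article could still treat article.news.international as an article, so new classifications would not break older software.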
These semantics could also be defined in a separate, smaller specification that could be updated more frequently than the HTML specification. You could also have online services that make these classifications available to authoring tools (and other apps) with friendly, localized labels which will empower non-technical content authors to apply semantic markup with ease, as shown in this mock screen shot:

Missing features
Making the Web into a better application platform is a good idea, but the Web is still, and for many years to come will be, about content. So where are the features to make content easier to publish and read? I am talking about features that you could use in desktop publishing applications 20 years ago but still cannot use on the Web today. How about a feature that will generate a dynamic table of contents from headings like this:
<toc></toc><h1>Movies</h1><h2>Action</h2>...<h2>Science Fiction</h2>...<h2>Romance</h2>
Future browsers can then display the outline automatically hyperlinked like this:

And with line numbers when printed like this:

What about other features such as newspaper-like columns, or markup to support change tracking?
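Without a built-in element, a table of contents has to be produced by the author's own tooling. As a rough illustration of what the proposed toc element would automate, the Python sketch below collects h1 to h6 headings with the standard library's html.parser and prints them as an indented outline. HeadingCollector and table_of_contents are hypothetical names, and the sketch uses the numeric heading level directly, so it ignores the HTML5 sectioning rules and produces no hyperlinks.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) for every h1 to h6 element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def table_of_contents(html: str) -> str:
    """Return the headings as text, indented two spaces per level."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return "\n".join("  " * (level - 1) + text for level, text in collector.headings)


if __name__ == "__main__":
    page = "<h1>Movies</h1><h2>Action</h2><p>...</p><h2>Science Fiction</h2><h2>Romance</h2>"
    print(table_of_contents(page))
```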
Stalled progress on Web accessibility
Perhaps the most disappointing shortcoming of HTML5 is the lack of any new features to make HTML more accessible, or to make existing accessibility features easier to use. HTML5 does nothing significant to make Web technology more accessible. In fact, some accessibility features have either been removed or weakened.
One of the removed features is the longdesc attribute for the img element. This feature is used to provide a description of the image. In the past, because this feature has been poorly defined in the HTML spec, it has been misunderstood and used incorrectly. However, longdesc does have the potential to make images accessible, were it to be defined correctly, and there is strong support in the accessibility community to make this feature work should the HTML5 team add this feature back into the HTML5 spec.
Another issue is that the alt attribute for the img element has been made optional under certain conditions. However, in the minds of many people, if a feature is optional in one situation, it is optional in all situations.
There are also features such as the canvas element, used as a drawing surface, that have been added to HTML5 without any consideration for accessibility. Although efforts are being made to make this feature somewhat accessible, it is unclear at this stage if they will be successful.
Headings are also important for users of assistive technologies, primarily to navigate content. Yet numbered headings (h1 to h6) are a fundamentally flawed construct. Headings have to be redesigned so that authoring them is foolproof. Since headings don't make sense on their own, they should be part of a section element to form a compound construct, similar to table, ol, ul and dl. For example:
<section><heading>...</heading><content>...</content></section>
And heading sections could be nested like this:
<section><heading>...</heading><content><section><heading>...</heading><content>...</content></section><section><heading>...</heading><content>...</content></section></content></section>
Navigating between pages on a site is also a big challenge for users of assistive technologies. The nav element in HTML5 is insufficient to help navigation, because this element is essentially a div that can contain anything. What is required is a more structured construct similar to an ordered list. This construct could be used to build navigation menus and breadcrumbs.
Conclusion
Discussing the shortcomings of a technology is a way to improve it, and provides invaluable information to the HTML5 team:
- HTML5 does not fix existing problems with HTML and it may also leave Web applications vulnerable to security breaches.
- HTML5 makes simple concepts unnecessarily complex, which can discourage users from using features correctly.
- There are numerous unanticipated incompatibility issues with HTML5.
- Many HTML5 features cannot be implemented in authoring tools.
- Users of validators and markup correction tools will lose functionality because HTML5 does not have a version identifier.
- Semantics defined by the new HTML5 elements are not sufficient. HTML needs to have extensibility built in.
- HTML still lacks features found in desktop publishing.
- Perhaps the most grievous shortcoming of HTML5 is that it does nothing significant to make HTML more accessible.