XTech 2005: XML, the Web and beyond.

WHATWG - Proposing extensions to HTML4 and the DOM

Abstract

The WHAT working group (http://www.whatwg.org/), a loose collaboration of Web browser vendors and many interested individuals, has been publicly working on proposals for a hypothetical "HTML5". This session will describe the state of these proposals.

Introduction

HTML has been a massive success. Google has indexed over eight billion documents; even if we assume only half of those are HTML, that's still four billion documents worldwide. Hundreds of billions of lines of markup.

Pages are beginning to have more and more dynamic user interfaces, using scripting and the DOM to do more than produce fancy effects. Famous recent examples of this are Gmail and Google Maps, but those sites are merely the tip of the iceberg.

But all is not well. HTML authors are running into problems.

Problems faced by Web authors

The foremost problem is browser incompatibilities. These are mainly caused by three things:

The second problem faced by authors is a lack of features to achieve the effects they desire. Authors are forced to deploy large scripts to achieve effects that should be simple.

This is where the WHAT working group comes in.

What is the WHATWG?

The WHATWG is an open mailing list that is developing proposals for extensions to HTML4 and the DOM to make life easier for Web authors. It is organised by Opera, Mozilla, and Apple, but it is open to all and is driven primarily by the contributions of independent individuals, not by the vendors.

The first proposal created by the WHATWG was a set of extensions to HTML's forms features, with associated DOM APIs, to address authors' needs in the forms space. [WF2]

The second proposal being created by the WHATWG is a wholesale update of HTML4 and its associated DOM APIs. This is a much bigger project but work is already well underway. [WA1]

Both proposals also address the "poor specifications" issue mentioned above, making sure that edge cases and error handling behaviour are well defined.

A draft of the first proposal was recently submitted to and acknowledged by the W3C. The WHAT working group members want to work more closely with the W3C, since the W3C is where Web standards should be designed. Discussions are ongoing about how this work could be moved to the W3C while keeping its very open nature.

Design principles

Backwards compatibility

The most important aspect of the WHATWG work is that backwards compatibility must be maintained in any specification that expects to be successful on the Web.

New content absolutely must degrade gracefully in legacy browsers. There are some 600,000,000 people online [NUA]: they won't all upgrade immediately!

Similarly, new features absolutely must be compatible with existing content: we can't add a new feature if implementing it would break CNN.com's front page, for instance. There are over four billion Web pages, by a conservative estimate [GOOGLE]; Web browsers can't afford to stop rendering them correctly.

Whatever new technologies are introduced have to keep working with old content. Authors need to be able to continue using their old content, and need to be able to use the new features in those documents. We simply can't ask millions of authors to start providing two versions of their documents, or to port their documents to a new language just to start using a new feature.

Finally, we also have to bear in mind that Web browsers are receiving more and more accusations of being bloated: the alphabet soup of standards that browsers have to support simply can't grow forever, and old technologies can't be dropped while there is still a significant userbase. As an example: Gopher support was common in browsers for at least a decade after the Web replaced Gopher as the standard way of sharing documents. Some browsers still support it today! Adding new languages is therefore quite expensive.

In short, abandoning the existing deployed userbase and creating a new language is simply not an option.

Easy and well defined

To make sure these specifications don't suffer from the problems mentioned earlier, two other design principles are being used.

Status of the WHATWG

The first proposed spec, Web Forms 2.0, is basically done. We hope to enter the call for implementation phase in the coming months or even weeks. The W3C recently acknowledged the submission of a draft of the Web Forms 2.0 proposal.

The second proposed spec, the wholesale update of HTML4 and DOM2 HTML, is in active development.

The W3C is obviously the right place for Web standards. The WHATWG members hope that both specifications will eventually make it onto the W3C REC track, so that they can benefit from more in-depth review from other W3C members who may not be interested in reviewing the draft when it is just an unofficial proposal.

Examples

Here are some examples of features from the WHATWG proposals.

Making a form control a required form control

Web pages frequently have forms with required fields. Here's an example of the markup required to do that today:

<form ... onsubmit="return checkform(this)">
 <p><label>E-mail address: <input name="email"></label></p>
 ...
</form>
<script type="text/javascript">
 function checkform(form) {
   if (form.email.value == "") {
     alert("Please make sure you have filled the required fields.");
     return false;
   } else {
     return true;
   }
 }
</script>

Here's what the same form would look like using the features proposed in Web Forms 2.0:

<form ...>
 <p><label>E-mail address: <input name="email" required></label></p>
 ...
</form>

Now the browser takes care of everything. It can warn the user that the field is required before the form is submitted, and again at submission if the user ignores the warning; it can also indicate the required field stylistically, and so forth.

An existing form can simply be reduced to this markup, with no other changes to the document. The form will continue to work in old browsers too, although in those browsers error checking would be done only server-side.
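The required-field check itself is simple, whether the browser, the server, or a retained fallback script performs it. The following is a minimal sketch of that logic only, not part of the Web Forms 2.0 proposal: the function name `findMissingRequired` and the plain-object representation of form controls are assumptions made so the code can run outside a browser.

```javascript
// Hypothetical helper: returns the names of required fields left empty.
// Each field is a plain object { name, value, required } standing in for
// a form control, so the logic can be exercised without a DOM.
function findMissingRequired(fields) {
  var missing = [];
  for (var i = 0; i < fields.length; i++) {
    var field = fields[i];
    if (field.required && field.value === "") {
      missing.push(field.name);
    }
  }
  return missing;
}

// Example: one required field is empty, one optional field is empty.
var missing = findMissingRequired([
  { name: "email", value: "", required: true },
  { name: "nickname", value: "", required: false }
]);
```

Here `missing` holds only the name of the empty required field; an empty optional field is not reported.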

Making a form control only accept numbers

Sometimes forms have controls that expect specific data formats, not just plain text. Today, such a form has to be written up with script:

<form ... onsubmit="return checkform(this)">
 <p><label>Number of tickets:
       <input name="tickets"></label></p>
 ...
</form>
<script type="text/javascript">
 function checkform(form) {
   // Pass radix 10 so input such as "010" is never parsed as octal.
   var x = parseInt(form.tickets.value, 10);
   if (form.tickets.value != "" && (isNaN(x) || x < 0 || x > 100)) {
     alert("The number must be between 0 and 100 inclusive.");
     return false;
   } else {
     return true;
   }
 }
</script>

Here's what the same form would look like using the features proposed in Web Forms 2.0:

<form ...>
 <p><label>Number of tickets:
       <input name="tickets" type="number" min="0" max="100"></label></p>
 ...
</form>
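The range check that the `min` and `max` attributes ask the browser to perform can be sketched as a plain function. This is an illustration only: the name `isNumberInRange` is hypothetical, and the sketch accepts whole numbers only, mirroring the script above; the actual number syntax is defined by the Web Forms 2.0 draft, not by this code.

```javascript
// Hypothetical sketch of the range check implied by min="0" max="100".
// Assumption: only an optional minus sign followed by decimal digits is
// accepted, which is stricter than parseInt (it rejects "12abc").
function isNumberInRange(value, min, max) {
  if (value === "") {
    return true; // an empty field is allowed unless it is also required
  }
  if (!/^-?\d+$/.test(value)) {
    return false; // not a whole number
  }
  var n = parseInt(value, 10);
  return n >= min && n <= max;
}
```

Unlike the `parseInt`-based script, this version rejects input with trailing garbage, which is one of the edge cases a declarative attribute lets the browser handle consistently.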

Elements for common semantics

The HTML5 proposals also improve semantics. Currently, typical pages look like this:

<body>
  <div class="header">
   ...
  </div>
  <div class="navigation">
   ...
  </div>
  <div class="article">
   ...
  </div>
  <div class="sidebar">
   ...
  </div>
  <div class="footer">
   ...
  </div>
</body>

Such a document is completely free of useful semantics. User agents have no way of knowing what is the navigation bar, what is the main body, and what is an aside.

With the new proposed elements in the HTML5 draft, that document becomes:

<body>
  <header>
   ...
  </header>
  <nav>
   ...
  </nav>
  <article>
   ...
  </article>
  <aside>
   ...
  </aside>
  <footer>
   ...
  </footer>
</body>

Now the browser can immediately jump to the content, jump to the navigation, and so forth. Easy to understand, easy to implement, backwards compatible with older Web browsers and easy to add to old content.
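Because the proposed elements correspond one-to-one with the class names in the first listing, updating old content is largely mechanical. The mapping can be sketched as follows; the names `CLASS_TO_ELEMENT` and `elementForClass` are hypothetical, and the class names are simply those used in the example above, not a convention defined by the draft.

```javascript
// Hypothetical mapping from the class names in the example above to the
// proposed elements that replace the corresponding <div>.
var CLASS_TO_ELEMENT = {
  header: "header",
  navigation: "nav",
  article: "article",
  sidebar: "aside",
  footer: "footer"
};

// Returns the element name to use for a <div> with the given class,
// or "div" when the class has no semantic counterpart.
function elementForClass(className) {
  if (Object.prototype.hasOwnProperty.call(CLASS_TO_ELEMENT, className)) {
    return CLASS_TO_ELEMENT[className];
  }
  return "div";
}
```

A class with no entry in the table stays a plain `div`, so content that does not fit one of the new elements is left unchanged.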

Conclusions

HTML4 is in need of an upgrade, and that upgrade must cater for the billions of lines of existing markup. The WHATWG, an open group of contributors using a public mailing list, is addressing these needs by making proposals for an incremental update to HTML4 and the DOM.

Bibliography

[ACID2] The Second Acid Test
Web Standards Project, 2005
[CSS21] Cascading Style Sheets, level 2 revision 1
W3C, 2004
[WF2] Web Forms 2.0
WHATWG, 2005
[WA1] Web Applications 1.0
WHATWG, 2005
[NUA] Nua Internet How Many Online
2002
[GOOGLE] Google home page
2005

Biography

Ian Hickson

Standards Development, Opera Software (http://www.opera.com/)

Ian Hickson is a standards compliance geek who has worked in quality assurance and research and development with Netscape and Opera, and who has been actively involved in the W3C for several years.