Falsehoods Programmers Believe About HTML

Web is beautiful. Web is ugly. Web is astonishing. Part of this appeal is HTML, with its historical quirks. Many a programmer believes many things about HTML, and some of those beliefs are not necessarily true. So let’s explore some falsehoods programmers believe about HTML.

Language & Parsing

HTML is just XML. All tags have matching closing tags.

Some tags (like <li> or <p>) have implicit closing tags:

<li> List item without closing tag
<li> Another list item right after it

Example of implicit closing tags in <li> tag
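
To make the rule concrete, here is a minimal Python sketch of just this one case: a new <li> closes the <li> that is still open. It uses the standard library's html.parser as a tokenizer. The class ListItemCloser and the function normalize are hypothetical names made up for illustration, and real browsers follow the much larger tree-construction algorithm from the standard, not this toy.

```python
from html.parser import HTMLParser


class ListItemCloser(HTMLParser):
    """Toy model of one implicit-end-tag rule: a new <li> closes an open <li>."""

    def __init__(self):
        super().__init__()
        self.open_li = False  # is an <li> currently open?
        self.out = []         # normalized markup, piece by piece

    def handle_starttag(self, tag, attrs):
        if tag == "li" and self.open_li:
            self.out.append("</li>")  # the end tag the author omitted
        if tag == "li":
            self.open_li = True
        self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag == "li":
            self.open_li = False
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data.strip())

    def close(self):
        super().close()  # flush any buffered text first
        if self.open_li:  # end of input also closes the last item
            self.out.append("</li>")
            self.open_li = False


def normalize(markup):
    """Return the markup with the omitted </li> tags written out."""
    parser = ListItemCloser()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(normalize("<li> List item without closing tag\n<li> Another list item right after it"))
```

Markup that already has its closing tags passes through unchanged, so the sketch only fills in what the author left out.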

HTML is almost XML. All tags have closing tags, even if implicit

<img> and <input> are self-closing:

<!-- Notice the / here! The value is quoted so the slash is not read as part of it. -->
<input type="text" />

Self-closing <input> tag

Okay, okay, HTML is not XML. But all elements either have closing tags or self-close

<br> and <hr> don’t even need a self-close slash.

Actually, the self-closing slash is optional (and often discouraged) on such elements in HTML, and the parser simply ignores it there, so the difference is less pronounced.
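
Elements like <br>, <hr>, <img> and <input> are what the standard calls void elements: they can have no content and no end tag. A rough Python sketch of how a serializer might use that list follows. The set reflects the void elements in the Living Standard as far as I know at the time of writing, and is_void and serialize are hypothetical helpers, not part of any library.

```python
# Void elements: no content, no end tag, trailing slash optional.
# Assumed to match the HTML Living Standard at the time of writing.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def is_void(tag_name):
    """Return True if the element can never have content or an end tag."""
    return tag_name.lower() in VOID_ELEMENTS


def serialize(tag_name, content=""):
    """Hypothetical helper: write out an element without attributes."""
    if is_void(tag_name):
        return f"<{tag_name}>"  # no slash and no end tag needed
    return f"<{tag_name}>{content}</{tag_name}>"


print(serialize("br"))
print(serialize("p", "Hello"))
```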

Standard

HTML is defined by the standard

It’s defined by browser vendors and WHATWG (= browser vendors)

The standard does not change after validation

The standard is "Living", and you can see the (very recent) date of the last change on the Living Standard page.

The standard is self-contained (relating to HTML only)

HTML also relies on a group of related standards, including DOM and JavaScript. In fact, many features of HTML are defined as interfaces that scripts see as JavaScript classes.

There is only one (two? three?) doctypes for HTML documents

Oh my sweet summer child...

<!DOCTYPE html>
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Level 1//EN">
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Level 2//EN">
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN">
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict Level 1//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN" "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN" "http://www.w3.org/Math/DTD/mathml2/mathml2.dtd">
<!DOCTYPE math SYSTEM "http://www.w3.org/Math/DTD/mathml1/mathml.dtd">

Many (not all) HTML doctypes
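
All of these share one shape: a root element name, then optionally PUBLIC with a public identifier and maybe a system identifier, or SYSTEM with a system identifier alone. Here is a small Python sketch that pulls those parts out with a regular expression. DOCTYPE_RE and parse_doctype are names invented for this example, and the pattern only covers double-quoted identifiers like the ones listed above, not every form a browser accepts.

```python
import re

# name, then optionally PUBLIC "public id" ["system id"] or SYSTEM "system id"
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<name>[^\s>]+)'
    r'(?:\s+PUBLIC\s+"(?P<public>[^"]*)"(?:\s+"(?P<system>[^"]*)")?'
    r'|\s+SYSTEM\s+"(?P<system_only>[^"]*)")?\s*>',
    re.IGNORECASE,
)


def parse_doctype(text):
    """Split a doctype declaration into (name, public id, system id)."""
    match = DOCTYPE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a doctype this sketch understands: {text!r}")
    system = match["system"] or match["system_only"]
    return match["name"], match["public"], system


print(parse_doctype("<!DOCTYPE html>"))
print(parse_doctype(
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">'
))
```

Missing parts come back as None, which is how the short modern doctype differs from its ancestors: it has a name and nothing else.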

Practices

All websites follow the standard.
No one uses HTML4.
No one uses HTML3 anymore.
No one uses HTML2 anymore.
No one uses HTML1 anymore.
No one uses tables for layout anymore.

No one uses XHTML

ePub, a widespread ebook format, uses XHTML for content markup. It sucks, but it’s a practice.

ePub 3.3 is based on HTML5, but still in its XML (XHTML) syntax.

Runtime

Modifying DOM is slow

React propaganda is probably to blame for this illusion. The DOM is one of the most heavily optimized data structures out there. Whatever you put in it, it’ll sustain. React will not.

Browsers are just messy HTML parsers

Browsers are JS evaluators. Browsers are layout engines. Browsers are computer graphics toolkits (WebGL and fonts). Browsers are OSes (they have file system interfaces, audio output, and many other APIs). Browsers are interfaces to privacy-leaking vendor APIs.

SEO is hard and you need frameworks for it

Not really, if you write simple semantic HTML: it’s easy to parse and index, especially compared to JS-generated markup.
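
To give a feel for how little work that takes, here is a minimal Python sketch that pulls a heading outline out of static markup using nothing but the standard library. OutlineExtractor and outline_of are hypothetical names, and a real crawler does far more than this. The point is only that the content is right there in the source, with no script to run first.

```python
from html.parser import HTMLParser


class OutlineExtractor(HTMLParser):
    """Collect (tag, text) pairs for the headings of a page."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.current = None  # heading tag we are inside, if any
        self.outline = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.current = tag
            self.outline.append((tag, ""))

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current:  # only keep text that sits inside a heading
            tag, text = self.outline[-1]
            self.outline[-1] = (tag, text + data)


def outline_of(markup):
    """Return the heading outline of a piece of markup."""
    parser = OutlineExtractor()
    parser.feed(markup)
    parser.close()
    return [(tag, text.strip()) for tag, text in parser.outline]


print(outline_of("<h1>Falsehoods</h1><p>Intro</p><h2>Language &amp; Parsing</h2>"))
```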

WebAssembly will deprecate HTML and JS

These are different niches. You can’t really make accessible websites with WebAssembly. So if you want universal pages openable everywhere, you have to stick with HTML etc.

HTML is not Turing-complete

It is, given CSS and user input.
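
The demonstrations I know of go through Rule 110, a one-dimensional cellular automaton that is known to be Turing-complete: checkboxes hold the cells, CSS selectors compute the next row, and the user's clicks drive each step. Below is a sketch of the rule itself in Python, not of the HTML and CSS construction. The function name rule110_step and the fixed zero cells beyond both edges are assumptions made for illustration.

```python
# Rule 110: next state of a cell from (left, center, right).
# Read top to bottom, the outputs spell 01101110, which is 110 in binary.
RULE_110 = {
    (1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
    (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0,
}


def rule110_step(cells):
    """Compute the next row, treating cells beyond both edges as 0."""
    padded = [0] + list(cells) + [0]
    return [
        RULE_110[(padded[i - 1], padded[i], padded[i + 1])]
        for i in range(1, len(padded) - 1)
    ]


row = [0, 0, 0, 1]
for _ in range(3):
    print(row)
    row = rule110_step(row)
```

In the CSS version the "for" loop is the person clicking, which is why user input is part of the claim.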

Did I Forget Any?

In case you haven’t found your favorite falsehood, feel free to suggest more! This post will likely be on Reddit and Hacker News, so use comments there. Or use the contacts from the About page!