Pidgin Markup For Writing, or How Much Can HTML Sustain?

HTML is flexible and was shaped by generations of web practitioners. It has enough tricks up its sleeve to actually be nice to author. Here are some.

[Image: assets/pidgin.png. Image of a silly pigeon drawing. Its head has a “<” sign beak and a “/>” (see the HTML reference?) plumage. On its body, “HTML” is written in bright letters. Along its tails, “AARTAKA.ME” is written as if continuing it.]

I am an HTML extremist. There’s this argument I often hear: “But HTML is cumbersome to write!” Wrong, like many other myths about HTML. In this post, I’m detailing how I write my dialect of HTML and why it’s easy.

This Pidgin HTML dialect is intimately tied to my website setup: I’m using ed(1) as my Static Site Generator. Which means the setup is:

.htm

.html

I preprocess SSI-like comments and include files.
I convert overly smart and (slightly) weird tags into valid HTML.
And then I send the result to Another Person’s Computer™ for you to see on aartaka.me.

This setup is no rocket science: just sloppy tags, regex substitutions, and shell scripts. And yet it results in a dialect of HTML that’s easy/er to write. And almost standard/conventional. Here’s a set of engines I’m testing my markup with:

Dillo

w3m
Firefox/LibreWolf
(Ungoogled) Chromium
WebKitGTK browser engine
Plump, an HTML parser for Lisp.

So yeah, it does work everywhere, even in its raw source form.

You can see the initial Pidgin HTML for this post rendered at pidgin.htm, and download it to preview the text behind it. On to the quirks then!

Smart tags

HTML links are slightly painful to type out. So I made my own syntax, knowing that my build regex will expand it for me:

<a just.htm>local link to other post</a>
<a //something.com>Scheme-relative link</a>
<a aartaka.me>Arbitrary link without https:// formalities</a>
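The expansion behind these three forms could be sketched like this. It is a minimal Python illustration, not the actual ed(1) build regex, which the post does not show. The names `SMART_LINK`, `expand_link`, and `expand_smart_links` are hypothetical, and so is the rule that any bare target other than a local `.htm`/`.html` file or a `//` URL gets `https://` prepended. Whether local `.htm` targets are later rewritten to `.html` is not stated, so the sketch leaves them alone.

```python
import re

# Hypothetical pattern: an <a ...> tag whose only "attribute" is a bare target.
# Regular links such as <a href="..."> contain "=" and are left untouched.
SMART_LINK = re.compile(r'<a\s+([^\s<>"=]+)\s*>')


def expand_link(match):
    target = match.group(1)
    # Local pages, scheme-relative URLs and full URLs are kept as they are.
    if target.startswith("//") or "://" in target or target.endswith((".htm", ".html")):
        href = target
    else:
        # Assumption: everything else is a domain that wants an https:// prefix.
        href = "https://" + target
    return '<a href="{}">'.format(href)


def expand_smart_links(text):
    return SMART_LINK.sub(expand_link, text)
```

Calling `expand_smart_links` on the three lines above would give `href="just.htm"`, `href="//something.com"`, and `href="https://aartaka.me"` respectively.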

There’s a bunch of other tags, like <c>word for simpler code tags. And shorter <img assets/pidgin.png Image of a silly pigeon drawing...>. Not much, but more than enough for comfortable writing. Now to the more grand things.

Implied End Tags

I think this is the heritage of sloppy HTML 1/2/3 programming. You know, with shortcuts following authors’ practice. The primordial chaos reigning in the early Internet.

Long story short, you don’t have to close <p>. Its end tag is implied: the paragraph is closed whenever e.g. the next opening <p>aragraph tag is encountered. Per the spec, li, dt, dd, td, and some others also work this way. This is valid processable HTML:

<p>
hello

<p>
another hello

<ul> one
<li> two
<li> wait, it’s not an ordered list?
</ul>

You may’ve noticed the <ul> followed by text. This is another of my shortcuts. It looks alright (indented) when opened from the source .htm file. It’s easy to type. And it’s a simple substitution away from a valid <ul> <li> list.
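That "simple substitution" might look roughly like the following. This is a hedged Python sketch rather than the real ed(1) command, and `LIST_START` and `expand_list_start` are made-up names. It only assumes what the example shows: text placed on the same line as the opening `<ul>` should become the first list item.

```python
import re

# Text sitting directly after an opening <ul> becomes the first list item.
# The lookahead skips a <ul> that is followed by a newline or by another tag.
LIST_START = re.compile(r'<ul>[ \t]*(?=[^\s<])')


def expand_list_start(text):
    return LIST_START.sub("<ul>\n<li> ", text)
```

A `<ul>` that already starts with `<li>`, or that is followed by a line break, is left as it is.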

Having these, I can generate sensible HTML with ed(1) scripts. While having the convenience of shortcuts in standard HTML. All according to the spec!

Closing Tag Space

Here’s a fun one: the HTML spec forbids putting anything into closing tags (anything after </tag). Yet… everyone parses it just fine. So I can e.g. put things into the epilogue of a pre tag as a caption for the code block:

<pre html>
<!-- ... -->
</pre Example of closing tag captions>

I mean, it’s not a caption per se. And it renders as plain <pre> when previewed in .htm source. But once I process it with my build scripts, it expands to a proper <figcaption>. Without the need to write all the formalities out myself.
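For illustration, here is one way such an expansion could be written. It is a Python sketch under assumptions: the post only says the result is a proper `<figcaption>`, so the surrounding `<figure>` wrapper, the exact output layout, and the names `PRE_BLOCK` and `expand_caption` are guesses. The opening tag's extra words (like `html`) are passed through untouched.

```python
import re

# A <pre ...> block whose closing tag carries extra text: </pre Some caption>.
# The body pattern refuses to run past any "</pre", so a plain <pre>...</pre>
# earlier in the text is never swallowed into a later captioned block.
PRE_BLOCK = re.compile(
    r'<pre\b([^>]*)>((?:(?!</pre).)*)</pre\s+([^>]+)>',
    re.DOTALL,
)


def expand_caption(text):
    def to_figure(match):
        attrs, body, caption = match.group(1), match.group(2), match.group(3).strip()
        return "<figure>\n<pre{}>{}</pre>\n<figcaption>{}</figcaption>\n</figure>".format(
            attrs, body, caption
        )

    return PRE_BLOCK.sub(to_figure, text)
```

Blocks closed with a plain `</pre>` do not match and stay as they were.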

Arbitrary Attributes

A thing partially related to closing tag contents: HTML allows anything as attributes. This previous sentence might work well as an attribute set:

<span HTML allows anything as attributes>...</span>
<!-- turns into -->
<span HTML="" allows="" anything="" as="" attributes="">...</span>

I’m using that for e.g. tables and <details>:

<details this ul...>
You may’ve noticed the ...
</details>

And I’m using <h2 id> as a shortcut for headings with IDs (<h2 id=id>).
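The bare-word attributes from the span example can be observed with any tolerant parser. The sketch below uses Python's standard `html.parser` as a stand-in (it is not one of the engines listed earlier, and `AttributeLister` and `list_attributes` are hypothetical names). It reports every word as an attribute without a value and, like HTML parsers generally, lowercases the names.

```python
from html.parser import HTMLParser


class AttributeLister(HTMLParser):
    """Collects (tag, attributes) pairs for every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when absent.
        self.seen.append((tag, attrs))


def list_attributes(markup):
    parser = AttributeLister()
    parser.feed(markup)
    parser.close()
    return parser.seen
```

Feeding it `<span HTML allows anything as attributes>...</span>` yields one `span` entry whose five attribute names are each paired with `None`, which is what a build script can then pick apart.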

Magic IE Comments and Server Side Includes

[Image: A line of text. On it, highlighted IE comment “[if IE]...[endif]” is added, and inside it, “oh, don’t mind me” is written. Playing with the fact that most browsers ignore IE comments.]

If you’ve ever opened the inspector on some major site, you might’ve seen these [if IE] comments. They are Internet Explorer specific comments that only IE evaluates. Other browsers perceive them as mere comments.

<!--[if IE 9]>
    <script src="https://www.mozilla.org/media/js/lib-ie.cf16e08599c3.js"></script>
<![endif]-->

These are useless in the modern post-IE world. That’s why I’m exploiting them to generate format-specific content. Say, making a link footer when generating Gemtext from this Pidgin HTML (I’ve since removed the Gemtext backend from this site, unfortunately; Gemtext is too primitive):

<!--[if GMI]>
=> index.gmi Back to home page
=> about.gmi About & Contacts
=> uses.gmi Tech I Use
=> projects.gmi My projects
<![endif]-->
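A build step that honours such blocks could be sketched as follows. This is an illustrative Python version with hypothetical names (`CONDITIONAL`, `select_format`). It assumes the simplest possible policy: keep the body of a block whose condition equals the format being generated, and drop every other conditional block.

```python
import re

# <!--[if CONDITION]> body <![endif]-->, with the body possibly spanning lines.
CONDITIONAL = re.compile(r'<!--\[if ([^\]]+)\]>(.*?)<!\[endif\]-->', re.DOTALL)


def select_format(text, fmt):
    def keep_or_drop(match):
        condition, body = match.group(1), match.group(2)
        # Keep the body only when the block targets the requested format.
        return body.strip("\n") if condition == fmt else ""

    return CONDITIONAL.sub(keep_or_drop, text)
```

With `fmt="GMI"` the link lines above survive as plain Gemtext. With any other format they vanish from the output.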

Another point of reference might be Server Side Includes. A format of commands embedded into HTML (or rather, .shtml) pages, best known from Apache (and NCSA HTTPd before it). Allowing file inclusion, conditional expansion, shell/CGI command execution etc. A much needed logic-ful HTML extension.

So I’m using SSI as an inspiration, making my own #include and #exec directives.

<include template/footer>
as a shorthand for
<!--#include file="template/footer" -->

The shorthand is recognized as a mere start tag of an unknown element. Or, in the case of the SSI version, as a comment. Harmless.
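The two forms can be connected with one substitution each. The Python sketch below is an assumption-laden illustration, not the actual ed(1) script: `expand_include` turns the shorthand into the SSI-style comment, and `resolve_includes` replaces that comment with file contents taken from an in-memory dict (a hypothetical stand-in for reading files from disk).

```python
import re

# Shorthand form: <include path/to/file>
INCLUDE = re.compile(r'<include\s+([^\s<>"]+)\s*>')
# SSI-style form: <!--#include file="path/to/file" -->
SSI_INCLUDE = re.compile(r'<!--#include file="([^"]+)" -->')


def expand_include(text):
    # Rewrite the shorthand into the SSI-style comment.
    return INCLUDE.sub(r'<!--#include file="\1" -->', text)


def resolve_includes(text, files):
    # files maps an include path to its contents (stand-in for the filesystem).
    return SSI_INCLUDE.sub(lambda match: files[match.group(1)], text)
```

Running `expand_include` first and `resolve_includes` second means both spellings end up handled by the same inclusion step.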

Pidgin HTML is Still HTML

Let this post be a praise to HTML:

its simple user-facing nature,
and the universality and power of the Web Platform.

Pidgin HTML is made possible by the sloppiness of standard/quirks HTML. It’s still valid HTML, but one that is much easier to write. Good for authoring as a hypertext alternative to Markdown and other Lightweight Markup Languages.

Don’t be afraid of HTML. Write Pidgin HTML.