Pidgin Markup For Writing, or How Much Can HTML Sustain?
By Artyom Bologov
I am an HTML extremist. There’s this argument I often hear: “But HTML is cumbersome to write!” Wrong, like many other myths about HTML. In this post, I’m detailing how I write my dialect of HTML and why it’s easy.
This Pidgin HTML dialect is intimately tied to my website setup: I’m using ed(1) as my Static Site Generator. Which means the setup is:
- I write my Pidgin HTML in .htm files. It’s a standard HTML extension, but it’s useful to differentiate from the resulting .html for build purposes.
- I preprocess SSI-like comments and include files.
- I convert overly smart and (slightly) non-standard tags into valid HTML.
- And then I send the result to Another Person’s Computer™ for you to see on aartaka.me.
This setup is no rocket science: sloppy tags, regex substitutions, and shell scripts. And yet it results in a dialect of HTML that’s easy/er to write. And almost standard/conventional. Here’s a set of engines I’m testing my markup with:
- Dillo
- w3m
- Firefox/LibreWolf
- (Ungoogled) Chromium
- WebKitGTK browser engine
- Plump, an HTML parser for Lisp.
So yeah, it does work everywhere, even as the raw source form.
You can see the initial Pidgin HTML for this post rendered at pidgin.htm, and download it to preview the text behind it. On to the quirks then!
Smart tags
HTML links are slightly painful to type out. So I made my own syntax, knowing that my build regex will expand it for me:
<a just.htm>local link to other post</a>
<a //something.com>Scheme-relative link</a>
<a aartaka.me>Arbitrary link without https:// formalities</a>
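A rough sketch of what such an expansion could look like. It is written in Python as a stand-in, since the actual ed(1) regexes are not shown in this post. The rule for telling a local file from a bare domain is an assumption made for illustration: targets ending in .htm/.html, or starting with /, // or #, are kept as-is, and everything else gets an https:// prefix. The names SMART_LINK, expand_link and expand_smart_links are hypothetical.

```python
import re

# Hypothetical pattern: an <a> tag whose only "attribute" is a bare link target.
# Regular <a href="..."> tags contain = and quotes, so they are left alone.
SMART_LINK = re.compile(r'<a ([^\s<>"=]+)>')


def expand_link(match):
    """Turn one smart <a target> tag into a regular <a href="..."> tag."""
    target = match.group(1)
    # Assumption: these shapes are already usable as href values.
    is_local_or_relative = (
        target.startswith(("//", "/", "#"))
        or "://" in target
        or re.search(r"\.html?(#.*)?$", target) is not None
    )
    href = target if is_local_or_relative else "https://" + target
    return f'<a href="{href}">'


def expand_smart_links(text):
    """Expand every smart link in the given Pidgin HTML text."""
    return SMART_LINK.sub(expand_link, text)
```

Running expand_smart_links over the three links above would leave just.htm and //something.com untouched inside href and prefix aartaka.me with https://.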
There’s a bunch of other tags, like <c>word for simpler code tags.
And shorter <img assets/pidgin.png Image of a silly pigeon drawing...>.
Not much, but more than enough for comfortable writing.
Now to the more grand things.
Implied End Tags
I think this is the heritage of sloppy HTML 1/2/3 programming. You know, with shortcuts following authors’ practice. The primordial chaos reigning in the early Internet.
Long story short, you don’t have to close <p>.
Its end tag is implied and closed whenever e.g. the next opening <p>aragraph tag is encountered.
Per the spec, li, dt, dd, td, and some others also work this way.
This is valid processable HTML:
<p>
hello
<p>
another hello
<ul> one
<li> two
<li> wait, it’s not an ordered list?
</ul>
this ul...
You may’ve noticed the <ul> followed by text.
This is another of my shortcuts.
It looks alright (indented) when opened from the source .htm file.
It’s easy to type.
And it’s a simple substitution away from a valid <ul> <li> list.
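That substitution might look something like the sketch below. Again, this is Python standing in for the ed(1) script, and the exact pattern is an assumption: a list opening tag followed by text on the same line gets an <li> inserted before that text. Covering <ol> as well is a guess, and the names LIST_WITH_TEXT and expand_list_shortcut are hypothetical.

```python
import re

# Assumed shortcut shape: "<ul> text" (or "<ol> text") on one line.
# The lookahead skips "<ul>" followed directly by a newline or another tag.
LIST_WITH_TEXT = re.compile(r"<(ul|ol)([^>]*)>[ \t]+(?=[^\s<])")


def expand_list_shortcut(text):
    """Insert the implied first <li> after a list tag that is followed by text."""
    return LIST_WITH_TEXT.sub(r"<\1\2>\n<li> ", text)
```

Applied to the list above, only the first line changes: the <ul> goes on its own line and "one" becomes a regular <li> item, while the remaining <li> lines and </ul> stay as typed.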
Having these, I can generate sensible HTML with ed(1) scripts. While having the convenience of shortcuts in standard HTML. All according to the spec!
Closing Tag Space
Here’s a fun one: HTML spec forbids putting anything into closing tags (anything after </tag).
Yet… everyone parses it just fine.
So I can e.g. put things into the epilogue of a pre tag as a caption for the code block:
<pre html>
<!-- ... -->
</pre Example of closing tag captions>
I mean, it’s not a caption per se.
And it renders as plain <pre> when previewed in .htm source.
But once I process it with my build scripts, it expands to a proper <figcaption>.
Without the need to write all the formalities out myself.
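For illustration, here is one way such an expansion could be done. The real build script is not shown, so everything here is a sketch under assumptions: the <pre> block gets wrapped in a <figure> (so that the <figcaption> has a valid parent), and the attributes on the opening tag are carried over untouched. The names CAPTIONED_PRE and expand_pre_captions are hypothetical.

```python
import re

# Matches one <pre ...> block whose closing tag carries extra text.
# The (?:(?!</pre).)* part keeps the body from running past the first </pre,
# so an uncaptioned block before a captioned one is not swallowed.
CAPTIONED_PRE = re.compile(
    r"<pre([^>]*)>((?:(?!</pre).)*)</pre[ \t]+([^>]+)>",
    re.DOTALL,
)


def expand_pre_captions(text):
    """Rewrite <pre>...</pre Caption> into a figure with a figcaption."""
    return CAPTIONED_PRE.sub(
        r"<figure>\n<pre\1>\2</pre>\n<figcaption>\3</figcaption>\n</figure>",
        text,
    )
```

Blocks closed with a plain </pre> do not match the pattern and pass through unchanged.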
Arbitrary Attributes
A thing partially related to closing tag contents: HTML allows anything as attributes. This previous sentence might work well as an attribute set:
<span HTML allows anything as attributes>...</span>
<!-- turns into -->
<span html="" allows="" anything="" as="" attributes="">...</span>
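To check how a parser sees such a tag, here is a small sketch using html.parser from the Python standard library. It is not one of the engines listed earlier, so treat it as one more data point rather than proof. The class and function names (AttributeCollector, parse_start_tags) are made up for this example.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Records every start tag together with its (name, value) attribute pairs."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases attribute names and reports None
        # as the value of an attribute written without "=value".
        self.seen.append((tag, attrs))


def parse_start_tags(markup):
    """Return a list of (tag, attrs) for each start tag in the markup."""
    parser = AttributeCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen
```

Feeding it the span above yields five valueless attributes named html, allows, anything, as, and attributes, each word of the sentence becoming its own attribute.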
I’m using that for e.g. tables and <details>:
<details this ul...>
You may’ve noticed the ...
</details>
And I’m using <h2 id> as a shortcut for headings with IDs (<h2 id=id>).
Pidgin HTML is Still HTML
Let this post be a praise to
- shortcuts of HTML,
- its simple user-facing nature,
- and the universality and power of the Web Platform.
Pidgin HTML is made possible by the sloppiness of standard/quirks HTML. It’s still valid HTML, but one that is much easier to write. Good for authoring as a hypertext alternative to Markdown. And other Lightweight Markup Languages.
Don’t be afraid of HTML. Write Pidgin HTML.
Magic IE Comments and Server Side Includes
In case you ever opened the inspector on some major site, you might’ve seen these [if IE] comments. Basically Internet Explorer specific comments that only IE evaluates. Other browsers perceive them as mere comments.
These are useless in the modern post-IE world. That’s why I’m exploiting them to generate format-specific content. Say, making a link footer when generating Gemtext from this Pidgin HTML:
Another point of reference might be Server Side Includes. Initially Apache-specific format of commands embedded into HTML (or, rather .shtml) pages. Allowing file inclusion, conditional expansion, shell/CGI command execution etc. A much needed logic-ful HTML extension.
So I’m using SSI as an inspiration, making my own #include and #exec directives.
This is recognized as a mere starting tag. Or, in case of the SSI version, as a comment. Harmless.
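As a point of comparison, here is a minimal sketch of include expansion for the classic SSI comment form, <!--#include file="..." -->. It does not reproduce the directives described above (their exact syntax is not shown here), and it is Python rather than ed(1). The read parameter and the names INCLUDE and expand_includes are assumptions for the sake of the example. Includes are expanded once, not recursively.

```python
import re
from pathlib import Path

# Classic SSI include comment, e.g. <!--#include file="header.htm" -->
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


def expand_includes(text, read=lambda name: Path(name).read_text()):
    """Replace each include comment with the content returned by read(name).

    By default read() loads the named file from disk; pass another callable
    (a dict lookup, say) to try the function without touching the filesystem.
    """
    return INCLUDE.sub(lambda match: read(match.group(1)), text)
```

A browser opening the unprocessed source would treat the directive as an ordinary comment and show nothing, which is the harmless behaviour mentioned above.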