
\documentclass[12pt]{article}
\usepackage[T2A,OT1]{fontenc}
\usepackage[default]{cantarell}
\usepackage[a4paper, top=20mm, bottom=20mm, left=20mm, right=20mm]{geometry}
\usepackage[utf8]{inputenc}
\usepackage[russian, english]{babel}
\usepackage{tabu}
\usepackage{hyperref}
\usepackage{parskip}
\usepackage{graphicx}
\usepackage{tabularx}
\usepackage{float}
\floatstyle{boxed}
\restylefloat{figure}
\usepackage{setspace}
\onehalfspacing
\author{Artyom Bologov \href{mailto:pidgin@aartaka.me}{(email)}}
\date{\today}
\title{Pidgin Markup For Writing, or How Much Can HTML Sustain?}
\makeatletter
\def\endenv{\expandafter\end\expandafter{\@currenvir}}
\makeatother
\begin{document}
\maketitle

\includegraphics[width=\textwidth,height=\textheight,keepaspectratio]{./assets/pidgin.png}

\href{run:hypertext}{I am an HTML extremist}.
There’s this argument I often hear: “But HTML is cumbersome to write!”
\href{run:falsehoods-html}{Wrong, like many other myths about HTML}.
In this post, I’m detailing how I write my dialect of HTML and why it’s easy.

This Pidgin HTML dialect is intimately tied to my website setup:
\href{run:this-post-is-ed}{I’m using ed(1) as my Static Site Generator}.
Which means the setup is:

\begin{enumerate}\item I write my Pidgin HTML in \verb|.htm| files.
    It’s a standard HTML extension, but it’s useful to differentiate from the resulting \verb|.html| for build purposes.
\item I preprocess SSI-like comments and include files.
\item I convert overly smart and (slightly) weird tags into valid HTML.
\item And then I send the result to
  \href{run:codeberg-pages}{Another Person’s Computer™}
  for you to see on aartaka.me.
\end{enumerate}
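
The steps above amount to a chain of text transformations over a \verb|.htm| file.
A minimal Python sketch of that shape follows; it is purely illustrative, since the real build uses ed(1) scripts, and \verb|output_path|, \verb|build|, and \verb|Step| are hypothetical names introduced here.

```python
from pathlib import Path
from typing import Callable, Iterable

# A build step is assumed to be any function from text to text.
Step = Callable[[str], str]


def output_path(source: Path) -> Path:
    """Map a Pidgin source file (.htm) to its build product (.html)."""
    return source.with_suffix(".html")


def build(text: str, steps: Iterable[Step]) -> str:
    """Run the text through each preprocessing step, in order."""
    for step in steps:
        text = step(text)
    return text
```

Keeping every step a plain text-to-text function means include expansion and tag conversion can be ordered and tested separately.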

This setup is no rocket science: just sloppy tags, regex substitutions, and shell scripts.
And yet it results in a dialect of HTML that’s easy/er to write.
And almost standard/conventional.
Here’s a set of engines I’m testing my markup with:

\begin{itemize}\item \href{https://dillo.org}{Dillo}
\item \href{https://w3m.sourceforge.net}{w3m}
\item Firefox/LibreWolf
\item (Ungoogled) Chromium
\item WebKitGTK browser engine
\item \href{https://github.com/Shinmera/plump}{Plump}, an HTML parser for Lisp.
\end{itemize}

So yeah, it does work everywhere, even in its raw source form.

You can see the initial Pidgin HTML for this post rendered at
\href{pidgin.htm}{pidgin.htm}, and
\href{pidgin.htm}{download it to preview the text behind it}.
\href{https://developer.mozilla.org/en-US/docs/Web/HTML/Guides/Quirks_mode_and_standards_mode}{On to the quirks then}!

\section*{Smart Tags} \label{smart}

\href{https://merveilles.town/@aartaka/115669971371343412}{HTML links are slightly painful to type out}.
So I made my own syntax, knowing that my build regex will expand it for me:

\begin{figure}[h!]\begin{verbatim}
<a just.htm>local link to other post</a>
<!-- Spec: https://url.spec.whatwg.org/#scheme-relative-special-url-string -->
<a //something.com>Scheme-relative link</a>
<a aartaka.me>Arbitrary link without https:// formalities</a>
\end{verbatim}\caption{Simplified links}\end{figure}
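
A hedged sketch of how such an expansion could work, covering the three link forms (a local page, a scheme-relative URL, and a bare domain).
The actual build regex is not shown in this post, so the pattern, the function names, and the rule that a \verb|.htm| or \verb|.html| suffix marks a local page are all assumptions.

```python
import re

# An <a ...> start tag holding a single bare target, e.g. <a just.htm>.
# Tags that already carry href=... contain "=" and are left alone.
BARE_LINK = re.compile(r"<a\s+([^\s<>=]+)>")


def expand_link(match: re.Match) -> str:
    target = match.group(1)
    if target.startswith("//") or "://" in target:
        href = target                    # scheme-relative or full URL: keep
    elif target.endswith((".htm", ".html")):
        href = target                    # assumed to be a local page
    else:
        href = "https://" + target       # assumed to be a bare domain
    return f'<a href="{href}">'


def expand_links(html: str) -> str:
    """Expand every bare-target link shortcut into a regular href link."""
    return BARE_LINK.sub(expand_link, html)
```

One limitation of this sketch: a target containing \verb|=| (say, a query string) is skipped and would need to be written out in full.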

There’s a bunch of other tags, like \verb|<c>word| for simpler \verb|code| tags.
And a shorter \verb|<img assets/pidgin.png Image of a silly pigeon drawing...>|.
Not much, but more than enough for comfortable writing.
Now on to grander things.

\section*{Implied End Tags} \label{implied}

I think this is the heritage of sloppy HTML 1/2/3 programming.
You know, with shortcuts following authors’ practice.
The primordial chaos reigning in the early Internet.

Long story short, you don’t have to close \verb|<p>|.
Its end tag is implied: the paragraph is closed whenever, e.g., the next opening \verb|<p>|aragraph tag is encountered.
\href{https://html.spec.whatwg.org/#closing-elements-that-have-implied-end-tags}{Spec: li, dt, dd, td, and some others also work this way}.
This is valid processable HTML:

\begin{figure}[h!]\begin{verbatim}
<p>
hello

<p>
another hello

<ul> one
<li> two
<li> wait, it’s not an ordered list?
</ul>
\end{verbatim}\caption{Implied end tags in action}\end{figure}

\paragraph{this ul...} \begin{quote}
You may’ve noticed the \verb|<ul>| followed by text.
This is another of my shortcuts.
It looks alright (indented) when opened from the source \verb|.htm| file.
It’s easy to type.
And it’s a simple substitution away from a valid \verb|<ul> <li>| list.
\end{quote}
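
That “simple substitution” could look roughly like the following Python sketch.
It is an assumption about the approach, not the actual ed(1) script; \verb|add_first_item| is a hypothetical name, and only attribute-less \verb|<ul>| and \verb|<ol>| tags are handled.

```python
import re

# A list-opening tag followed on the same line by plain text (not another tag).
LIST_WITH_TEXT = re.compile(r"<(ul|ol)>[ \t]*(?=[^\s<])")


def add_first_item(html: str) -> str:
    """Turn '<ul> one' into '<ul> <li> one'; leave conventional lists alone."""
    return LIST_WITH_TEXT.sub(r"<\1> <li> ", html)
```

Lists that already start with \verb|<li>|, or that put the first item on the next line, pass through unchanged.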

Having these, I can generate sensible HTML with ed(1) scripts.
While having the convenience of shortcuts in standard HTML.
All according to the spec!

\section*{Closing Tag Space} \label{closing}

Here’s a fun one: the HTML spec forbids putting anything into closing tags (anything after \verb|</tag|).
Yet… everyone parses it just fine.
So I can e.g. put things into the epilogue of a \verb|pre| tag as a caption for the code block:

\begin{figure}[h!]\begin{verbatim}
<pre html>
<!-- ... -->
</pre Example of closing tag captions>
\end{verbatim}\caption{Example of closing tag captions}\end{figure}

I mean, it’s not a caption per se.
And it renders as plain \verb|<pre>| when previewed in \verb|.htm| source.
But once I process it with my build scripts, it expands to a proper \verb|<figcaption>|.
Without the need to write all the formalities out myself.
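
A sketch of what that expansion might look like in Python.
The real build scripts are not shown here, so the regex, the \verb|expand_captions| name, and the choice to wrap the block in \verb|<figure>| are assumptions made for illustration.

```python
import re

# A <pre ...> block whose closing tag carries extra text: </pre Some caption>.
# The tempered "(?!</pre\b)." keeps one match from running across two blocks.
CAPTIONED_PRE = re.compile(
    r"(<pre\b[^>]*>)((?:(?!</pre\b).)*)</pre\s+([^>]+)>",
    re.DOTALL,
)


def expand_captions(html: str) -> str:
    """Move closing-tag text into a <figcaption> next to the <pre> block."""
    return CAPTIONED_PRE.sub(
        r"<figure>\1\2</pre><figcaption>\3</figcaption></figure>", html
    )
```

Blocks closed with a plain \verb|</pre>| do not match and are left as they are.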

\subsection*{Arbitrary Attributes} \label{attributes}

A thing partially related to closing tag contents: HTML allows anything as attributes.
The previous sentence works just fine as an attribute set:

\begin{figure}[h!]\begin{verbatim}
<span HTML allows anything as attributes>...</span>
<!-- turns into -->
<span HTML="" allows="" anything="" as="" attributes="">...</span>
\end{verbatim}\caption{Anything you put into tags is an attribute, actually}\end{figure}
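
One way to check this outside a browser is Python’s standard \verb|html.parser| module, used here only as a convenient stand-in for the engines listed earlier.
The \verb|AttrCollector| class and \verb|start_tags| helper are names made up for this sketch.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record the name and attributes of every start tag the parser sees."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when absent.
        self.seen.append((tag, attrs))


def start_tags(html):
    """Return [(tag, attrs), ...] for all start tags in the given markup."""
    parser = AttrCollector()
    parser.feed(html)
    parser.close()
    return parser.seen
```

Feeding it \verb|<span HTML allows anything as attributes>| yields five valueless attributes.
This particular parser lowercases the names and reports missing values as \verb|None|, where a browser’s serialization shows empty strings.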

I’m using that for e.g. tables and \verb|<details>|:


\begin{figure}[h!]\begin{verbatim}
<details this ul...>
You may’ve noticed the ...
</details>
\end{verbatim}\caption{Effective leading tag space use}\end{figure}

And I’m using \verb|<h2 id>| as a shortcut for headings with IDs (\verb|<h2 id=id>|).
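
Reading \verb|id| there as a placeholder for the actual identifier, the heading shortcut could be expanded with something like this Python sketch.
The pattern and the \verb|expand_heading_ids| name are assumptions; the real build does this with ed(1).

```python
import re

# Assumption: a heading start tag holding exactly one bare word, e.g. <h2 smart>.
BARE_HEADING = re.compile(r"<(h[1-6])\s+([^\s<>=]+)>")


def expand_heading_ids(html: str) -> str:
    """Rewrite <h2 smart> as <h2 id="smart">; leave other headings alone."""
    return BARE_HEADING.sub(r'<\1 id="\2">', html)
```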

\section*{Magic IE Comments and Server Side Includes} \label{comments}

\includegraphics[width=\textwidth,height=\textheight,keepaspectratio]{./assets/pidgin-comment.png}

If you’ve ever opened the inspector
\href{https://mozilla.org}{on some major site},
you might’ve seen these \verb|[if IE]| comments.
These are Internet Explorer-specific conditional comments that only IE evaluates.
Other browsers perceive them as mere comments.

\begin{figure}[h!]\begin{verbatim}
<!--[if IE 9]>
    <script src="https://www.mozilla.org/media/js/lib-ie.cf16e08599c3.js"></script>
<![endif]-->
\end{verbatim}\caption{Example IE conditional comment from mozilla.org}\end{figure}

These are useless in the modern post-IE world.
That’s why I’m exploiting them to generate format-specific content.
Say, making a link footer when generating Gemtext from this Pidgin HTML
(I’ve since removed the Gemtext backend from this site, unfortunately; Gemtext is too primitive):

\begin{figure}[h!]\begin{verbatim}
<!--[if GMI]>
=> index.gmi Back to home page
=> about.gmi About & Contacts
=> uses.gmi Tech I Use
=> projects.gmi My projects
<![endif]-->
\end{verbatim}\caption{GMI-specific content via IE comment}\end{figure}
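
A backend could pick out its own blocks along these lines.
This is a Python sketch under assumptions: the real processing is done by the build scripts, \verb|select_format| is a hypothetical name, and the format label is assumed to be a single word such as \verb|GMI|.

```python
import re

# <!--[if LABEL]> ... <![endif]--> where LABEL is one word.
# Real IE conditions such as "[if IE 9]" do not match and stay untouched.
CONDITIONAL = re.compile(
    r"<!--\[if (\w+)\]>\n?(.*?)<!\[endif\]-->\n?",
    re.DOTALL,
)


def select_format(text: str, target: str) -> str:
    """Keep the body of blocks labelled `target`; drop other labelled blocks."""

    def pick(match):
        label, body = match.group(1), match.group(2)
        return body if label == target else ""

    return CONDITIONAL.sub(pick, text)
```

Run with \verb|"GMI"| as the target, the footer lines are spliced into the output; with any other target they vanish, which matches how a browser treats the comment.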

Another point of reference might be
\href{https://wikipedia.org/wiki/Server_Side_Includes}{Server Side Includes}.
Originally a format of commands embedded into HTML (or rather, .shtml) pages, introduced by NCSA HTTPd and inherited by Apache.
Allowing file inclusion, conditional expansion, shell/CGI command execution etc.
A much needed logic-ful HTML extension.

So I’m using SSI as an inspiration, making my own \verb|#include| and \verb|#exec| directives.

\begin{figure}[h!]\begin{verbatim}
<include template/footer>
as a shorthand for
<!--#include file="template/footer" -->
\end{verbatim}\caption{SSI-like inclusion command}\end{figure}

This is recognized as a mere starting tag.
Or, in case of the SSI version, as a comment.
Harmless.
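
Expanding both forms is a small substitution.
Below is a Python sketch of the idea; it is not the actual ed(1) script, \verb|expand_includes| is a hypothetical name, and the caller supplies a \verb|read| function so the file lookup stays an explicit assumption.

```python
import re
from typing import Callable

# Shorthand form: <include template/footer>
SHORT = re.compile(r"<include\s+([^\s<>]+)>")
# SSI-style form: <!--#include file="template/footer" -->
SSI = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


def expand_includes(html: str, read: Callable[[str], str]) -> str:
    """Replace both include forms with the text returned by read(path)."""

    def splice(match):
        return read(match.group(1))

    # One pass per form; includes nested inside included files
    # are not expanded recursively in this sketch.
    return SSI.sub(splice, SHORT.sub(splice, html))
```

For real files, \verb|read| could be as simple as \verb|lambda p: open(p).read()|; a dictionary lookup works just as well for testing.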

\section*{Pidgin HTML is Still HTML} \label{still-html}

Let this post be a praise to

\begin{itemize}\item shortcuts of HTML,
\item its simple user-facing nature,
\item and the universality and power of the Web Platform.
\end{itemize}

Pidgin HTML is made possible by the sloppiness of standard/quirks HTML.
It’s still valid HTML, but one that is much easier to write.
Good for authoring as a hypertext alternative to Markdown.
And other Lightweight Markup Languages.

Don’t be afraid of HTML.
Write Pidgin HTML.


\par\noindent\rule{\textwidth}{0.4pt}
\href{https://creativecommons.org/licenses/by/4.0}{CC-BY 4.0} 2022-2026 by Artyom Bologov (aartaka),
\href{https://codeberg.org/aartaka/pages/commit/a91befa}{with one commit remixing Claude-generated code}.
Any and all opinions listed here are my own and not representative of my employers; future, past and present.
\end{document}
