Write Hypertext, not Plaintext

I visit Derek Sivers’ website from time to time. Every time I do, I discover he has

written some half a hundred new blog posts,
moved to another place,
and started a new diet/philosophy/routine/book.

He’s too productive for a human, but that’s a topic for a different time. The topic for today is his choice of data formats.

All You Need is Plaintext #

One of the posts that startled me the most was Derek’s Write Plain Text Files. Storing all of one’s life, writing, and data in plaintext files is too appealing to ignore:

They are the most reliable, flexible, and long-lasting option.

Derek Sivers, Write Plain Text Files

The problem is: Derek’s files are not really plaintext. He invents his own metadata headers: date, tags, and title, pieces of information that don’t belong in text-only files. They are an ad-hoc extension of the headers-and-paragraphs structure everyone uses in plaintext. Waaaaaait, the headers-and-paragraphs structure is ad-hoc too!

Plaintext is not enough. The moment you start adding metadata, links, lists, and quotations—you’ve transgressed. It’s richtext now. The worst kind of richtext:

non-portable,
unstandardized,
and likely inconsistent across time.

The perfect plaintext file

The most readable plaintext file that I’ve ever seen is this define-syntax tutorial. But even there, the syntax, though perfectly readable, is author-specific and non-portable. It’d benefit from being HTML instead, but oh well.

The Plaintext is a Lie #

I have to clarify: there are actually two types of “plaintext” we’re talking about here. Both Derek Sivers and Scott Nesbit make the mistake of conflating these types. What plaintext devotees usually mean is a storage format, as in binary-versus-plaintext. That’s the first part of the MIME type: text/*.

Then, there’s a second part of MIME: subtype. That’s where it gets ugly (citing Scott Nesbit):

Believe it or not, plain text is used everywhere. Even when you don’t see it. Where? In the source code for software, web pages, blog posts and articles (like this one), configuration files on your computer, and more.

Scott Nesbit, Why Plain Text Matters

And then:

I can, for example, open a text file that I created in 1991 using any modern text editor — whether on my desktop or on the web or on my smartphone. All the information is there, and I don’t lose any formatting.

Scott Nesbit, ditto

Plaintext is code, web page sources, configuration files, etc. And you can also open and preview it all in your editor? This blog post you’re reading is a pure horror when compiled from Lisp to HTML (with the C preprocessor, actually. And it’s pretty now!) The code that supports it is quite unreadable because I didn’t really care about the style of the code hosting my blog. And then, this supporting code is Lisp, which causes nausea for programmers used to their slightly incompatible C-like languages.

There are really two kinds of plaintext. Plaintext-the-storage-type and plaintext-the-display/markup/software-subtype. The former is eternal, and the latter is as ephemeral as the software/human processing it.
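The two kinds of plaintext map directly onto the two halves of a MIME type. Here is a minimal Python sketch of that split. The `split_mime` helper and the list of formats are my own illustration, not anything from Derek's or Scott's posts:

```python
from typing import Tuple


def split_mime(mime: str) -> Tuple[str, str]:
    """Split 'type/subtype' into its two halves, dropping any parameters."""
    essence = mime.split(";", 1)[0].strip().lower()
    main, _, sub = essence.partition("/")
    return main, sub


# Illustrative list of formats (an assumption, not taken from the post).
formats = [
    "text/plain",
    "text/markdown",
    "text/html; charset=utf-8",
    "application/pdf",
]

for mime in formats:
    main, sub = split_mime(mime)
    # The first half says how the bytes are stored,
    # the second half says how they are meant to be interpreted.
    storage = "plaintext storage" if main == "text" else "not plaintext storage"
    print(f"{mime}: {storage}, interpreted as {sub}")
```

The first three entries all share the eternal `text` part. Only the subtype tells you whether any software will still know what to do with the file.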

That’s why I’m taking an “anti-plaintext” stance here. You must use plaintext as a storage type, but, more important than that, you have to pick the most reliable subtype. Otherwise, all your “plaintext” files from 1991 get too ugly to understand 😛

Screw .TXT, All My Homies Use Markdown #

So plaintext is not enough, and one needs a structured and metadata-aware text format. Most of my fellow programmers realize that at some point. They usually convert to Markdown soon afterward.

Markdown is the de facto standard for Zettelkasten notes, blog posts, and project docs. Obsidian supports Markdown, GitHub supports it, VSCode supports it; everything supports it. It has headings, lists, links, metadata headers, HTML injection, and other platform-specific incompatible goodies. And it looks nice when rendered!

One can argue for a particular richtext format, like Org Mode possibly with Orgdown, YAML, or Wiki, each with its benefits. But, essentially, everyone is okay with richtext in whatever format they have. Obsidian, Roam, Brain, and other knowledge systems make these atomic richtext files interconnected. They bring structure and hierarchy to them. One’s Markdown files are a self-sufficient knowledge web now. But, for a paranoid like me, these systems are no consolation.

Knowledge Webs. Better ones. #

Obsidian will end. Myspace, Google Reader, and a dozen other vital tech products did. Once it’s gone, your knowledge web is at best scattered, at worst destroyed. The Markdown files you have are subtly incompatible with the new knowledge system. Your reference system is lost.

One of the social tech-agnostic solutions (because social solutions are superior to technical ones) would be to pick another, more reliable, referencing system. Like the academic one, with a dozen metadata fields unique to the given paper/post/media. It’s been there for an eternity, and will likely last: the academy is not letting reliable systems go. And there are technical solutions to keep your references clean and consistent. Like Zotero and Citation Machine.

But, even with these citation and referencing tools, keeping your knowledge base up to date is hard. You have to update references, load files, generate links, and interface with the document/richtext editor of choice. Would be nice to have something with (back-)linking, formatting, and plaintext-like data persistence. Something like...

Hypertext #

The academy made a life-changing gift to modern civilization in the nineties. HTTP, URI, and HTML are simple yet reliable foundations for the modern Internet. For a reason:

HTML has its roots in the academy with its semantic markup and referencing fetishes,
HTTP is plaintext in the most reliable meaning of plaintext,
and URI is a system for unique, linkable, and readable data references.

HTML was (and still is) intended to replace printouts, libraries, and opaque PDFs. CSS follows suit. The <a> tag and its href=URI attribute give one the full power of referencing unique data in the minimum number of characters possible. Tables, lists, paragraphs, and other structural tags (especially in HTML5) cover all the possible needs of a writer, along with citations, inline and block/multiline quotations, abbreviations, dates, addresses, and whatnot.
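To show how little tooling those references need, here is a rough sketch that pulls every link out of a page using nothing but Python's standard `html.parser`. The `LinkCollector` class and the example.org URL are placeholders of mine, not part of any real site:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical snippet; the URL is a placeholder.
page = (
    '<p>See <a href="https://example.org/notes#plaintext">my notes</a> '
    "and <cite>a book</cite>.</p>"
)
collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)
```

No plugin, no vendor format: the reference is right there in the markup, and any HTML parser can find it.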

I’m getting stunned every time I open HTML and CSS references on MDN or WHATWG. Seems like they anticipated every single use-case for semantic pages on the Web. HTML is Markdown but with all the possible metadata one needs.

If you’ve read this far, you’ve probably seen the silcrow signs near headings and pilcrow signs near paragraphs. These link to the sections and paragraphs they precede. I’ve reinvented Bible-like linking, and I’ve done it in the only system that was flexible enough to do that: HTML.

Update Dec 2023: I removed the pilcrow signs, because they were too noisy and hard to re-implement in the C preprocessor.

Update Mar 2024: Pilcrow signs are back!
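For the curious, the paragraph-linking idea boils down to giving every paragraph an id and a link to itself. This blog does it with the C preprocessor, so what follows is not my actual implementation, just a minimal Python sketch of the same idea. The `with_pilcrows` name and the `p1`, `p2` id scheme are made up for the example:

```python
from html import escape


def with_pilcrows(paragraphs, prefix="p"):
    """Wrap each paragraph in <p id=...> with a pilcrow that links to itself."""
    out = []
    for number, text in enumerate(paragraphs, start=1):
        anchor = f"{prefix}{number}"  # hypothetical id scheme: p1, p2, ...
        out.append(
            f'<p id="{anchor}"><a href="#{anchor}">¶</a> {escape(text)}</p>'
        )
    return "\n".join(out)


markup = with_pilcrows(["Plaintext is not enough.", "Use <a> & href."])
print(markup)
```

Every paragraph is now addressable as page.html#p1, page.html#p2, and so on, with nothing beyond an id attribute and a fragment link.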

Your data is safe in HTML because it’s still plaintext. Your data is portable in HTML (even if it’s my ugly sort of HTML) because it’s not an ad-hoc plaintext extension. Your data is pretty in HTML because HTML+CSS is a plaintext format intended for display (even if it’s an audio “display”, ahem.) Your data is meta-enriched in HTML because there’s a tag, <meta> header, or attribute for any type of metadata you can imagine. Your data is a knowledge web in HTML because it’s all interlinked and machine-parseable. By default. Forever.
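The date, tags, and title that ad-hoc plaintext headers carry all have standard homes in HTML, and reading them back takes a few lines. A hedged sketch, again with Python's standard `html.parser`. The `MetaCollector` class, the author name, and the keywords are placeholders I made up for the example:

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the <title> text and every <meta name=... content=...> pair."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"]] = attrs.get("content") or ""
        elif tag == "title":
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False


# Hypothetical page; author and keywords are placeholder values.
page = """<!DOCTYPE html>
<html lang="en">
<head>
<title>Write Hypertext, not Plaintext</title>
<meta name="author" content="Jane Doe">
<meta name="keywords" content="plaintext, html">
</head>
<body><p>Hello.</p></body>
</html>"""

collector = MetaCollector()
collector.feed(page)
collector.close()
print(collector.title)
print(collector.meta)
```

The same parser works on any HTML page, which is the whole point: the metadata format is not mine, so the tooling does not have to be mine either.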

Write Hypertext Knowledge Webs, Not Plain Text Files.

Interlude on Tools #

Update Jan 2024: I’m saying “portable” and “pretty”, because hypertext as an idea and HTML as an implementation are defined as:

Hypertext is text displayed on a computer display or other electronic devices with references (hyperlinks) to other text that the reader can immediately access

Definition of hypertext from Wikipedia

Other markup languages are not necessarily intended for immediate display. Nor are they implied to have links. Which is why I’m talking about hypertext and HTML in particular: links and metadata are built in. All the HTML-related tooling has to support them. And browsers, these glorified HTML viewers, are supposed to support a lot of legacy features (quirks). Which all makes HTML both easier to write (look at the sources of this page) and reliable to view/transfer down the line. Unlike most other plaintext markup/whatever formats.

Gemtext #

I’m interacting with a lot of Gemtext lately, and I can see its appeal. It’s simple. Focused. Essentials only.

But it lacks a lot. Overuse of preformatted text elements is standard practice. There are no tables, images (not in the default package, at least), forms, etc. There’s no feedback loop with the user. There are no established ways to transmit meta-information.

And, most importantly, Gemtext is a markup language, not a Hypertext format. It’s not intended for interaction. It’s not suitable for semantic knowledge webs. So use Hypertext/Web Platform/HTML and live a happy life.