CSS-only Syntax Highlighting

By Artyom Bologov

So I started posting some pretty intimidating pieces of code on this blog. And I also became part of the EVIL-using Smartparens-reliant angry-fruit-salad-addicted bunch. (At my $ork, I'm still an Emacs/Paredit/Monochrome lover deep inside.) So it's only natural that I can no longer look at my blog. It's too dull, too monotonous. Too colorless.

I need syntax highlighting! But I promised that my posts will use no JS. Unless otherwise noted. Meaning: almost never. Can I somehow do syntax highlighting without JS?

Preprocessing

So I'm generating my blog with ed(1). Which means that I have an intermediate step of HTML compilation. Not really convenient (it takes a whole tenth of a second!). But it's worth the wait to replace all the templates and variables and syntax. So why not add one more thing to this pipeline?

The simplest way to do highlighting is adding some inline tags, like <code> or <b>:

s/[[:alnum:]!-'*+.:-@^-`-]\{1,\}/<b>&<\/b>
Wrapping tokens in b tags (do not ask me where that regex came from)
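
For those who don't speak ed, here is a rough Python sketch of the same substitution. It assumes ASCII input, so the character class is only an approximation of the POSIX one, and the names TOKEN and embolden are made up for illustration. Unlike the ed command as written (which has no g flag), it wraps every token on the line.

```python
import re

# ASCII approximation of the POSIX bracket expression [[:alnum:]!-'*+.:-@^-`-]
# (assumption: no locale-specific alphanumerics are needed).
TOKEN = re.compile(r"[A-Za-z0-9!-'*+.:-@^-`\-]+")


def embolden(line: str) -> str:
    """Wrap every token-like run of characters in <b> tags."""
    return TOKEN.sub(lambda m: f"<b>{m.group(0)}</b>", line)


if __name__ == "__main__":
    print(embolden("(let ((x 3)) x)"))
```

Parentheses and spaces are not in the character class, so they stay outside the tags and only let, x, and 3 get wrapped.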

A good and simple approach, but why not go further? Like using something beyond default element styles?

Prior art: CSS classes

Looking for JS-less syntax highlighting, I found several options for CSS. Adding some span-s and classes to code blocks to add highlighting. Like this thing, for example. All cool, but

<code class="syntaxbox">
  <span class="newline">
    <span class="identifier">code
      <span class="attribute">class</span>
      <span class="value">syntaxbox</span></span>
    </span>
    <span class="newline"></span>
    <span class="newline" data-level='1'>
      <span class="comment">HTML Comment</span>
    </span>
    <span class="newline" data-level='1'>
      <span class="identifier">a
        <span class="attribute">href</span>
        <span class="value link">https://www.cssscript.com</span>
      </span>CSSScript.com
      <span class="identifier end">a</span>
    </span>
    <span class="newline"></span>
    ....
</code>
No, I do not want that, thanks

I mean, I could auto-generate these classes, but

Is there maybe a sloppier solution that unifies both preprocessing simplicity and CSS flexibility?

Preprocessing... and CSS

It would be nice if CSS matched against element text. Imagine how easy it would be to match keywords with that! No need for these ugly classes around every token.

But CSS does no such thing, for better or worse. It only matches against elements, classes, IDs, and attributes. Wait...

That's my hack: putting a duplicate of the code token into an attribute. (Remember preprocessing?)

g/<pre/.+1,/<\/pre/-1s/\([[:alnum:]!-'*+.:-@^-`-]\{1,\}\)/<span token='\1'>\1<\/span>/g
Creating simple span-s for later styling.
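
The same step, sketched in Python for illustration. This is an approximation under assumptions, not the actual script: it treats any line containing <pre as the start of a block and any later line containing </pre as its end, and only rewrites the lines strictly in between. The names TOKEN and add_token_spans are hypothetical.

```python
import re

# ASCII approximation of the POSIX bracket expression [[:alnum:]!-'*+.:-@^-`-]
TOKEN = re.compile(r"[A-Za-z0-9!-'*+.:-@^-`\-]+")


def add_token_spans(html_lines):
    """Duplicate each token into a token attribute, but only inside <pre> blocks."""
    out = []
    inside = False  # True while we are between a <pre line and a </pre line
    for line in html_lines:
        if inside and "</pre" in line:
            inside = False
            out.append(line)
        elif inside:
            # Note: this sketch does not escape quotes inside tokens.
            out.append(TOKEN.sub(
                lambda m: f"<span token='{m.group(0)}'>{m.group(0)}</span>", line))
        else:
            out.append(line)
            if "<pre" in line:
                inside = True
    return out


if __name__ == "__main__":
    page = ["<p>let it be</p>", "<pre>", "(let ((x 3)) x)", "</pre>"]
    print("\n".join(add_token_spans(page)))
```

Text outside pre stays untouched, so the "let" in the paragraph above the code block never gets a span.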

And then highlighting the tokens with the matching attributes.

pre [token$="let"],
pre [token^="let"],
pre [token*="def"],
pre [token="when"],
/* ... */
pre [token="class"]
{
    font-weight: 600;
    color: var(--accent);
}
Huge set of highlighted special cases

So meta: the rule for def matches itself due to being overly greedy. I don't hide this deficiency of my algo from you, because why should I? It's supposed to be a flawed heuristic hack for JS haters like me.
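
To see where the greediness comes from, here is a small Python model of the four attribute selector operators used above ($=, ^=, *=, and =). It only mimics the string comparison each operator does; RULES, matches, and is_highlighted are made-up names, and the rule list covers just the selectors visible in the snippet.

```python
# (operator, value) pairs mirroring the selectors shown in the stylesheet snippet.
RULES = [("$=", "let"), ("^=", "let"), ("*=", "def"), ("=", "when"), ("=", "class")]


def matches(op: str, needle: str, token: str) -> bool:
    """Mimic one CSS attribute selector operator as a plain string test."""
    if op == "=":
        return token == needle          # exact match
    if op == "^=":
        return token.startswith(needle)  # prefix match
    if op == "$=":
        return token.endswith(needle)    # suffix match
    if op == "*=":
        return needle in token           # substring match
    raise ValueError(f"unsupported operator: {op}")


def is_highlighted(token: str) -> bool:
    """True if any rule would select a span carrying this token."""
    return any(matches(op, needle, token) for op, needle in RULES)


if __name__ == "__main__":
    for token in ["defun", "let*", "flet", "undefined", "whenever", 'token*="def"']:
        print(token, is_highlighted(token))
```

The substring rule is the greedy one: it catches defun, but also undefined, and the selector text token*="def" itself once the stylesheet is shown in a code block. The exact-match rules are stricter, so whenever is left alone.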

On matching keywords

A frequent thing in many languages is keyword arguments/functions. Usually marked by colons (for whatever reason.) I handle these separately, in a less intimidating form.

pre [token^=":"],
pre [token^="#:"],
pre [token$=":"]
{
    color: var(--accent);
}
Less intimidating highlighting for keywords

This highlights JSON, CSS, and Lisp keywords:

{
	"key": 3,
	"lisp": "(write 3 :stream t :pretty t)"
}
Some JSON
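
A quick Python check of which tokens in that JSON the three colon rules would pick up. It is a sketch under assumptions: the tokenizer is an ASCII approximation of the ed regex, and TOKEN and is_keyword are hypothetical names.

```python
import re

# ASCII approximation of the POSIX bracket expression [[:alnum:]!-'*+.:-@^-`-]
TOKEN = re.compile(r"[A-Za-z0-9!-'*+.:-@^-`\-]+")


def is_keyword(token: str) -> bool:
    """Mirror [token^=":"], [token^="#:"], and [token$=":"]."""
    return token.startswith(":") or token.startswith("#:") or token.endswith(":")


if __name__ == "__main__":
    snippet = '"key": 3,\n"lisp": "(write 3 :stream t :pretty t)"'
    print([t for t in TOKEN.findall(snippet) if is_keyword(t)])
```

Quotes and colons are part of the token character class, so a JSON key arrives as one token with a trailing colon, while Lisp keywords arrive with a leading one.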

But that's an aside.

And that's it really! Create span-s (or code, tt, or whatever that doesn't mess up page semantics terribly) with token attributes and match these attributes.

Here's a more serious example for you:

(defun question-reader (stream char arg)
  "Provide documentation/help for the form following #?.
Depends on the form:
- KEYWORD: `apropos' for the keyword name.
- (KEYWORD PACKAGE): `apropos' in PACKAGE.
- SYMBOL: Print argument list and argument types for SYMBOL-named
  function.
- (SYMBOL TYPE): Print the `documentation' for TYPEd SYMBOL."
  (declare (ignorable char arg))
  (let ((val (read stream nil nil t))
        (*print-case* :downcase))
    (cl-user::with-useful-printing
      (typecase val
        ((or keyword string
             (cons (or keyword string)))
         (apropos (first (uiop:ensure-list val))
                  (second (uiop:ensure-list val))))
        ((and symbol
              (satisfies fboundp))
         (format t "~&~a~%~a -> ~a"
                 (trivial-arguments:arglist val)
                 (nth-value 0 (trivial-arguments:argtypes val))
                 (nth-value 1 (trivial-arguments:argtypes val))))
        (symbol
         (format t "~&~a = ~a" val (symbol-value val)))
        (list (format t "~&~a" (documentation (first val) (or (second val) 'function))))))
    (terpri)
    (values)))
A bigger Lisp code piece showcasing the highlighter

Go and purge a JS syntax highlighting library from your website. Now!

P.S. You can find the script I'm using at scripts/tohighlight.ed. Finding the stylesheet with keyword matching is left as an (easy!) exercise for the reader.
