CSS-only Syntax Highlighting
By Artyom Bologov
So I started posting some pretty intimidating pieces of code on this blog. And I also became part of the EVIL-using, Smartparens-reliant, angry-fruit-salad-addicted bunch. (At my $ork, I'm still an Emacs/Paredit/Monochrome lover deep inside.) So it's only natural that I can no longer look at my blog. It's too dull, too monotonous. Too colorless.
I need syntax highlighting! But I promised that my posts will use no JS. Unless otherwise noted. Meaning: almost never. Can I somehow do syntax highlighting without JS?
Preprocessing
So I'm generating my blog with ed(1). Which means that I have an intermediate step of HTML compilation. Not really convenient (it takes a whole tenth of a second!), but it's worth the wait to replace all the templates and variables and syntax. So why not add one more thing to this pipeline?
The simplest way to do highlighting is adding some inline tags, like <code> or <b>:
s/[[:alnum:]!-'*+.:-@^-`-]\{1,\}/<b>&<\/b>
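For readers who don't speak ed, here is a rough Python sketch of the same substitution. It is an approximation, not the script this blog uses: the character class is an ASCII-only rendering of the bracket expression above, the name embolden is made up for illustration, and it wraps every token on the line (which in ed would need a trailing g flag).

```python
import re

# ASCII-only approximation of the ed bracket expression
# [[:alnum:]!-'*+.:-@^-`-]  (assumption: no locale-specific alphanumerics).
TOKEN = re.compile(r"[A-Za-z0-9!-'*+.:-@^-`\-]+")


def embolden(line):
    """Wrap every token-like run of characters in <b>...</b>."""
    # m.group(0) plays the role of & in the ed replacement string.
    return TOKEN.sub(lambda m: "<b>" + m.group(0) + "</b>", line)


print(embolden("(let ((x 1)) x)"))
# (<b>let</b> ((<b>x</b> <b>1</b>)) <b>x</b>)
```

Parentheses and spaces fall outside the character class, so they stay unwrapped and act as token separators.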
A good and simple approach, but why not go further? Like using something beyond default element styles?
Prior art: CSS classes
Looking for JS-less syntax highlighting, I found several options for CSS.
Adding some span-s and classes to code blocks to add highlighting.
Like this thing, for example.
All cool, but
<code class="syntaxbox">
<span class="newline">
<span class="identifier">code
<span class="attribute">class</span>
<span class="value">syntaxbox</span></span>
</span>
<span class="newline"></span>
<span class="newline" data-level='1'>
<span class="comment">HTML Comment</span>
</span>
<span class="newline" data-level='1'>
<span class="identifier">a
<span class="attribute">href</span>
<span class="value link">https://www.cssscript.com</span>
</span>CSSScript.com
<span class="identifier end">a</span>
</span>
<span class="newline"></span>
....
</code>
I mean, I could auto-generate these classes, but
- That'd make my highlighting script ten times longer than the HTML generation one.
- I'd need to account for every language and token combination.
- I'd have to make arbitrary decisions about whether something is an identifier or an attribute or a variable or a class or a keyword or a string or a comment or a number or...
Is there maybe a sloppier solution that combines preprocessing simplicity with CSS flexibility?
Preprocessing... and CSS
It would be nice if CSS matched against element text. Imagine how easy it would be to match keywords with that! No need for these ugly classes around every token.
But CSS does no such thing, for better or worse. It only matches against elements, classes, IDs, and attributes. Wait...
That's my hack: putting a duplicate of the code token into an attribute.
(Remember preprocessing?)
g/<pre/.+1,/<\/pre/-1s/\([[:alnum:]!-'*+.:-@^-`-]\{1,\}\)/<span token='\1'>\1<\/span>/g
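The ed one-liner above is dense, so here is a hedged Python sketch of what it does: for every line strictly between a line containing <pre and the next line containing </pre, wrap each token in a span that carries a copy of the token in its token attribute. The names highlight and wrap_token are invented for this sketch, the character class is an ASCII approximation, and the single-quote escaping is my own addition (the ed command does not do it).

```python
import re

# ASCII-only approximation of the ed bracket expression
# [[:alnum:]!-'*+.:-@^-`-]
TOKEN = re.compile(r"[A-Za-z0-9!-'*+.:-@^-`\-]+")


def wrap_token(match):
    token = match.group(0)
    # Assumption beyond the ed one-liner: escape single quotes so a token
    # like 'function cannot end the attribute value early.
    attr = token.replace("'", "&#39;")
    return "<span token='" + attr + "'>" + token + "</span>"


def highlight(lines):
    """Tokenize only the lines strictly between <pre and </pre lines."""
    out = []
    inside_pre = False
    for line in lines:
        if "</pre" in line:
            inside_pre = False      # closing line itself is left untouched
            out.append(line)
        elif "<pre" in line:
            inside_pre = True       # opening line itself is left untouched
            out.append(line)
        elif inside_pre:
            out.append(TOKEN.sub(wrap_token, line))
        else:
            out.append(line)
    return out


page = ["<p>let it be</p>", "<pre>", "(let ((x 1)))", "</pre>"]
for line in highlight(page):
    print(line)
```

The prose line is printed unchanged, while the line inside the pre block gets a span around let, x, and 1.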
And then highlighting the tokens with the matching attributes.
pre [token$="let"],
pre [token^="let"],
pre [token*="def"],
pre [token="when"],
/* ... */
pre [token="class"]
{
font-weight: 600;
color: var(--accent);
}
So meta: the rule for def matches itself due to being overly greedy.
I don't hide this deficiency of my algo from you, because why should I?
It's supposed to be a flawed heuristic hack for JS haters like me.
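To see how greedy these selectors are, here is a small Python model of the four attribute operators used above: = is an exact match, ^= a prefix match, $= a suffix match, and *= a substring match. It is only a sketch for reasoning about false positives. The RULES list covers just the selectors shown here (not the ones elided by the comment), and matches and is_keyword are made-up names.

```python
# (operator, value) pairs for the selectors listed above.
RULES = [
    ("$=", "let"),
    ("^=", "let"),
    ("*=", "def"),
    ("=", "when"),
    ("=", "class"),
]


def matches(op, value, token):
    """Model one CSS attribute selector against a token attribute value."""
    if op == "=":
        return token == value
    if value == "":
        return False  # substring operators with an empty value match nothing
    if op == "^=":
        return token.startswith(value)
    if op == "$=":
        return token.endswith(value)
    if op == "*=":
        return value in token
    raise ValueError("unsupported operator: " + op)


def is_keyword(token):
    return any(matches(op, value, token) for op, value in RULES)


tokens = ["defun", "let*", "flet", "when", "whenever", "undefined"]
print([t for t in tokens if is_keyword(t)])
# ['defun', 'let*', 'flet', 'when', 'undefined']
```

Note that undefined lights up because it contains def, while whenever stays plain because when is matched exactly. That is the heuristic working as designed, flaws included.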
On matching keywords
A frequent thing in many languages is keyword arguments/functions.
Usually marked by colons (for whatever reason).
I handle these separately, in a less intimidating form.
pre [token^=":"],
pre [token^="#:"],
pre [token$=":"]
{
color: var(--accent);
}
This highlights JSON, CSS, and Lisp keywords:
{
"key": 3,
"lisp": "(write 3 :stream t :pretty t)"
}
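As a rough check of which pieces of that last line get the accent color, here is a Python sketch that tokenizes it with an ASCII approximation of the preprocessing character class and applies the three colon rules. The helper name is_colon_keyword is hypothetical, and the tokenizer is my approximation rather than the actual ed script.

```python
import re

# ASCII-only approximation of the ed bracket expression
# [[:alnum:]!-'*+.:-@^-`-]
TOKEN = re.compile(r"[A-Za-z0-9!-'*+.:-@^-`\-]+")


def is_colon_keyword(token):
    """Model [token^=":"], [token^="#:"], and [token$=":"]."""
    return (token.startswith(":")
            or token.startswith("#:")
            or token.endswith(":"))


line = '"lisp": "(write 3 :stream t :pretty t)"'
tokens = TOKEN.findall(line)
print(tokens)
# ['"lisp":', '"', 'write', '3', ':stream', 't', ':pretty', 't', '"']
print([t for t in tokens if is_colon_keyword(t)])
# ['"lisp":', ':stream', ':pretty']
```

The double quote is part of the character class, so the JSON key arrives as one token ending in a colon. By the same logic the https: part of a URL would match too, since the slash is not a token character.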
But that's an aside.
And that's it really!
Create span-s (or code, tt, or whatever that doesn't mess up page semantics terribly) with token attributes and match these attributes.
Here's a more serious example for you:
(defun question-reader (stream char arg)
  "Provide documentation/help for the form following #?.
Depends on the form:
- KEYWORD: `apropos' for the keyword name.
- (KEYWORD PACKAGE): `apropos' in PACKAGE.
- SYMBOL: Print argument list and argument types for SYMBOL-named
function.
- (SYMBOL TYPE): Print the `documentation' for TYPEd SYMBOL."
  (declare (ignorable char arg))
  (let ((val (read stream nil nil t))
        (*print-case* :downcase))
    (cl-user::with-useful-printing
      (typecase val
        ((or keyword string
             (cons (or keyword string)))
         (apropos (first (uiop:ensure-list val))
                  (second (uiop:ensure-list val))))
        ((and symbol
              (satisfies fboundp))
         (format t "~&~a~%~a -> ~a"
                 (trivial-arguments:arglist val)
                 (nth-value 0 (trivial-arguments:argtypes val))
                 (nth-value 1 (trivial-arguments:argtypes val))))
        (symbol
         (format t "~&~a = ~a" val (symbol-value val)))
        (list (format t "~&~a" (documentation (first val) (or (second val) 'function))))))
    (terpri)
    (values)))
Go and purge a JS syntax highlighting library from your website. Now!
P.S. You can find the script I'm using at scripts/tohighlight.ed. Finding the stylesheet with keyword matching is left as an (easy!) exercise for the reader.