Designing for Exploitation: How Meta-Programming Leads to Safer Code
By Artyom Bologov
I've just finished watching a wonderful talk, Weird Machines: Exploiting Turing-Completeness, by Pedro Castilho. He talked about accidental Turing-completeness and how computer technology is weird machines all the way down. The talk had security and program design takeaways that every programmer should remember. I'll cite just two of them:
- Every program should have the least power possible in order to execute its function.
- [...] consider input for any program really as being a [...] programming language.
These sound pretty obvious, and yet every day there are dozens of RCEs revealed in different software products. Why? Because it's not rewarding to follow low-priority security department commandments, and it's just too easy to copy code from StackOverflow or use JS eval() to read a number from user input.
How does one fight it, then, if not with more security department commandments? With mindsets and tools. Pedro Castilho says to use the least power possible and to consider input to be untrusted code? I say:
- You should use the meta-programming abilities of your technology (I mostly mean programming languages here) as much as possible, and
- You should allow your users to influence your program by exposing programming languages to them.
But wait, this is like the two recommendations above, but inverted to make them meaninglessly dangerous, right? Yes... And no. Buckle up, I'm going to explain why designing for exploitation leads to safer code.
Disclaimer: the meaning of meta-programming used in this post is not the typical one seen in communities like Rust and C++ (a way to abstract repetitive or generic code using a fancy built-in preprocessor), and not even the one commonly used in the Lisp community (a way to generate or alter the code running in the image, using the code itself).
The meaning of meta-programming that you have to bear with for the duration of this post is: the ability of a programming language, and of a program written in it, to access the parsing, evaluation, compilation, and other infrastructural functions that the language itself uses internally.
(read), eval(), #.(e(xp(l(o))it))!
First things first: you should use meta-programming abilities of your technology as much as possible.
There's a mantra in the JavaScript community against using eval(), especially for reading untrusted input. But many developers still do it, because eval() is convenient for reading numerical or string inputs, using the abilities of the JavaScript implementation at hand. A programming language's parser is the most fool-proof and sophisticated number parser you can find in that language's ecosystem, so why not use it?
I mean, using JS eval() is still bad, because you can feed anything into it and it will happily evaluate it. But if your programming language is more mindful and sane about the meta-programming facilities it has, then it's both safer and more convenient for you to rely on these facilities rather than trying to re-implement them, introducing accidental Turing-completeness in your code.
I'll exemplify this with Common Lisp and the way we use its meta-programming facilities in Nyxt code.
Disclaimer: I'm one of the Nyxt browser maintainers, working almost full-time on its maintenance. Be aware of that when I mention Nyxt.
We have a way to define your own pages integrated with Nyxt, called the Internal Pages API. The way it works is:
- you write a Lisp function in your configuration file,
- you load this file inside Nyxt, and you have
  - a new URL you can open in Nyxt, invoking
  - a full-blown Lisp function with arguments routed through URL query arguments, and
  - a Lisp object storing all the meta-information about those two.
Take describe-variable, a command (backed by an internal page) that allows introspecting any variable we have in Nyxt. While it may be non-obvious, we are actually passing raw Lisp symbols to this function and using symbol-value (a function that finds the symbol's value in the current environment by name), processing its result and injecting the corresponding HTML into the page.
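To make the lookup concrete, here is a minimal sketch of the idea, not the actual Nyxt code. The names describe-variable-html and *tab-count* are made up for illustration; only boundp, symbol-value, and format are standard Common Lisp.

```lisp
;; Hypothetical variable to introspect; not a real Nyxt variable.
(defvar *tab-count* 3
  "Number of open tabs (illustrative only).")

;; Hypothetical page body generator: takes a raw Lisp symbol,
;; looks up its global value, and returns an HTML fragment.
;; A real page would also escape the printed value for HTML.
(defun describe-variable-html (symbol)
  "Return an HTML string describing the global variable named by SYMBOL."
  (check-type symbol symbol)
  (if (boundp symbol)
      (format nil "<h1>~a</h1><pre>~s</pre>" symbol (symbol-value symbol))
      (format nil "<h1>~a</h1><p>unbound</p>" symbol)))

(describe-variable-html '*tab-count*)
;; => "<h1>*TAB-COUNT*</h1><pre>3</pre>"
```

The interesting part is what is missing from the sketch: how the symbol gets from a URL string into the function.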
Is this possibly exploitable?
Well, maybe ¯\_(ツ)_/¯ Can you come up with a way to exploit these URLs? I believe in you. But before you get to filing dozens of CVEs for every internal page in Nyxt, I'll spice it up: we use the Lisp compiler's native code parsing facilities (namely, read and read-from-string) on the internal page URLs.
If you know a thing or two about Common Lisp, you'll immediately scream in terror, because you remember the #. reader macro, which makes the reader evaluate arbitrary code while parsing. But you also hold yourself together and refrain from screaming out loud, because you remember that there's a small code snippet (and a safe-read-from-string function abstracting it in the de-facto standard UIOP library) that saves a lost cause: binding *read-eval* to nil while reading.
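A minimal sketch of such a snippet follows. It is not the exact code from Nyxt or UIOP: read-literal and read-number are names made up for this post, while *read-eval*, with-standard-io-syntax, and read-from-string are standard Common Lisp.

```lisp
;; Hypothetical helper: parse STRING with the Lisp reader,
;; but refuse read-time evaluation (the #. reader macro).
(defun read-literal (string)
  "Read one Lisp object from STRING with read-time evaluation disabled."
  (with-standard-io-syntax          ; start from a known readtable and package
    (let ((*read-eval* nil))        ; must be rebound inside: the macro above sets it to T
      (read-from-string string))))

;; Hypothetical helper: the "read a number from user input" use case.
(defun read-number (string)
  "Return the number written in STRING, or NIL if it is not a number literal."
  (let ((value (ignore-errors (read-literal string))))
    (when (numberp value)
      value)))

(read-number "42")          ; => 42
(read-number "#x1F")        ; => 31
(read-number "#.(+ 1 2)")   ; => NIL, the reader signals READER-ERROR instead of evaluating
```

Note that even with *read-eval* off, the reader still interns any symbols it encounters, so this limits the power of the input rather than removing it entirely.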
This CL developer's train of thought is an example of the effect that using meta-programming has on a person: you are aware of the power your language has, the dangers it poses, and the ways to mitigate them. It's not the opaque JS eval() that provides you with unlimited power. Common Lisp meta-programming allows you to limit the power it has.
Meta-programming, in this perspective, is a tool quite like loggers and database daemons, and it can be integrated into your design documents just as loggers and databases are.
And that's what I meant by using the meta-programming abilities of your technology: use the things provided by your language, if it's at least a bit more self-aware than JavaScript. You'll have lots of problems solved by the compiler/interpreter/transpiler without the need to prove Greenspun's Tenth Rule yet again, and your code will be safe from accidental eval() injections 😃
I am aware that there are many languages that don't expose meta-programming facilities to the programmer. I have no particular suggestions for programmers using those, except maybe moving to Lisp and enjoying the programmer-friendly and secure meta-languages you can build in less than an hour.
Embeddable Weird Machines
Now that we've established that language implementations are our best friends, providing us with power and structure, how about making things Turing-complete by design? I mean it.
You should allow your users to change your program by exposing programming languages to them.
This point may make you uneasy, because the immediate thought would be "so I have to write a parser, analyzer, compiler and an environment for it..." No, you don't have to! Remember the previous point: use the tools provided by your programming language and expose those to your users.
Here, by exposing programming languages, I don't necessarily mean making users write code. I rather mean exposing a Turing-complete way to customize your system. Be it Scratch-like code blocks, Notion Relations, or HTML templates from MySpace: all of them expose an (almost) Turing-complete language to their users, allowing users to shape their experience within allocated boundaries, with the result compiled to a programming-language-ish form.
Again, an example from Nyxt. When I stumbled across the problem of managing per-website user configuration, my initial idea was inspired by ad-blocking host lists: simply store a match pattern for a website, and store all the settings associated with it on the same line.
Then, when I implemented the first prototype of this feature (called auto-mode and destined to be renamed to auto-rules in Nyxt 3.* after a major refactoring we've done), I realized that a pattern like (match-domain "paulgraham.com") too closely resembles a function call. Why not implement it as a function call required to return a boolean value, and allow users to write arbitrary code in there?
It was a win-win decision both for extensibility and security, because:
- Using raw function calls allows fine-tuning the website settings arbitrarily, even to a point we couldn't have anticipated in Nyxt core.
- I've avoided writing incomplete parsers for an arbitrary rule syntax, avoiding the dangers those pose.
- The list of rules is stored on the user's filesystem and is only likely to do harm when one willingly modifies it to do harm. In a sense, we're relying on the OS mechanisms instead of re-implementing them ourselves.
- The condition-then-modes structure of these rules is restrictive enough to make this feature both easy to learn and hard to exploit.
With these design decisions, one can now write rules whose conditions are arbitrary Lisp code.
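A rough sketch of what such condition-then-modes rules could look like is below. It is not the actual Nyxt rule format or implementation: match-domain here is a deliberately naive stand-in, *rules* and modes-for are invented names, the mode symbols are placeholders, and the rules live in a variable rather than in a file.

```lisp
;; Hypothetical, simplified stand-in for a domain matcher:
;; returns a predicate that is true when URL contains DOMAIN.
(defun match-domain (domain)
  (lambda (url)
    (and (search domain url) t)))

;; Each rule is (CONDITION . MODES). CONDITION is any function of a URL
;; string returning a boolean; MODES is a list of placeholder mode names.
(defparameter *rules*
  (list (cons (match-domain "paulgraham.com") '(reader-mode))
        ;; Arbitrary code as a condition: any plain-HTTP page.
        (cons (lambda (url) (eql 0 (search "http://" url)))
              '(noscript-mode))))

(defun modes-for (url &optional (rules *rules*))
  "Return the modes of the first rule whose condition accepts URL, or NIL."
  (loop for (condition . modes) in rules
        when (funcall condition url)
          return modes))

(modes-for "https://paulgraham.com/avg.html")  ; => (READER-MODE)
(modes-for "http://example.org/")              ; => (NOSCRIPT-MODE)
```

The condition can be as clever as the user wants, but whatever it computes, the only thing it can influence is which modes get toggled for a page.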
This overpowered feature is used by a reasonable fraction of the community and there were mentions and suggestions on how to improve it further on our forum, which may be the best marker of success for such a niche feature.
Auto-mode, being a restricted yet still quite Turing-complete programming sub-system of Nyxt, is not a vulnerability. It's consciously designed as a restricted (in this case, by the filesystem and the rule structure) meta-programming tool. If you consciously design a feature to be Turing-complete, you make yourself aware of the abilities it has and restrict them.
Designing for Exploitation, you avoid discovering that your tool is more Turing-complete than you anticipated after thousands of your users are already pwned...
Takeaways
If you only have one thing to take away from this post, take this one: using meta-programming tools for everyday tasks makes these tasks both easier to complete and more secure, because of the psychological and technical restrictions one puts on the code and themselves when being aware of and using a sufficiently good meta-programming system.
Designing for Exploitation, you stay aware of the huge security risks that accidental Turing-completeness poses. Designing for Exploitation, you shape your software in such a way as to leverage the most powerful yet domain-specific parts of your programming language. You don't simply put eval() everywhere: you also wrap it in a *read-eval* binding to only allow literal values to be parsed. Or, if it's not only literal values, you restrict the effects of this pet Turing machine to a single file, website, or window.
This is the thing that unites my suggestions with Pedro Castilho's theses cited above:
- By using the meta-programming facilities your programming language has, you're both using and restricting its most exploitable part: the meta-programming facilities themselves.
- By allowing the users to have a programming language as their input/configuration, you save yourself from the realization that their input already is a sloppily evaluated programming language!
Now that you understand how embracing meta-programming can actually make your code safer, go out there and make some software that's Designed for Exploitation ;)
Acknowledgements
This post wouldn't be as good as it (arguably) is without the help of
- Pierre Neidhardt, who helped me polish the phrasing and pointed out some of my assertions about Lisp that were somewhat exaggerated. And do watch his talk at GambiConf too, it's a nice showcase of what we've done in Nyxt :)
- Vasily Gerasimov, who helped me realize that I actually need the paragraph about fighting human laziness with proper mindsets and tools, instead of administrative imperatives.
- Milana Faizulina, who took a non-programmer stance, and whose questions made this post (supposedly) more understandable to a more general public.
I am the only person to blame for mistakes and inconsistencies you may find in this post, though. I hope there are none left, but one can only hope :)