Designing the Language by Cutting Corners
By Artyom Bologov
So I'm designing a programming language. The Programming Language, rather. But a big problem with designing a PL is that... it's hard. Or, rather, everyone thinks and wants it to be hard.
- LR parsers, parser combinators, backtracting parsers, nondeterministic and whitespace-(in)sensitive parsers...
- Multi-pass analyzers and optimizers...
- Thousands of IRs and bytecodes: JVM, LLVM IR/Single-assignment form, bytecode interpreters, C transpilers...
- Custom runtimes, dynamic libraries, hot reload, code-on-demand...
But do I need all of these, actually? What if I took a much lazier approach and decided to cut some corners? Let's do exactly that!
Cutting on Parsing: 5 Forms #
Most of university courses on PL design start and end at parsing. (Except for the ones using Lisps, but these are an exception—everyone else must suffer.) Parsing is hard, especially so—for C-family languages. Making proper parsers takes time, often a full quarter/semester. I need to simplify that. What if there were only 5 special syntax pieces to parse?
-
def
- (Re)define a variable. Aliases:
let
,local
,var
,alias
. Because you only need one kind of variable definition—lexical binding. -
if
- Conditionals with
else
.else if
is just twoif
/else
merged.end
-terminated. -
fn
- Creating a new function (lambda). Aliases:
function
andlambda
. Add it up withdef
to define new (maybelocal
) functions and you have named abstraction.end
-terminated too. - Values and function calls
- If there are args, it's a function call. If not, it's a value.
The rest of the programming language bloat can be reduced to that.
Switch/case statements? Just use if
!
Loops? Named recursion and functional primitives!
(With lambdas and fixed-point combinators, of course.)
Pattern matching? You don't need it, as much as you don't need parametric polimorphism.
(Whatever that means.)
The list goes on.
Functions, definitions, conditions, values, and applications. We've gone full circle to procedural programming. (With lambdas and recursion, though.) Because it's simpler this way. And we're fully able to make things complicated even without classes or types.
Cutting on Function/Value: Functions Always Have Args #
How do you decide when something is a function call and when something is just a value?
Most languages just decide to add parentheses, but that's too hard to parse.
What if I just said... that functions are whatever has arguments?
function value value (function value)
value
"But what if I need zero arguments function?" You don't. To think of it, zero argument functions are either constants in disguise. Or side effects. You don't need either. So it's safe to imply that function always have args.
Cutting on Operator Precedence: (Reverse?) Polish Notation #
Here's a small warmup on your C-like language precedence knowledge.
What's the precedence in these?
a + b * c
a * b ^ c
a & b | c
a && b || c
a = b || c
a = b, c
a == b || c
a-- -b + c, a
a = b() || c()
Why need precedence at all, though?
I can just use prefix/suffix notation and get rid of all these tricky rules.
Let's do that:
* n (factorial (-- n))
See? Much better—zero ambiguity! (I'll handle the parentheses fatigue later.)
Cutting on Side Effects: Let-bindings #
Side effects like input/output and assignments are what makes academy produce computation models. Because these cases need too many smart hacks to even work, let alone get optimized. So what if... there were no side effects altogether?
Yes, no assignments.
Just let-bindings:
let a = function (x) x end
let b = function (x) a x end
b 3
One can go a step further and say that one is not allowed to do anything in between let
-s.
So all the action goes after the let
chain.
Which also makes the parsing easier: the thing is either a binding or a final action + return value.
Now, side effects are not only about assignments. Side effects are also about environment interaction, memory poking, and I/O. What do we do with these? The short answer would be that you don't need them. The pragmatic answer will be that you can encapsulate them in a small wrapper. And keep everything else reliably functional.
Cutting on Nesting Levels: Threading and "Tail-heavy" Calls #
Now this combination of prefix notation and parentheses looks fishy. Like... Lisp? Lisp is ugly, right? I need to do something about it.
Clojure does threading macros (->
and the like) to get rid of nesting.
Scheme has its own cut
and chain
macros.
Common Lisp has nothing, and that sucks.
What will our language have?
My invention: "Tail-heavy functions".
Always put the data to be threaded in as the last argument.
With that, I can come up with a way to compose things.
Like the hypothetical colon syntax (shamelessly stolen from Wisp:)
* n : factorial : -- n
See? Much less intimidating! And requires just one more keyword to the language. No need for complex macros, just build your functions as "tail-heavy."
Cutting on Computation Model: Lambda Calculus #
With side effects out of the way, let's get to the computation model.
You don't need Turing Machine, actually. State-storing tape and movement instructions are only useful if one implements a side-effect-ful systems. There's Lambda Calculus as the more elegant system without side effects.
The problem of loops is elegantly solved with a single function.
Infamous Y combinator (or its applicative version, Z combinator:)
def Y = function (g)
local inner = function (x) g x x end
inner inner
end
def Z = function (f)
local inner = function (x) f (fn y : x x y) end
inner inner
end
So let's just compile our language to raw lambdas! Also relying on...
Cutting on Runtime: Using a Host Language #
Host language. Whatever you implement your PL in. I'm implementing mine in Common Lisp, so I'm going to use as much of CL as possible. Instrumenting CL built-in parser/reader for the simple syntax I have. Compiling to Lisp-native lambdas. With typed literals coming straight from Lisp.
Cutting on Type System: JSON #
We all need a simple type system. Something lightweight and simple. Something everyone knows. Like JSON!
- Numbers;
- Booleans;
- Arrays (lists and strings);
- Structures;
- Functions...
You can tell I have something else in mind. What if that was simplified too? Say,
- only have arrays (cons lists, actually), and implement strings as arrays of chars.
- And then have chars to be mere numbers.
- And then only have numbers restricted to integers.
- And then have record types just as mere sequences of values.
- And then have
false
=0
=null
, like in C!
Simple for both human and machine, yet useful. But yeah, I've strayed away from JSON somewhat. Because JSON is hard too: it has floats and named struct fields!
Cutting on Type Inference: Nominal+Structural Types #
Now that there's talk about types, does one need them in the language? Do we need type inference really? Not really.
But let's say we need types. What then? What kind of type inference would that be?
I'd say optional gradual type inference with nominal types.
Or, as we say here, "just call it a string lol."
One doesn't need type enforcement or inferences, and can safely ignore type hints.
If the programmer is writing type hints, it's their problem.
But if one needs to check types, they just use these hints as The Types.
Whatever you call it, it is.
def int = type end
var sum = + (int false) (int nil) end
def bool = type end
var truth = or (bool 0) true end
def str = type end
var chr = nth 1 (str ['a', 'b', 'c']) end
(Notice the introduction of type
.
A sixth entity/keyword in our language, but a totally optional one.
It can be safely replaced with fn
almost everywhere you see it and nothing will change.
Type tags are then just identity
functions when not used as types.)
A more interesting problem: how to infer record types? (Including conses/lists.) The system I came up is too simple to be useful. But hear me out: Constructors and accessors!
- Constructors (and only them) create a typed complex thing
- e.g.
cons 1 2
creates acons int int
pair. - Accessors are constructor arguments that return respectively typed values
- for
cons
, the definition might bedef cons = type (car cdr) ...
withcar
andcdr
as accessors. Forcons int bool
,car
returns anint
, andcdr
returns abool
.
This (structural typing) system allows building arbitrarily complex data structures. And that—added to arbitrarily shapeable base language—is more than enough for practical programming.
Cutting on Packaging: File System #
Now that we have a nice little language, we need to think about its ecosystem. How does one distribute scripts or libraries? Of course, as plaintext files—everything else is bloated. But how do we build and resolve dependencies for it? Let's use filesystem for that!
One of the packaging systems might be:
- Have files in one directory (or introduce
LANG_PATH
variable.) - Order these files lexicographically.
- Glue them together.
- And then just run them.
Our program can then just be something like:
0500-bool.lmb
...
1500-cons.lmb
2000-int.lmb
3000-char.lmb
...
5500-string.lmb
...
7000-record.lmb
8000-io.lmb
...
your-script.lmb
It's easy to package and distribute our libraries and script now. The only challenge left is the organizational one: how do I make people follow the convention and use this language? But that's a topic for another post.
All the Corners Are Cut: Lamber! #
This journey of lazy PL design comes to an end. I've ended up with functional-nominally-optionally-typed-whitespace-insensitive-reverse-polish-notation-language. Let's call it Lamber and give it some stars and use!
P.S. I have to acknowledge inspiration taken from Lua and another less known project: Wisp (SRFI-119). Most of the syntax ideas come from Arne Babenhauserheide's work, including tail-heavy colons and form-ending periods (not featured in the post, but present in Lamber.) So thanks to him for all these ideas!