Designing programming languages is hard.\ But does it have to be this way? assets/cutting-corners.png Soft palette thumbnail.\ In the center, 'Language' is written with huge bright Greek Lambda instead of 'L'.\ The (handwritten) font is all wiggly and curly.\ In the corners, attributions to aartaka.me and Artyom Bologov (with highlighted pretty capitals) are added. IMAGE_ALT

So I'm designing a programming language. The Programming Language, rather. But a big problem with designing a PL is that... it's hard. Or, rather, everyone thinks and wants it to be hard.

But do I need all of these, actually? What if I took a much lazier approach and decided to cut some corners? Let's do exactly that!

Cutting on Parsing: 5 Forms

Most of university courses on PL design start and end at parsing. (Except for the ones using Lisps, but these are an exception—everyone else must suffer.) Parsing is hard, especially so—for C-family languages. Making proper parsers takes time, often a full quarter/semester. I need to simplify that. What if there were only 5 special syntax pieces to parse?

(Re)define a variable. Aliases: , , , . Because you only need one kind of variable definition—lexical binding.
Conditionals with . is just two / merged. -terminated.
Creating a new function (lambda). Aliases: and . Add it up with to define new (maybe ) functions and you have named abstraction. -terminated too.
Values and function calls
If there are args, it's a function call. If not, it's a value.

The rest of the programming language bloat can be reduced to that. Switch/case statements? Just use ! Loops? Named recursion and functional primitives! (With lambdas and fixed-point combinators, of course.) Pattern matching? You don't need it, as much as you don't need parametric polimorphism. (Whatever that means.) The list goes on.

Functions, definitions, conditions, values, and applications. We've gone full circle to procedural programming. (With lambdas and recursion, though.) Because it's simpler this way. And we're fully able to make things complicated even without classes or types.

Cutting on Function/Value: Functions Always Have Args

How do you decide when something is a function call and when something is just a value? Most languages just decide to add parentheses, but that's too hard to parse. What if I just said... that functions are whatever has arguments?

function value value (function value)
value

"But what if I need zero arguments function?" You don't. To think of it, zero argument functions are either constants in disguise. Or side effects. You don't need either. So it's safe to imply that function always have args.

Cutting on Operator Precedence: (Reverse?) Polish Notation

Here's a small warmup on your C-like language precedence knowledge. What's the precedence in these?

a + b * c
a * b ^ c
a & b | c
a && b || c
a = b || c
a = b, c
a == b || c
a-- -b + c, a
a = b() || c()

Why need precedence at all, though? I can just use prefix/suffix notation and get rid of all these tricky rules. Let's do that:

* n (factorial (-- n))

See? Much better—zero ambiguity! (I'll handle the parentheses fatigue later.)

Cutting on Side Effects: Let-bindings

Side effects like input/output and assignments are what makes academy produce computation models. Because these cases need too many smart hacks to even work, let alone get optimized. So what if... there were no side effects altogether?

Yes, no assignments. Just let-bindings:

let a = function (x) x end
let b = function (x) a x end
b 3

One can go a step further and say that one is not allowed to do anything in between -s. So all the action goes after the chain. Which also makes the parsing easier: the thing is either a binding or a final action + return value.

Now, side effects are not only about assignments. Side effects are also about environment interaction, memory poking, and I/O. What do we do with these? The short answer would be that you don't need them. The pragmatic answer will be that you can encapsulate them in a small wrapper. And keep everything else reliably functional.

Cutting on Nesting Levels: Threading and "Tail-heavy" Calls

Now this combination of prefix notation and parentheses looks fishy. Like... Lisp? Lisp is ugly, right? I need to do something about it.

Clojure does threading macros (-> and the like) to get rid of nesting. Scheme has its own and macros. Common Lisp has nothing, and that sucks. What will our language have?

My invention: "Tail-heavy functions". Always put the data to be threaded in as the last argument. With that, I can come up with a way to compose things. Like the hypothetical colon syntax (shamelessly stolen from Wisp:)

* n : factorial : -- n

See? Much less intimidating! And requires just one more keyword to the language. No need for complex macros, just build your functions as "tail-heavy."

Cutting on Computation Model: Lambda Calculus

With side effects out of the way, let's get to the computation model.

You don't need Turing Machine, actually. State-storing tape and movement instructions are only useful if one implements a side-effect-ful systems. There's Lambda Calculus as the more elegant system without side effects.

The problem of loops is elegantly solved with a single function. Infamous Y combinator (or its applicative version, Z combinator:)

def Y = function (g)
  local inner = function (x) g x x end
  inner inner
end

def Z = function (f)
  local inner = function (x) f (fn y : x x y) end
  inner inner
end

So let's just compile our language to raw lambdas! Also relying on...

Cutting on Runtime: Using a Host Language

Host language. Whatever you implement your PL in. I'm implementing mine in Common Lisp, so I'm going to use as much of CL as possible. Instrumenting CL built-in parser/reader for the simple syntax I have. Compiling to Lisp-native lambdas. With typed literals coming straight from Lisp.

Cutting on Type System: JSON

We all need a simple type system. Something lightweight and simple. Something everyone knows. Like JSON!

You can tell I have something else in mind. What if that was simplified too? Say,

Simple for both human and machine, yet useful. But yeah, I've strayed away from JSON somewhat. Because JSON is hard too: it has floats and named struct fields!

Cutting on Type Inference: Nominal+Structural Types

Now that there's talk about types, does one need them in the language? Do we need type inference really? Not really.

But let's say we need types. What then? What kind of type inference would that be?

I'd say optional gradual type inference with nominal types. Or, as we say here, "just call it a string lol." One doesn't need type enforcement or inferences, and can safely ignore type hints. If the programmer is writing type hints, it's their problem. But if one needs to check types, they just use these hints as The Types. Whatever you call it, it is.

def int = type end
var sum = + (int false) (int nil) end

def bool = type end
var truth = or (bool 0) true end

def str = type end
var chr = nth 1 (str ['a', 'b', 'c']) end

(Notice the introduction of . A sixth entity/keyword in our language, but a totally optional one. It can be safely replaced with almost everywhere you see it and nothing will change. Type tags are then just functions when not used as types.)

A more interesting problem: how to infer record types? (Including conses/lists.) The system I came up is too simple to be useful. But hear me out: Constructors and accessors!

Constructors (and only them) create a typed complex thing
e.g. creates a pair.
Accessors are constructor arguments that return respectively typed values
for , the definition might be with and as accessors. For , returns an , and returns a .

This (structural typing) system allows building arbitrarily complex data structures. And that—added to arbitrarily shapeable base language—is more than enough for practical programming.

Cutting on Packaging: File System

Now that we have a nice little language, we need to think about its ecosystem. How does one distribute scripts or libraries? Of course, as plaintext files—everything else is bloated. But how do we build and resolve dependencies for it? Let's use filesystem for that!

One of the packaging systems might be:

Our program can then just be something like:

0500-bool.lmb
...
1500-cons.lmb
2000-int.lmb
3000-char.lmb
...
5500-string.lmb
...
7000-record.lmb
8000-io.lmb
...
your-script.lmb

It's easy to package and distribute our libraries and script now. The only challenge left is the organizational one: how do I make people follow the convention and use this language? But that's a topic for another post.

All the Corners Are Cut: Lamber!

This journey of lazy PL design comes to an end. I've ended up with functional-nominally-optionally-typed-whitespace-insensitive-reverse-polish-notation-language. Let's call it Lamber and give it some stars and use!

P.S. I have to acknowledge inspiration taken from Lua and another less known project: Wisp (SRFI-119). Most of the syntax ideas come from Arne Babenhauserheide's work, including tail-heavy colons and form-ending periods (not featured in the post, but present in Lamber.) So thanks to him for all these ideas!