Making Sense of Lambda Calculus 5: Bring Computation to (Aggregate) Data
By Artyom Bologov
So we’ve covered
numbers as compositions and
booleans as branches in this series of posts.
But both of these are “primitive” data types.
We need aggregate types to build the whole world out of!
This post goes over techniques to build data types in Lambda Calculus.
And some conventional types too.
Note on syntax
Starting with this post, I’ll use syntax of Lamber, my Lambda Calculus-based programming language. I find it more readable than conventional single-letter-single-argument lambdas.
Conses/Tuples/Pairs: All You Need #
An idea that generalizes to all of this post (and beyond:)
Bring computation to data, not the other way around.
It’s a bit of a brain teaser, but let’s see for ourselves.
Conses (or pairs, or tuples) are two-element collections.
With “car” (head) and “cdr” (tail):
def cons fn (car cdr)
(fn (f) f car cdr) .
So cons takes two things: car
and cdr
.
Then it takes a third argument, a function.
It’s curried, so we don’t have to provide this function upfront.
And applies the function to the first two arguments.
That’s what I was talking about.
Make cons user provide their action on data.
Don’t just provide the data to them.
Now the simplest action might be fetching values.
Here are two functions for fetching cons parts.
def car fn (pair)
pair (fn (a b) a) .
def cdr fn (pair)
pair (fn (a b) b) .
See that? We apply the data structure to the accessor function. Not the other way around. It’s counter-intuitive, but useful and generalizes well to other aggregate types.
On NIL
An important thing to all this cons business is NIL. The cons terminator, the final list/tree element. NIL is useful, because it’s also a representation of an empty list. Data structures are useless if you don’t have a way to represent empty states, right?
NIL has two canonical representations with separate handling routines:
def nil false .
def alt-nil (fn x true) .
The former seems to be a consensus and it has a nice property:
it’s 0, False, and NIL, all at the same time.
Consistent and beautiful.
With it, functions for NIL-aware cons
iteration are elegant enough:
just provide two values
- A function to apply to the head and tail if this is a cons;
- And a default value if it’s a NIL.
Null check, for example, is booleans (plus some formalities) applied to the pair:
def null fn (pair)
pair (fn (h t x) false) true .
This idiom notably includes a second argument to the pair.
And (respectively) a third argument to the cons-processing function.
This argument (True) is what NIL branch returns.
But if it’s not NIL, this argument has to go somewhere, and that’s the purpose of x
parameter.
An elegant application of this pattern is foldr
function:
def foldr fn (f init list)
local %foldr = fn (h t z)
f h (t %foldr z)
end
list %foldr init
end
We feed %foldr
, a recursive function, to a cons list, alongside the initial value (a.k.a. init a.k.a. z
).
If the list is NIL, it returns the init.
If it’s not, it’s applying a recursive function to its head and application of it to the tail and init.
Init is passed deep into the list recursion until there’s NIL.
NIL returns this init and then the whole recursion chain collapses into a final value.
Applying the folded function to init and last element.
Then to the result of that and penultimate element.
Then to the result and the element before that.
Et cetera.
To illustrate, here’s a control flow for foldr + 0 [1 2 3]
:
- Apply
[1 2 3]
to fold and 0 - It’s not NIL, so apply recursive function to head and tail
- Apply
+
to 1 and...- Apply
[2 3]
to fold and 0 - It’s not NIL, so apply recursive function to head and tail
- Apply
+
to 2 and...- Apply
[3]
to fold and 0 - It’s not NIL, so apply recursive function to head and tail
- Apply
+
to 3 and...- Apply
[]
/NIL to fold and 0 - It’s NIL, so just return 0
- Apply
- Get 3 + 0 = 3
- Apply
- Get 2 + 3 = 5
- Apply
- Get 1 + 5 = 6
- Done,
foldr + 0 [1 2 3]
is 6, all thanks to the beauty of NIL!
Wait, but what about alt-nil
?!
It’s nice in that it allows short and neat null
check:
def alt-null fn (pair)
pair (fn (h t) false) .
If the cons is NIL, it just returns True.
If it’s non-NIL, it applies the function to head and tail and returns False.
Which is quite elegant...
But that’s about it for its benefits—and e.g. foldr
looks too ugly and explicit with it.
So I prefer the 0/False/Nothing/NIL equality.
All of Lamber list-processing is based on this assumption.
Now, can we generalise all this apply-data-to-function business to something else?
Triples #
Triples are like tuples/conses, but with three elements.
def triple fn (first second third)
(fn (f) f first second third) .
def first fn (triple)
triple (fn (a b c) a) .
;; Et cetera for second and third
If you want to represent an N-element collection, pass these elements to the constructor. And wait for the last argument—the function to apply to these.
Trees #
With these two or three element collections, we can build arbitrarily complex structures.
Cons trees (binary, ternary, or whatever else you come up with.)
That’s what Lisp code is all about—trees of symbols.
cons (cons 1 2)
(cons (cons 3 (cons 4 nil)) (cons 5 (cons 6 7)))
- 1 and 2 (as one pair)
- 3 and 4 (as a so-called proper NIL-terminated list)
- and 5, 6, and 7 (as a chain of conses)
Or, visualizing it with a little piece of ASCII art
(generated by draw-cons-tree)
[o|o]---[o|o]-------[o|o]-[o|o]-7
| | | |
[o|o]-2 [o|o]-[o|/] 5 6
| | |
1 3 4
See the tree there? Not the most readable thing, but the benefit is apparent: It allows creating arbitrarily complex structures. So we only need conses, right? In a sense yes, but there are more tricks and types we can use:
Union Types #
We can increase the number of values we store in the structure, like with triples. But we can also increase the number of actions! That’s how union types actually work in LC. Get multiple functions, one per possible type, and only run the one matching the type.
The most representative union type is Either.
Basically, it’s either Left with one value (usually an error,) or Right with the other value (usually a result.)
def left fn (leftval)
fn (leftfn rightfn)
leftfn leftval .
def right fn (rightval)
fn (leftfn rightfn)
rightfn rightval .
There’s only one value now, but two action functions!
If the thing is right, call rightfn
function, otherwise call leftfn
.
Now to actual functions processing these:
def isleft fn (either)
either (fn _ true) (fn _ false) .
def leftval fn (either)
either identity (fn _ nil) .
The pattern repeats: we pass actions to data, not the other way around.
But this time, there are multiple actions we pass.
In this case, checking that the data is Left means passing a True-returning function as leftfn
.
And getting the left value is basically an identity leftfn
to return whatever is stored there.
Corresponding rightfn
-s are obvious: return False/NIL when Right.
Again, this example can be generalized to an arbitrary number of action functions. That’s what Mogensen–Scott encoding is about. Store the data, get action functions, run the matching one, get the result.
Directions to More Complex Structures #
Now I’m too stupid for more complex data types. Luckily, Marvin Borner isn’t. So go read his “Tiny, Untyped Monads” instead of a conclusion to this post. It’s worth the read! I’ll get to that too, and maybe I can steal some more of it into Lamber, my Lambda Calculus-compiling language.