Tuesday, February 6, 2024

E-Graphs trivially in PSPACE

Some people pointed me at E-graphs, a manner of representing and then heuristically minimizing terms. Of course, Boolean circuits are terms too, and minimizing circuits is PSPACE. Very much so, even.

So, it's unclear how much minimization you can actually achieve with E-graphs.

Friday, February 2, 2024

Control Flow Revisited

Two forms, (b?f:g)(x) and f(b?x:y), should be enough for all (functional) control flow analysis. In a very abstract sense, application and choice are all you need.

After I commented that contification just equals hoisting, someone on the interwebs made the smart comment that, in the small example below, 'f' cannot be hoisted but can be contified.

let f = fun arg -> body in 
match thingy with 
| A -> f x 
| B -> f y 
| C -> g foo

Of course, you just need a rewrite to a term that uses b0 ? f (b1 ? x : y) : g foo to see that 'f' is still hoisted. But it won't be pretty, so that's an issue.
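For the sake of argument, a rough sketch of what such a rewrite could look like in the notation of the example above (b0 corresponds to 'thingy is A or B', b1 to 'thingy is A'; an illustration, not a proposed compiler output):

let f = fun arg -> body in
match thingy with
| C -> g foo
| _ -> f (match thingy with A -> x | _ -> y)

The call to 'f' now occurs in exactly one place, applied to a choice between x and y, but the scrutinee 'thingy' is inspected twice - which is exactly why it isn't pretty.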

Thursday, February 1, 2024

NAND rewriting

I think somewhere between '01 and '04 I was interested in SAT problems. At that point in time there was an unspoken vibrancy in the field, a sense that NP problems might be tackled.

At that time I concentrated on NAND, or Sheffer stroke, rewriting. I figured that if people were close to breaking some barrier, then there should be some rewrite strategy to a solution that didn't suffer from exponential blowup along the way.

NAND terms are beautiful little mathematical expressions and the rewrites I came up with had pleasant forms so I wrote several small programs.

One of the things I stumbled upon, after looking for a while at the truth table for NAND, was a 3SAT formula.

x y z    (where z = x NAND y)
0 0 1
0 1 1
1 0 1
1 1 0


Every row has at least one zero, giving the clause (-x + -y + -z), and certain pairs have at least one one, giving (x + z) and (y + z).

I checked the conjunction and realized I had a linear translation from circuits to 3SAT.
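For the curious, a minimal sketch in OCaml of that linear translation, assuming the circuit is given as a tree of NAND gates (an illustration, not the program from back then): every gate gets a fresh variable and contributes exactly the three clauses read off the truth table.

type circuit =
  | Input of int                  (* input variables are numbered 1 .. n_inputs *)
  | Nand of circuit * circuit

(* returns (clauses, highest variable used); a clause is a list of literals,
   where a positive int v means "v" and a negative int means "not v" *)
let to_3sat ~n_inputs circ =
  let counter = ref n_inputs in
  let fresh () = incr counter; !counter in
  let clauses = ref [] in
  let rec go = function
    | Input v -> v
    | Nand (a, b) ->
        let x = go a in
        let y = go b in
        let z = fresh () in
        (* z <-> NAND(x, y): every row has a zero, certain pairs have a one *)
        clauses := [ -x; -y; -z ] :: [ x; z ] :: [ y; z ] :: !clauses;
        z
  in
  let out = go circ in
  ([ out ] :: !clauses, !counter)   (* finally, assert the output gate is true *)

One variable and three clauses per gate, hence linear in the size of the circuit.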

Meanwhile, I was reading satlive.org and noticed that people there seemed unaware of it, so I posted the solution, thinking Armin Biere would at least pick it up.

Years later I learned I had rediscovered a solution of Tseitin's from the sixties, years before SAT solvers became relevant. Nowadays I think everyone who studies the problem a bit would've come up with it.

The NAND rewriting strategy never panned out; stuff kept exploding. Moreover, rewriting is extremely slow in comparison to cheaply manipulating arrays, which is what very fast (learning) local search does.

I can still somewhat rewrite NAND terms in my head or play pebble games on them, but that doesn't seem to be a marketable skill.

But, ah well, I comfort myself that I was in the good company of Knuth who, for at least a while, thought that P=NP.

Tuesday, January 30, 2024

All the control flow you'll ever need

(b?f:g)(x)


Tongue in cheek, of course. Control flow corresponds to determining what (part of a) program you will run next and, of course, you can always compile down to lambda calculus in an abstract sense.

Monday, January 29, 2024

Why compiling with continuations is wrong(-ish)

There are two axes along which to plot programming languages: eager/lazy and strict/permissive. Eager/lazy are the two evaluation models; strict/permissive determines how much a language sticks to the chosen model.


Most languages are very eager, usually except for short-circuited Boolean operators, and reasonably strict: the order of evaluation is mostly known. Non-strictness is often exploited by compiler writers to reorder expressions for performance; i.e., one might not know precisely in what order '2*x+3*y' is evaluated, but it will be fast.

Lazy languages like Haskell and Clean are mostly lazy and permissive: entire program fragments might be reordered or made eager thanks to purity. All for performance reasons.

Order of evaluation is of course important; all programmers need to deal with it, albeit less in pure languages. Compiler writers need to deal with it too: a popular manner of writing a Scheme compiler is to transform to continuation passing style, since that makes the order of evaluation explicit but also allows for control flow operators.

Let's dive into that a bit with a small snippet of code.

pyth x y = sqrt (add (pow2 x) (pow2 y))

After transformation to continuation passing style, one ends up with:

pyth' x y cont = pow2' x (\x2 -> pow2' y (\y2 -> add' x2 y2 
                                 (\anb -> sqrt' anb cont)))

This transformation is of course global: all functions get a continuation object as an extra argument, and all functions are defined as complex expressions that pass along continuations. It would also be a pretty bad manner of writing a program, were it not that this style serves well as an intermediate representation for compilation to a backend, since it makes order of evaluation and temporary values explicit. Fantastic, done!

The Egel interpreter internally also uses an intermediate representation prior to emitting bytecode. Let's look at that.

pyth x y =
    let y2 = pow2 y in
    let x2 = pow2 x in
    let anb = add x2 y2 in sqrt anb

The intermediate representation hoists reducible expressions (redexes), and reducible expressions only, out of the term and binds their results, often called reduxes. Similar to continuation passing style or administrative normal form, it makes the order of evaluation and the use of temporary values explicit.
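As an illustration - a rough sketch in OCaml, not the actual Egel source - such a hoisting pass over a tiny expression language could look like this; every application is treated as a redex and bound to a fresh name, everything else stays put:

type expr =
  | Var of string
  | App of expr * expr list        (* an application is the only redex here *)

(* returns the bindings in evaluation order plus the atom naming the result *)
let hoist expr =
  let counter = ref 0 in
  let fresh () = incr counter; Printf.sprintf "t%d" !counter in
  let rec go e =
    match e with
    | Var _ -> ([], e)
    | App (f, args) ->
        let fbinds, fatom = go f in
        let abinds, aatoms = List.split (List.map go args) in
        let name = fresh () in
        (fbinds @ List.concat abinds @ [ (name, App (fatom, aatoms)) ], Var name)
  in
  go expr

Running it on the term for 'sqrt (add (pow2 x) (pow2 y))' yields the bindings t1 = pow2 x, t2 = pow2 y, t3 = add t1 t2, t4 = sqrt t3 and the result t4 - essentially the let-sequence above, modulo the order in which the arguments happen to be visited.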

But only redexes - i.e., work - are hoisted out. And control flow is nice, but it was, and it will always be, mostly about communicating clearly to your backend what work needs to be done - in what order is the detail.

And that's where I think continuation passing style, or administrative normal form, went wrong-ish as a backend solution for rewriters; they are answers that will function, but also the almost correct yet strangely wrong answers to a trivial question: what work needs to be done. Instead of answering that, they concentrate on the order or get the work statement wrong. No wonder that naive continuation passing style will state too much to be done. Meanwhile, this intermediate representation also gives a precise answer to which functions consume constant stack space and can be optimized further, i.e., those are the functions that rewrite to exactly one redex.

TL;DR: When developing a rewriter, what matters is what work needs to be done and in what order. Hoisting out redexes makes that clear, whereas continuation passing style and administrative normal form give almost correct but wonky answers.



Saturday, January 27, 2024

Are All Algol-like Calling Conventions Wrong?

Those who have watched the critically acclaimed "Egel - The Movie" (8 upvotes on lobste.rs) know that I've been dabbling with a small rewriter with a trivialized semantics. The observant viewer might have noticed an off-the-cuff remark in that movie that Egel, since it rewrites, has less need for tail-call optimization; some stuff comes for free.

Let's revisit a small recursive function, factorial: fac 1 = 1 | fac n = n * fac (n-1).

Those who know some compiler theory will know the runtime behaviour of that function when implemented according to the Algol convention for stack machines: for every invocation, a stack frame is created with a dynamic and static link and some space for local variables. The program will subsequently perform its operations within the context of that frame, and for each recursive call, a new stack frame is pushed onto the stack.

Egel, in contrast, will rewrite a redex to a number of other redexes. For a stack machine, that corresponds to removing the callee's stack frame entirely and pushing several stack frames with work to be done onto the stack. Hence my remark that there's less need for tail-call optimization: for simple recursive functions that consume constant stack space, you get that for free. Also noteworthy, tail-call optimization modulo cons comes for free too.
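To make that concrete, here is a toy sketch in OCaml - emphatically not Egel itself - of a machine whose stack holds pending work items rather than Algol-style activation records; each step pops one item and pushes zero or more new ones:

type work =
  | Fac of int            (* still have to compute fac n *)
  | MulBy of int          (* multiply the result computed so far by n *)

let fac_stack n =
  let rec loop acc stack =
    match stack with
    | [] -> acc
    | Fac n :: rest when n <= 1 -> loop 1 rest
    | Fac n :: rest -> loop acc (Fac (n - 1) :: MulBy n :: rest)
    | MulBy n :: rest -> loop (acc * n) rest
  in
  loop 1 [ Fac n ]

Rewriting 'Fac n' just replaces the popped item with new work; no frame ever nests inside another. A tail-recursive function would push exactly one replacement item per step, so its stack stays constant - the 'for free' part. Plain factorial is not tail recursive, so the pending MulBy items still pile up, exactly like the pending multiplications in the rewriting view.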

And that raises the question of whether other calling conventions are wrong, at least for ML-like languages.