Tuesday, December 21, 2021

Compacting GC?

A neat trick: directed acyclic graphs, the run-time structures Egel programs manipulate, allow for an efficient in-place compacting garbage collector. Which is what I am tempted to write at the moment.

Looks like sectorlisp uses that idea.

Sunday, December 12, 2021

For future reference

cmake -DCMAKE_BUILD_TYPE=PROFILE ..

egel -e "import \"prelude.eg\";; using System;; using List;; foldl (+) 0 (from_to 0 1000000)"

/opt/homebrew/Cellar/llvm@12/12.0.1_1/bin/llvm-profdata merge -sparse default.profraw -o default.profdata

/opt/homebrew/Cellar/llvm@12/12.0.1_1/bin/llvm-profdata show -all-functions default.profraw

Tuesday, December 7, 2021

Testing 1.. 2.. 3..

// memory test

#include <cstddef>    // size_t
#include <memory>
#include <new>
#include <stdexcept>  // std::out_of_range
#include <type_traits>
#include <utility>

// model values = atom int | array value*

static constexpr size_t roundup(size_t x, size_t m)
{
    return x % m == 0 ? x : (x + m) / m * m;
}

class Value;
typedef std::shared_ptr<Value> ValuePtr;

class Value {
public:
    Value() = default;

    virtual ~Value() {
    }
};

class Atom : public Value {
public:
    Atom(const int v) : _value(v) {
    }

    virtual ~Atom() {
    }

    static ValuePtr create(const int v) {
        return std::shared_ptr<Atom>(new Atom(v));
    }

private:
    int _value;
};

struct ValueField {
    ValuePtr _value;
};

class Array : public Value {
public:
    Array() = default;

    virtual ~Array() {
        while (_size-- > 0)
            std::addressof(operator[](_size))->~ValueField();
    }

    ValueField &operator[](size_t i) {
        char *r0 = reinterpret_cast<char *>(this) + roundup(sizeof(*this), alignof(ValueField));
        ValueField *t0 = reinterpret_cast<ValueField *>(r0);
        return t0[i];
    }

    ValueField &at(size_t i) {
        if (i >= _size)
            throw std::out_of_range("duuude!");
        return operator[](i);
    }

    static ValuePtr create(size_t _sizes) {
        return std::shared_ptr<Array>(new (_sizes) Array(_sizes));
    }

    void *operator new(size_t count, size_t _sizes) {
        auto z = roundup(count, alignof(ValueField)) + sizeof(ValueField) * _sizes;
        return ::operator new(z);
    }

    void operator delete(void *x) {
        ::operator delete(x);
    }

    // not implemented for brevity reasons.
    // Also, it would have to handle (if only to throw an exception)
    // the case where someone attempts to move an Array(30) into an Array(20)
    /*
    Array(Array &&) = delete;
    void operator=(Array &&) = delete;
    */

private:
    Array(size_t n) {
        for (_size = 0; _size < n; ++_size)
            new (std::addressof(operator[](_size))) ValueField;
    }

    size_t _size = 0;
};

int main()
{
    Array fixed;
    auto flex = Array::create(20);
    // std::static_pointer_cast<Array>(flex)->at(1)._value = Atom::create(42);
}


Thursday, December 2, 2021

Serialization

 I am more pondering than writing code at the moment. To get to mobile code I need to be able to serialize, and to serialize I need some representation independent of the runtime, because a serialized graph that isn't independent could corrupt the runtime when it is loaded, among other things.

It looks like I have little choice but to copy almost the complete runtime hierarchy into a serialization format. Which sounds worse than it is: only four primitive types, then arrays, then combinators. And the combinators are the problem. Data is easy, bytecode can be (dis)assembled, internal objects are always loaded; the real problem is externally defined C++ combinators. Or I do everything on a per-module basis; that's unclear at the moment.

Then there's the question of what goes into the graph. Say I have a bytecode combinator, do I only write the symbol or also the bytecode? No idea yet but I am tempted to have that separate from the graph.

The good part, serialization makes me think hard about how the interpreter is organized. The bad part, now I want to refactor again and get rid of some warts (macros), introduce namespaces, and employ c++20 modules. It's annoying to code against old code.

Also, pretty sure I am going to write the mobile code on top of grpc, which brings other problems since that's a client/server architecture and I want bidirectionality and more general a cloud of nodes which can talk to each other. But ah well, not everything is perfect. Maybe I'll just use it anyway.

Wednesday, December 1, 2021

Sunday, November 28, 2021

Hadoop

 While I work on the Python bridge code, I am starting to think I am getting a bit too distracted from the original goal of the Egel language, which was to facilitate data-center programming.

At the same time, it doesn't seem worthwhile to reinvent the wheel here. Apache Hadoop has a ton of experience behind it, so after the Python bridge I'll need to look at whether I can lift that with Egel bindings. Or maybe Kubernetes integration?

Thursday, November 25, 2021

Futamura Projections

 I was discussing Futamura projections and made two observations.

  1. Most of it remains true when you consider that partial evaluation doesn't have much to do with it at all. Just take an interpreter and an input program, combine them, and you already have a 'compiler.' The partial evaluation stuff is more the icing on the cake that makes you hope you can end up with an efficient result.
  2. Regarding that efficient result, it's likely that partial evaluation taken to the extreme is an undecidable or NP-hard problem. As an intuition, take a SAT solver as the interpreter. Then Futamura predicts that the SAT solver plus a problem statement would give an efficient solver for that problem class, and presto, your specializer settled P versus NP. That doesn't mean you cannot try this approach heuristically, but it does imply you have less than no guarantees of arriving at an efficient solution.
My $.02.

Initial Python Bridge Results

 Okay, great. After refactoring I have some initial bridge results.

marco@Marcos-Air build % egel ../test/test0.eg
Python initialized
egel (System::none) bridged (System::none) eq (System::true)
egel (System::false) bridged (System::false) eq (System::true)
egel (System::true) bridged (System::true) eq (System::true)
egel (7) bridged (7) eq (System::true)
egel (3.140000000000000) bridged (3.140000104904175) eq (System::false)
egel (a) bridged (a) eq (System::false)
egel (text) bridged (text) eq (System::true)
Python finalized

Thursday, November 4, 2021

UXN

 A neat off-the-grid system, uxn. Originally, Egel was aimed at these kinds of systems, but the binary is about 1MB on my system. Alas.

Wednesday, November 3, 2021

Categorical Nomenclature

 Corbin@libera is working on a language and made some observations regarding nomenclature. This is a very handy list; I might steal it someday.

Sunday, October 31, 2021

Byte Code Player/Recorder/Optimizer

 Right, so due to side-effecting code a range of optimizations is out of reach. For example, even beta reduction is unsafe under most circumstances. Like below.

  (\f -> f . f)  (print 'hello'; (\x -> x + 1)) 0

That gives one side effect when strictly reduced right-to-left, but two when 'optimized' by first beta-reducing the term.

HOWEVER,

It should be safe to just observe which bytecode instructions were emitted during normal evaluation and glue those together.

At some point, I am going to optimize with some kind of bytecode player/recorder.



Static Programming

 I thought about dynamic programming, ... and then decided against it. Though I want people (me) to be able to inspect the runtime, I don't want people to be able to modify it (except maybe through loading modules).

Simple reason: when the meaning of a symbol cannot change, that shaves a table lookup off every call, plus you enable optimizations, and that implies way faster code.

Saturday, October 30, 2021

Friday, October 29, 2021

Searching is calculating, parsing is searching

 I elaborated on the previous small theory and implemented search combinators, which I used in a small example to write a parser/evaluator for a simple expression language.

Thursday, October 28, 2021

Calculations

I added a small theory of calculations modulo a state, and an example. A calculation consists of chained actions, where an action is a function that takes a state and returns a result and a new state.

The gist is:

    ## Calculate:return a - calculate further with value a
    def return =
        [ A S -> (A, S) ]

    ## Calculate:chain f g - first do f, then g
    def chain =
        [ F G S -> [(A, S) -> G A S] (F S) ]

    ## Calculate:run f s - run calculation f on state s
    def run =
        [ F S -> [(A, S) -> A] (F S) ]

Which are, of course, a monad return and bind with a run function.

I added the calculation theory and a state-game example. The state game is straightforward: state and manipulation functions are defined, after which the game loop function results in a chain of actions, which are run on the state.



Tuesday, October 26, 2021

Pre-release v0.1.1: hotfixed critical regression

 So, a critical regression made the interpreter stall on large input. I followed a hunch: since the lexer became LL(n) instead of LL(1), UnicodeString's char32At method kept resetting an internal pointer, resulting in quadratic behavior. The fix worked.

Back at 50 KLoC/sec parsing on a synthetic example. This warranted a pre-release, v0.1.1, because the bug was a show-stopper for me.

Ah well. Neat, I guess.

FreeBSD 13.0 compile and install

The Egel interpreter now works on FreeBSD, though you need to set an include environment variable. It took me a day to get FreeBSD+dwm running under QEMU on my MacBook Air M1, but it felt worth it.




Saturday, October 23, 2021

To do: Time & Date

So, time and date functions for Egel. These should probably come in the form of built-in primitives, since you want to be able to natively parse and format them. Following the rule 'lift C++ to Egel' I am looking at that, but find it hard; there's some kind of impedance mismatch going on that I find hard to place.

There's just too much going on. Conceptually, C++17 onwards has clocks, time points, and durations. Time points and durations are tied to any of the (five or so) clocks. There's also time_t, the old Unix epoch time; I gather I should drop this. And time points and durations can be expressed in types ranging from years to nanoseconds.

A bit hard to disentangle and match to combinators. 


Wednesday, October 20, 2021

Optimal Reduction. Levy, you serious?

So. Egel is a term rewriter, so I ended up discussing Lévy-optimal reduction. I never trusted that result.

'Optimal' presumably means never performing the same reduction twice during the lifetime of a program.

Two intuitions:
  1. Likely there's a result akin to the halting problem: deciding whether two similar reductions might occur during the lifetime of a program is undecidable.
  2. Optimal sharing can probably be viewed, or stated, as an optimal search problem, and then no optimal reduction can exist unless P=NP; i.e., optimal reduction would be optimal search, which would be a polynomial reduction strategy for NP problems.
Something like that. Old thoughts in the back of my mind.

Tuesday, September 28, 2021

Monday, September 27, 2021

FSF CFarm results

The FSF graciously granted me access to their compile farm. So far, the Egel interpreter doesn't compile on a lot of the machines because it relies on a lot of modern software.

It needs cmake (3.13), libicu (65.0), and libfmt (8.0) and those are not installed on the majority of machines.

I also get wildly varying results back on the performance of the microbenchmark million.eg (summation of a list of the first million integers). Below, the results:

gcc185.fsffrance.org (centos, aarch64)   - successful compile and local install. 22 sec on million
gcc202.fsffrance.org (debian, sparc64)   - successful compile and local install. 1.22 min on million
gcc203.fsffrance.org (debian, powerpc64) - successful compile and local install. 25 sec on million

I'll try a number of other machines (some I am interested in seem to be down). Next steps are to create rpm and deb packages.

Thursday, September 9, 2021

Dynamic dispatch in Egel

 An example of how to do dynamic dispatch in Egel. This is a cool language.

# Just a showcase of how one could do dynamic dispatch in Egel over a table.
#
# This is of course an extremely slow implementation over a lookup table
# represented as a list. But moving the combinators to C++ would help.

import "prelude.eg"

using System
using List

namespace Dispatch (

    val dispatch_table = ref {}

    def dispatch_register =
        [ TAG FUNC ->
          let TABLE = get_ref dispatch_table in
          let TABLE = cons (TAG FUNC) TABLE in
          set_ref dispatch_table TABLE ]

    def dispatch_findrec =
        [ TAG nil -> throw (format "dispatch for {} failed" TAG)
        | TAG0 (cons (TAG1 FUNC) TABLE) ->
            if TAG0 == TAG1 then FUNC [ _ -> dispatch_findrec TAG0 TABLE ]
            else dispatch_findrec TAG0 TABLE ]

    def dispatch_on =
        [ TAG -> let TABLE = get_ref dispatch_table in
                 dispatch_findrec TAG TABLE ]

)

def ++ = Dispatch:dispatch_on "++"

val plus_float_register =
    let FUNC =
        [ K EXPR0 EXPR1 -> if [ E::float -> true | E -> false ] EXPR0 then (+) EXPR0 EXPR1 else K nop EXPR0 EXPR1
        | K -> K nop ]
    in
    Dispatch:dispatch_register "++" FUNC

val plus_text_register =
    let FUNC =
        [ K EXPR0 EXPR1 -> if [ E::text -> true | E -> false ] EXPR0 then (+) EXPR0 EXPR1 else K nop EXPR0 EXPR1
        | K -> K nop ]
    in
    Dispatch:dispatch_register "++" FUNC

def main =
    print (1.0 ++ 2.0) "\n";
    print ("hello " ++ "world") "\n";
    print (1 ++ "failure")

Wednesday, September 8, 2021

Egel interpreter runs twice as fast in VM

 Okay, so I ported the interpreter to Fedora 34 (arm64) in a virtual machine on the Air. Now the microbenchmark runs twice as fast as on native MacOS: 2.5 seconds.

I thought I might have messed up and the thing doesn't garbage collect, so I tested that:

[marco@fedora debug]$ valgrind ./egel ../examples/million.eg
==2653== Memcheck, a memory error detector
==2653== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2653== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==2653== Command: ./egel ../examples/million.eg
==2653==
500000500000
==2653==
==2653== HEAP SUMMARY:
==2653==     in use at exit: 3,374 bytes in 4 blocks
==2653==   total heap usage: 40,051,242 allocs, 40,051,238 frees, 2,275,766,965 bytes allocated
==2653==
==2653== LEAK SUMMARY:
==2653==    definitely lost: 0 bytes in 0 blocks
==2653==    indirectly lost: 0 bytes in 0 blocks
==2653==      possibly lost: 0 bytes in 0 blocks
==2653==    still reachable: 3,374 bytes in 4 blocks
==2653==         suppressed: 0 bytes in 0 blocks
==2653== Rerun with --leak-check=full to see details of leaked memory
==2653==
==2653== For lists of detected and suppressed errors, rerun with: -s
==2653== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

So that isn't it. Maybe the VM thinks it's a uniprocessor.

Tuesday, September 7, 2021

Moved to MacOS (arm64). Egel interpreter now twice as fast

The microbenchmark 'million.eg', which computes the sum of the first million numbers, ran in 16 seconds on my old Intel laptop. It's just a fold: foldl (+) 0 (from_to 0 1000000)

Observe:

egel % time egel examples/million.eg
500000500000
egel examples/million.eg  6.94s user 0.10s system 99% cpu 7.061 total