Monday, December 5, 2022

Do-notation for chains of transformations

A particular idiom kept appearing in my Egel code since I started using `|>` to form chains of transformations: I kept introducing an abstraction just to set the chain up with an unknown initial argument.

I pondered on it for a while and introduced `do` syntactic sugar into Egel. For example, the semantics of `(do f |> g |> h) x` is `x |> f |> g |> h`. That allows one to abstract away superfluous variables.

It works.  Below, an example taken from Advent of Code '22.

# Advent of Code (AoC) - day 5, task 2

import "prelude.eg"
import "os.ego"
import "regex.ego"

using System
using OS
using List

def input =
    let L = read_line stdin in if eof stdin then {} else {L | input}

val digits = Regex::compile "[0-9]+"

def parse_crates = 
    do map (do unpack |> chunks 4 |> map (nth 1)) 
    |> transpose |> map (filter ((/=) ' '))
def parse_moves = 
    map (do Regex::matches digits |> map to_int 
         |> [{M,F,T} -> (M, F, T)])

def move =
    [(CC,MM) -> foldl 
        [CC (N,F,T) ->
            CC |> insert (T - 1) 
                   (take N (nth (F - 1) CC) ++ nth (T - 1) CC) 
               |> insert (F - 1) 
                   (drop N (nth (F - 1) CC))]
        CC MM ]

def main =
    input |> break ((==) "") 
          |> [(CC,MM) -> (parse_crates (init CC), parse_moves (tail MM))]
          |> move |> map head |> pack

  

I am not sold on the do-notation; the explicit abstraction also works as a strong visual reminder that a function is being expressed. But maybe it takes some getting used to. Also, it's a nice pun on Haskell monads and a reference to an old thought of mine that all you should need to chain actions is function composition, and that later became applicatives.

Monday, September 19, 2022

That billion dollar mistake. In Haskell...?

This is a short observation on 'the billion-dollar mistake' of Hoare, implementing null pointers in Algol W back in 1965. Dereferencing a null pointer usually causes a runtime error immediately terminating the program, and lots of programs crashed due to that.

This mistake is used ad nauseam to plead for safer languages, which made sense at the time since crashing programs had become the default. What is often left untold is that runtime exceptions are standard and must be carefully handled in almost all languages. Let's take Haskell, often claimed to be one of the safest languages in the world.

Runtime exceptions can occur for a variety of reasons; applying a partial function outside its domain is one of them. Let's try 'head []' in Haskell.

λ> head []
No instance for (Show a0)
  arising from a use of ‘show_M340108553800667339831401’
The type variable ‘a0’ is ambiguous
Note: there are several potential instances:
  instance Show a => Show (Const a b) -- Defined in ‘Control.Applicative’
  instance Show a => Show (ZipList a) -- Defined in ‘Control.Applicative’
  instance Show GeneralCategory -- Defined in ‘Data.Char’
  ...plus 44 others
In the expression:
  show_M340108553800667339831401 (let e_1 = head [] in e_1)
In an equation for ‘e_134010855380066733983140134010855380066733983140111’:
  e_134010855380066733983140134010855380066733983140111
    = show_M340108553800667339831401 (let e_1 = head [] in e_1)
In the expression:
  (let
     e_134010855380066733983140134010855380066733983140111
       = show_M340108553800667339831401 (let ... in e_1)
   in e_134010855380066733983140134010855380066733983140111) ::
    String_M340108553800667339831401
That didn't go too well; Haskell cannot figure out the particular type instance for an empty list. Okay, let's add a type annotation.
λ> let x = head ([]::[Int]) in x
*** Exception: Prelude.head: empty list
And boom, there you have it. Haskell terminates with a runtime exception. In fact, any Haskell program can have this potential 'bomb' in it. 

What, in the abstract, is the difference between 'nullptr.x' and 'head []'? Both terminate the program due to a runtime exception.
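
For concreteness, here is a minimal C++ rendering of the 'nullptr.x' side of that comparison (a hypothetical snippet; strictly speaking the dereference is undefined behaviour rather than a guaranteed exception, but in practice it usually terminates the process just as abruptly):

#include <cstdio>

struct Node { int x; };

int main() {
    Node* p = nullptr;          // the 'billion-dollar' null reference
    std::printf("%d\n", p->x);  // undefined behaviour; in practice usually a crash
    return 0;
}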

I fully agree that Haskell is a safer language than most imperative ones. But that 'billion dollar mistake'? That is in Haskell too.

Saturday, July 23, 2022

Musings on Bell

"Why is a raven like a writing-desk?" So, I've officially joined the ranks of online science cranks with some musings on Bell's inequalities. I am actually pretty much a firm believer in his result, but it was a thought I wanted to expand somewhat further.

I've been right and wrong in the past. I was using a linear transformation from digital circuits to CNF in SAT solving, and pointed that out years before other people noticed it existed; it turned out to be a rediscovery of Tseitin's work from the mid-sixties. I pointed out that all you need for embedding an impure program into a pure language is a form of composition, so the natural choice for that would be function composition over monads; Haskell programmers are using applicatives instead. Then a host of small and big views on compiler construction, often as right as they were wrong.

So, the QM thing started off with the notion 'what if superposition is a form of oscillation?' Turns out that doesn't matter much and you need to prove Bell 'wrong' in both cases. A pretty tall order. Especially since this is way out of my field.

But I decided to write it down anyway, maybe it goes somewhere:
https://twitter.com/egel_language/status/1550579321160491013

Thursday, July 14, 2022

That QM entanglement thing

While I slowly change a few lines in the Egel interpreter source code from time to time, I am thinking more about QM these days. For whatever reason. So, below are my braindead musings in full public light for people to laugh at.

I couldn't help but think: QM is exactly what you get for describing 'spinning' or 'oscillating' phenomena with probability distributions.

The metaphor I have in my mind: Envision you're on a nice tropical island with a lighthouse. The light the lighthouse casts on the island is a spinning phenomenon. 50% of the time it faces you, 50% of the time it doesn't. The 'state', or rather 'behaviour', of that lighthouse can be described with a ket, and letting your hermitian loose on it will confirm that it's a 50/50 chance that you'll 'see' the light passing in front of you.

Now suppose you're on an island with two lighthouses. One lighthouse faces you 70% of the time, the other 40% of the time. (The analogy with QM breaks here a bit, but fix the behaviour of the lighthouses to fit your fantasy or understanding of QM.)

So when you make measurements of one lighthouse you'll know the probability distribution of the other lighthouse, without any 'spooky action at a distance.' I just don't see it.

ADDENDUM: These are indeed musings on Bell's inequalities.  I agree with all that. But my idea is: Bell showed that there are no hidden variables, there cannot be a definite state satisfying the inequalities. What he didn't show was that there cannot be an 'oscillating' state (resulting in related probability distributions.)

ADDENDUM2: Dumbing it further down. Consider a fair coin; a ket faithfully describes it: this is a system that, once observed, will have a 50/50 chance of returning heads or tails. Bell says there are no hidden variables in there. True, the system's behaviour, not its state, is completely and faithfully described. Because there is no definite state until you flip it.

ADDENDUM3: The basic observation is that the difference between a superposition and an 'oscillating' state isn't that big. But where Bell says, reject local+real, I would say, local+real makes more sense so just accept that what you're studying is 'oscillating'. To make that stick is of course another issue.

Tuesday, June 28, 2022

Superfluous Thoughts

While I am still working on Egel, here are some Facebook thoughts -which I now assume are known- that I ran through Google Translate:

"Little big thoughts." Causality is an old discussion in metaphysics. If you have two related events A and B, and they happen one after the other, then we call event A the 'cause' and event B the 'effect'. There is not much to argue with and it is utterly pointless to think deeply about it because we simply cannot see into the future.

So just a few pointless thoughts again because my brain is taking me down strange paths.

The first thought is that we can never prove that the universe is causal. For every two observations that we make, the relationship is usually described mathematically: you end up with a relation R(A,B), and there is simply nothing in that relation R that says that B is physically determined by A or vice versa.

You can make causality very plausible, but that gets hairy fast too. One definition of causality I once learned was precisely a mathematical one.

Let's take the Fibonacci numbers, where each number is the sum of the preceding two numbers:

1, 1, 2, 3, 5, 8, 13, 21, ...

A very well-known series, but also one that we call causal because every number depends on its past. And yet you can also turn that definition on its head.

The problem is exactly that when I have the number '13', I know that the two preceding numbers were '5' and '8', given the rules of this small universe. So what tells me that it is not exactly the other way around with this series and that the past depends on the future?
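
A minimal, purely illustrative C++ sketch of that reversibility: the recurrence can be stepped forward or backward without losing any information.

#include <cstdio>

int main() {
    // Forward: each number is the sum of the two preceding ones.
    long a = 5, b = 8;
    long next = a + b;          // 13
    // Backward: given 8 and 13, the preceding number is uniquely 13 - 8 = 5.
    long previous = next - b;   // 5
    std::printf("forward: %ld, backward: %ld\n", next, previous);
    return 0;
}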

What you really want somewhere is that you have an event of which you can no longer reconstruct the past. There has to be a loss of information for this to happen.

Occam's razor -which again proves nothing, since it is often misunderstood as 'the simplest explanation is true' when scientists mean 'don't make the model more complicated than you need'- then gives you an intuitive explanation that events apparently cannot depend on the future, because there is not enough information there to construct them.

And that's exactly what physicists don't see: the most fundamental models can be read in two directions, forward as well as backward; there is no loss of information. (In addition, I can give a trivial intuition that quantum mechanics is precisely an indication of retrocausality, the reverse.)

Well, another one in a series of completely superficial thoughts.

Thursday, March 24, 2022

Mimalloc Concurrent Reference Counting Back-end

Right, so I really want to write a typed Egel. But I decided I cannot without a better performing back-end; I would just be adding types to a language that is too slow. It seems I cannot get rid of rewriting the Egel interpreter for a while yet.

I studied various solutions and hoped for a drop-in concurrent reference counting garbage collector, but none exist. So now I am writing a back-end on the basis of Daan Leijen's excellent mimalloc, which I'll use as the concurrent slab allocator. The code does seem to write itself for the moment, which is excellent.

I can only hope it'll give me the order-of-magnitude increase in performance I need; otherwise, it'll all be for nought. But I'll do some extensive testing on this (since that needs to be done anyway), and that'll include performance metrics.

Thursday, March 17, 2022

Typed Egel

Below is how Egel could look typed. But I need something for software engineering in the large too -objects or modules or something. Mulling over that.

def fib: int -> int = [ 0 -> 1 | 1 -> 1 | N -> fib (N - 1) + fib (N - 2) ]
type list[N] = nil | cons N list[N]
def concat: [N] => list[N] -> list[N] -> list[N] =
    [ nil YY -> YY | (cons X XX) YY -> cons X (concat XX YY) ]

class ord N =
    def compare: N -> N -> int
    def less: N -> N -> bool

instance ord int =
    def compare: int -> int -> int = int_compare
    def less: int -> int -> bool = int_less

def sort: [ord N] => list[N] -> list[N] =
    [ nil -> nil | (cons X XX) -> insert X (sort XX) ]
    where def insert: [ord N] => N -> list[N] =
        [ X nil -> cons X nil
        | X (cons Y YY) -> if X < Y then cons X (cons Y YY) else cons Y (insert X YY) ]

Wednesday, January 19, 2022

Botched Argument, Botched Algorithm

 "Everybody can know but the point is to understand." That's supposedly a quote by Einstein.

I've been discussing the previous post somewhat and been pointed at regular trees and how OCaml does it.

My point is -and I don't know whether it's true either- that when you think your type system gives an inductive proof while it is actually giving a botched bisimulation argument, that leads to botched algorithms and then also to incomprehensible errors. Because you fundamentally misunderstood the solution to the problem, or you didn't even get the problem right.

Indicative of that could be stuff like this: https://github.com/ocaml/ocaml/issues/7726

Then there was a comment that I didn't propose a solution out of the conundrum. I don't have one, but my instinct is to make the bisimulation relation explicit in the typing rules; i.e., the typing rules should include a bisimulation relation Eq: T x T. Imperative type checkers prove types equal with a proof of the existence of a bisimulation relation; I also suspect you cannot do it any other way without getting close but fundamentally getting it wrong.

But I don't know; that's what I want to experiment with, once I get the Egel interpreter sources accepted and compiled again.

Monday, January 17, 2022

Hindley-Milner Inference is Not-That-Good (Kind-Of)

Egel doesn't have a type checker. The two major reasons for that were that a) I wanted to explore the operational model, and types are overhead on that endeavour, and b) type systems are hard to get right and easy to get wrong. Of course, I am also thinking about what will come after Egel, and that'll be a typed functional language bootstrapped with the interpreter.

But there's a separate reason I didn't want to immediately dive into a typed language, and that is that I find Hindley-Milner based type inferencers lacking for programming languages. (If there is such a thing; you can tweak everything into something like an HM-style type system description.)


The reason for that is that HM nudges you towards a particular style of reasoning. Reading a rule like T-Lam from top to bottom, you can see it as an algorithmic description: when I know this, and that entails that, then I can conclude the following. That's bottom-up reasoning, often touted as a major accomplishment. Clear, concise, syntax-driven, and all that. Also good for teaching. In essence, it is the recipe for creating an inductive proof.
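
For reference, this is T-Lam in its usual textbook presentation (the standard rule, nothing Egel-specific), in LaTeX:

\frac{\Gamma,\, x : \tau_1 \vdash e : \tau_2}
     {\Gamma \vdash \lambda x.\, e : \tau_1 \to \tau_2}
\quad \text{(T-Lam)}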

However, those who have studied the typing of imperative languages ostensibly derived from Algol will notice that that isn't exactly how their type algorithms work. Principally, in an imperative language you usually reason along with what you know: while proving the equivalence of two types you may encounter (other) type equivalences, postulate those as knowns, and -in the end- use those postulates to finish the argument. That's a bisimulation proof.
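
A minimal C++ sketch (not taken from any existing checker; all names are made up) of that style of argument, deciding equivalence of possibly cyclic type graphs by postulating the pair under scrutiny as equal and then using the postulate to close the loop:

#include <set>
#include <utility>

struct Type {
    enum Kind { INT, ARROW } kind;
    const Type* from = nullptr;  // for ARROW
    const Type* to = nullptr;    // for ARROW; may point back into the graph
};

using Seen = std::set<std::pair<const Type*, const Type*>>;

// Coinductive equivalence: two types are equal unless a difference is found.
bool equiv(const Type* a, const Type* b, Seen& seen) {
    if (seen.count({a, b})) return true;   // already postulated equal: done
    if (a->kind != b->kind) return false;
    seen.insert({a, b});                   // postulate the pair as equal
    if (a->kind == Type::INT) return true;
    return equiv(a->from, b->from, seen) && equiv(a->to, b->to, seen);
}

int main() {
    // t = int -> t and u = int -> u: two separately built cyclic types.
    Type i{Type::INT};
    Type t{Type::ARROW, &i, nullptr}; t.to = &t;
    Type u{Type::ARROW, &i, nullptr}; u.to = &u;
    Seen seen;
    return equiv(&t, &u, seen) ? 0 : 1;    // equivalent, so this returns 0
}

A naive structural comparison without the assumption set would recurse forever on these cyclic graphs; the postulate is what makes the argument terminate.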

However, HM inferencers can be abused to construct bisimulation proofs once you declare the type of a term, since they'll happily reason along with user-supplied postulates. But HM-style inferencers often aren't good at creating or deriving bisimilar types per se, which I actually encountered in real-life situations using GHC or OCaml.

It's awkward to realize that most faculty are teaching students inductive type systems whereas all type algorithms used in industry construct bisimulation proofs. I don't think current HM-based inferencers are even that good -though you can tweak everything- for a programming language, and consequently, I want to implement a functional language with a type system that derives bisimilar types.

Sunday, January 16, 2022

Pending Language Changes

While I am kicking and screaming to get the Egel compiler accepted by a C++ compiler that supports modules, I have a number of pending language changes that I want to implement.

  1. Import declarations should take qualified identifiers instead of strings.  I've been confronted with modules and been looking at other languages and all of them (Java, Python, Haskell, C++) now default to identifiers instead of strings (file paths) for inclusion of other modules.
    I assume in the end people found out that separating the file system from the module system is the way to go. I'll do that too.
  2. Lowercase namespaces.  I once decided on uppercase namespaces since I thought that would distract a bit from the uppercase Prolog-like variables Egel has. But I was wrong; they just look ugly.
  3. Reintroduce the monadic minus.  The monadic minus was problematic and is problematic. At some point, I decided that if I cannot solve the problem then I won't, and removed all the code regarding it. And then it turned out during Advent of Code that I had better not have.
  4. Local combinator definitions. I knew it would come to this, but after Advent of Code, I decided that, yes, local combinator definitions introduced with a where are handy and, moreover, look pretty.
  5. Quotation. I am not too interested in all the dynamic things you can do with quoted combinators but I'll implement these anyway.
That should be about it for all language changes I want for a 0.2 release.

Wednesday, January 12, 2022

Push C++20 Modules Hard

A bit of the rationale behind why you really want to push C++20 modules hard, and why I am jumping on the bandwagon early.

In my online discussions, the most notable feature of C++20 modules mentioned is that they'll substantially decrease compilation time. Better performance, that's always nice. But.

Modules allow another, better, way to organize your code.
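
To make that concrete, a minimal sketch of what a module looks like (hypothetical file and module names, not taken from any real codebase):

// value.ixx -- a module interface unit
module;                      // global module fragment: plain #includes go here
#include <cstdint>

export module egel.value;    // hypothetical module name

export namespace egel {
    struct value { std::int64_t tag; };
    inline value make_int(std::int64_t n) { return value{n}; }
}

// main.cpp -- a consumer imports the module instead of including a header
import egel.value;

int main() {
    return static_cast<int>(egel::make_int(42).tag);
}

The unit of organization becomes what you export, not what happens to sit in a header file.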

And despite me being a programming language enthusiast -which implies that I, like a lot of academics, tend to concentrate on the trivial- that's huge. Software engineering isn't about nullptr; it's about developing and maintaining large codebases. Another means to organize code will impact everyone.

And it is no wonder Microsoft is leading this endeavour at the moment. The traditional route is from programming language, to operating system, to application; this will change how OSes are written. Whoever is ahead of the curve can gain a potential edge in this manner.

That's why I am jumping on the bandwagon early in this case. This is actually hard since my M1 makes a substandard development machine. So I'll be buying a Windows/Linux machine and returning to whatever environment can give me modules early.

Monday, January 10, 2022

C++20 migration woes

I have an extensive list of subjects I can work on in the Egel language. Among them: performance (it needs to become an order of magnitude faster), the introduction of quotation and local combinator definitions, fixing small unfinished corners, mobile code, and -last but not least- various language experiments around features not easily expressible in typed languages.

But I also want to migrate to C++20 modules and introduce an egel namespace. And I also want to do that before all the above since that should simplify and sanitize the interpreter; modules should allow for a more principled approach to the implementation and I'll be visiting, subsequently cleaning, a lot of code not touched in years. I prefer single-file (complete) modules for a bit of a silly reason: I mostly program in vim and it's just faster/easier to modify declarations not spread over different files.

Egel is intended to be an interpreted language implemented in C++, closely following the C++ standard and not much more (except for Unicode/libicu), and to be easily extendable and embeddable in C++. This already gives problems since C++20 compilers (gcc, clang, msvc) are at various stages of supporting the C++20 module system. It adds to the problems I already have supporting gcc/clang: the interpreter usually only builds on up-to-date systems due to various moving targets, i.e., cmake and libraries.

I am at the start of the process, and I decided to document a number of problems I encountered.

  • Naming. A silly observation, but do I go for `.ixx`, `.cppm`, or `.cpp` files? I decided on `.ixx`. Rationale: I don't like four-letter extensions, and `.cpp` is too overloaded for my taste. This is weird because I don't support msvc at the moment; I only work with clang since my shift to a Mac M1, and I want to primarily support the venerable gcc.
  • Macros.  I make extensive use of macros. A number of them I have been phasing out (like casts) but there are two types of macros I would prefer to keep.  Assertions (like `PANIC(m)`, abort with a message) and multiple assignment helpers (like `SPLIT_LAMBDA(l, vv, t)`, split a lambda AST object into variables and a term).  Then there are boiler-plate macros (like DYADIC_PREAMBLE which sets up all the boilerplate of a dyadic combinator.) Where do they go? Do I need to include a `macro.hpp` header file in every module?
  • Constants.  A number of constants are defined as macros in the interpreter, like `#define LPAREN '('`. I can easily switch over to constexpr, but that implies I need to decide on a type, in my case char or UChar32, and I would prefer not to. It gets worse for string constants: char*, string, or icu::UnicodeString? Sometimes it makes sense to treat something as text instead of a typed expression. This is a minor detail, but switching over to constexpr actually makes the code more tedious to support long term, instead of the opposite. (A small sketch of the trade-off follows after this list.)
  • Inclusion of other libraries.  Where are `<iostream>` and the libicu C++20 module equivalents? Do I include or do I import, and if I import, then what do I import? Where do I find the information, i.e., what website documents where the objects are defined?
  • CMake.  I have no clue how to write cmake modules that support C++20 modules, let alone in a portable manner.
These are the problems I have now. What I need are migration guides.
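
For the constants point above, a small sketch of the trade-off (the names are hypothetical; UChar32 is ICU's code point type):

#include <unicode/umachine.h>     // defines UChar32 (an int32_t code point)

// The macro form is untyped text; each use site decides what it becomes.
#define LPAREN_MACRO '('

// The constexpr forms force a commitment to one particular type up front.
constexpr char    LPAREN_CHAR = '(';
constexpr UChar32 LPAREN_CP   = U'(';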

Saturday, January 1, 2022

Rewriting the emitter

The bytecode currently produced by the Egel interpreter is pretty brain-dead. For a simple expression like "1+1" it will produce code like "a=reduce(1);b=reduce(1);result=reduce(+ a b)". Needs to be fixed.