Wednesday, January 19, 2022

Botched Argument, Botched Algorithm

 "Everybody can know but the point is to understand." That's supposedly a quote by Einstein.

I've been discussing the previous post a bit and have been pointed at regular trees and how OCaml handles them.

My point is, and I don't know whether it's true either, that when you think your type system gives an inductive proof but it is actually giving a botched bisimulation argument, that leads to botched algorithms and from there to incomprehensible errors. Because you fundamentally misunderstood the solution to the problem, or you didn't even state the problem correctly.

Indicative of that could be stuff like this: https://github.com/ocaml/ocaml/issues/7726

Then there was a comment that I didn't propose a way out of the conundrum. I don't have one, but my instinct is to make the bisimulation relation explicit in the typing rules; i.e., the typing rules should include a bisimulation relation Eq ⊆ T × T. Imperative type checkers prove types equal by proving the existence of a bisimulation relation; I also suspect you cannot do it any other way without getting close but fundamentally getting it wrong. A sketch of what I mean follows below.
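To make that concrete, here is a minimal sketch (my own illustration in C++, not code from Egel or any existing checker) of how an imperative type checker decides equality of possibly-cyclic types: postulate the pair under scrutiny into an explicit Eq set, then check the children, and let that postulate tie off the cycles.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Toy type graph: a head constructor plus children; cycles in the
// graph represent recursive (regular) types.
struct Type {
    enum Kind { Int, Arrow } kind;
    std::vector<const Type*> args;
};

using Eq = std::set<std::pair<const Type*, const Type*>>;

// Two types are equal iff some bisimulation contains the pair; the
// set `eq` accumulates postulated pairs and acts as the witness.
bool equal(const Type* s, const Type* t, Eq& eq) {
    if (s == t) return true;
    if (eq.count({s, t})) return true;  // postulated earlier: done
    if (s->kind != t->kind || s->args.size() != t->args.size())
        return false;
    eq.insert({s, t});  // postulate Eq(s, t) ...
    for (std::size_t i = 0; i < s->args.size(); ++i)
        if (!equal(s->args[i], t->args[i], eq))
            return false;  // ... any failure refutes the whole proof
    return true;
}
```

Termination is cheap to argue: there are only finitely many pairs of nodes, and every call that gets past the membership test adds a new pair.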

But I don't know; that's just what I want to experiment with, once I get the egel interpreter sources accepted and compiled again.

Monday, January 17, 2022

Hindley-Milner Inference is Not-That-Good (Kind-Of)

Egel doesn't have a type checker. The two major reasons for that were that a) I wanted to explore the operational model, and types are overhead on that endeavour, and b) type systems are hard to get right and easy to get wrong. Of course, I am also thinking about what will come after Egel, and that'll be a typed functional language bootstrapped with the interpreter.

But there's a separate reason I didn't want to immediately dive into a typed language, and that is that I find Hindley-Milner based type inferencers lacking for programming languages. (If there is such a thing; you can tweak nearly everything into something like an HM type system description.)


The reason for that is that HM nudges you towards a particular style of reasoning. Reading a rule like T-Lam from top to bottom, you can see it as an algorithmic description: when I know this, and this entails that, then I can conclude that. That's bottom-up reasoning, often touted as a major accomplishment. Clear, concise, syntax-driven, and all that. Also good for teaching. In essence, it is the formula for creating an inductive proof.
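For reference, the usual presentation of T-Lam (the standard rule for the simply-typed/HM core; nothing Egel-specific here):

$$\frac{\Gamma,\; x : \tau_1 \vdash e : \tau_2}{\Gamma \vdash \lambda x.\, e : \tau_1 \to \tau_2} \quad (\text{T-Lam})$$

Read top to bottom: once the premise above the line is established, the conclusion below it follows.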

However, if you have studied the typing of imperative languages ostensibly derived from Algol, you'll notice that that isn't exactly how their type checking algorithms work. Principally, in an imperative language you usually reason along with what you know: while proving the equivalence of two types you may encounter other type equivalences and postulate those as knowns, and, in the end, use those postulates to finish the argument. That's a bisimulation proof.
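A small worked example of that proof style (my own notation with equirecursive μ-types, not taken from any particular checker): take T1 = μa. int → a and T2 = μa. int → int → a.

  1. Postulate Eq(T1, T2) and unfold both sides: int → T1 versus int → int → T2.
  2. The heads match, so what remains is Eq(T1, int → T2). Postulate that too and unfold T1 again: int → T1 versus int → T2.
  3. The heads match again, leaving Eq(T1, T2), which is exactly the postulate from step 1. The argument closes, and the accumulated postulates form the bisimulation.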

However, HM inferencers can be abused to construct bisimulation proofs once you declare the type of a term, since they'll happily reason along with user-supplied postulates. But HM style inferencers often aren't good at creating or deriving bisimilar types per se, something I actually ran into in real-life situations using GHC or OCaml.

It's awkward to realize that most faculty are teaching students inductive type systems whereas the type checking algorithms used in industry construct bisimulation proofs. I don't think current HM based inferencers are even that good for a programming language (though you can tweak everything), and consequently, I want to implement a functional language with a type system that derives bisimilar types.

Sunday, January 16, 2022

Pending Language Changes

While I am kicking and screaming to get the egel compiler accepted by a C++ compiler that supports modules, I have a number of pending language changes that I want to implement.

  1. Import declarations should take qualified identifiers instead of strings.  I've been confronted with modules and have been looking at other languages, and all of them (Java, Python, Haskell, C++) now default to identifiers instead of strings (file paths) for inclusion of other modules.
    I assume in the end people found out that separating the file system from the module system is the way to go. I'll do that too.
  2. Lowercase namespaces.  I once decided on uppercase namespaces since I thought that would distract a bit from the uppercase Prolog-like variables Egel has. But I was wrong; they just look ugly.
  3. Reintroduce the monadic minus.  The monadic minus was problematic and is problematic. At some point, I decided that if I cannot solve the problem then I won't, and removed all the code regarding it. And then it turned out during Advent of Code that I had better not have.
  4. Local combinator definitions. I knew it would come to this, but after Advent of Code, I decided that, yes, local combinator definitions introduced with a where are handy and, moreover, look pretty.
  5. Quotation. I am not too interested in all the dynamic things you can do with quoted combinators, but I'll implement them anyway.
That should be about it for all language changes I want for a 0.2 release.

Wednesday, January 12, 2022

Push C++20 Modules Hard

A bit of the rationale behind why you really want to push C++20 modules hard, and why I am jumping on the bandwagon early.

In my online discussions, the most notable feature of C++20 modules mentioned is that they'll substantially decrease compilation times. Better performance, that's always nice. But.

Modules allow another, better, way to organize your code.
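For the unfamiliar, a minimal sketch of what that organization looks like (a generic example, not from the egel sources): one self-contained translation unit, no header/implementation split, and nothing visible to importers unless explicitly exported.

```cpp
module;                      // global module fragment: plain #includes live here
#include <string>

export module greeting;      // this one file is interface and implementation

// Private to the module: invisible to importers, no header needed.
std::string prefix() { return "hello, "; }

// Exported: visible to anyone who writes `import greeting;`.
export std::string greet(const std::string& name) {
    return prefix() + name;
}
```

A client just writes `import greeting;` and calls greet; the helper never leaks.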

And despite being a programming language enthusiast, which implies that I, like a lot of academics, tend to concentrate on the trivial, I think that's huge. Software engineering isn't about nullptr, but about developing and maintaining large codebases. Another means to organize code will impact everyone.

And it is no wonder Microsoft is leading this endeavour at the moment. The traditional route is from programming language, to operating system, to application; this will change how OSes are written. Whoever is ahead of the curve can gain a potential edge in this manner.

That's why I am jumping on the bandwagon early in this case. This is actually hard since my M1 makes a substandard development machine, so I'll be buying a Windows/Linux machine and returning to whatever environment can give me modules early.

Monday, January 10, 2022

C++20 migration woes

I have an extensive list of subjects I can work on in the Egel language. Among them: performance (it needs to become an order of magnitude faster), the introduction of quotation and local combinator definitions, fixing small unfinished corners, mobile code, and, last but not least, various language experiments around features not easily expressible in typed languages.

But I also want to migrate to C++20 modules and introduce an egel namespace. And I want to do that before all of the above since it should simplify and sanitize the interpreter; modules should allow for a more principled approach to the implementation, and I'll be visiting, and subsequently cleaning, a lot of code not touched in years. I prefer single-file (complete) modules for a somewhat silly reason: I mostly program in vim and it's just faster/easier to modify declarations not spread over different files.

Egel is intended to be an interpreted language implemented in C++, closely following the C++ standard and not much more (except for Unicode/libicu), and to be easily extendable and embeddable in C++. This already gives problems since C++20 compilers (gcc, clang, msvc) are at various stages of supporting the C++20 module system. That adds to the problems I already have supporting gcc/clang; the interpreter usually only builds on up-to-date systems due to various moving targets, i.e., cmake and libraries.

I am at the start of the process, and I decided to document a number of problems I encountered.

  • Naming. A silly observation, but do I go for `.ixx`, `.cppm`, or `.cpp` files? I decided on `.ixx`. Rationale: I don't like four-letter extensions, and `.cpp` is too overloaded for my taste. This is weird because I don't support msvc at the moment; I only work with clang since my shift to a Mac M1, and I want to primarily support the venerable gcc.
  • Macros. I make extensive use of macros. A number of them I have been phasing out (like casts), but there are two kinds of macros I would prefer to keep: assertions (like `PANIC(m)`, abort with a message) and multiple assignment helpers (like `SPLIT_LAMBDA(l, vv, t)`, split a lambda AST object into variables and a term). Then there are boilerplate macros (like `DYADIC_PREAMBLE`, which sets up all the boilerplate of a dyadic combinator). Where do they go? Do I need to include a `macro.hpp` header file in every module? (I sketch one possible answer after this list.)
  • Constants. A number of constants are defined as macros in the interpreter, like `#define LPAREN '('`. I can easily switch over to constexpr, but that implies I need to decide on a type, in my case char or UChar32. I would prefer not to. It gets worse for string constants: char*, std::string, or icu::UnicodeString? Sometimes it makes sense to treat something as text instead of a typed expression. This is a minor detail, but switching over to constexpr actually makes the code more tedious to support long term, instead of the opposite.
  • Inclusion of other libraries. Where are `<iostream>` and the libicu C++20 module equivalents? Do I include, do I import, and if I import, then what do I import? Where do I find the information, i.e., what website documents where the objects are defined?
  • CMake.  I have no clue how to write cmake modules that support c++20 modules, not even considering in a portable manner.
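For the macro question at least, one plausible answer (a sketch; the module name is hypothetical, the macros are the ones named above): macros don't survive being exported from a module, so they can stay in a conventional header that every module unit pulls into its global module fragment.

```cpp
module;                       // global module fragment
#include "macro.hpp"          // PANIC, SPLIT_LAMBDA, DYADIC_PREAMBLE, ...

export module egel.machine;   // hypothetical module name

export void expect(bool ok) {
    // the macro is usable inside the module but is not part of its interface
    if (!ok) PANIC("invariant violated");
}
```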
These are the problems I have now. What I need are migration guides.

Saturday, January 1, 2022

Rewriting the emitter

The bytecode currently produced by the egel interpreter is pretty brain-dead. For a simple expression like "1+1" it will produce code like "a=reduce(1); b=reduce(1); result=reduce(+ a b)". Needs to be fixed.
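My guess at what it should emit instead (speculation on my part, not a committed design) is something closer to "result=reduce(+ 1 1)": the literals are already in normal form, so reducing them into registers first buys nothing.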