Thursday, December 10, 2015

Rift

Rift by Jan Vitek is actually a nice simple interpreter written in C++.

Wednesday, November 25, 2015

That thing about Applicative

That thing about applicative is that I predicted the idea also years ago. Central in the (IO) monad is that you can inject a value, calculate along with it, and never observe what comes out (the latter makes the IO monad safe.)

And I predicted that any algebraic structure that has those qualities is good enough, or equivalent, to a monad. Moreover, that function composition seems to be a neater fit if you want to compose things outright.

So, now some people want to change from monad to applicative. Which is somewhat neater, but also falls into the category of mindless hogwash scientists amuse themselves with.

Ah well. Another prediction done right, I guess.

Friday, November 20, 2015

OS, VM, Language

So, I didn't do a lot because I don't know where to go from here. There's a somewhat clear division where you can place functionality: in OS, VM, or language. And I am not set on what to do whatever, wherever. But underlying that all, is the hardware of course.

Lets look at recent and coming hardware trends:

  1. Expect more cores since we're nearing the end of Moore's law.
  2. Better microprocessor design makes virtualization cheap.
  3. Switches and routers have (more or less under the radar) become orders faster.
  4. Memory is (slowly) catching up.

These trends and the drive towards services which can handle the traffic of enormous numbers of microdevices has brought forth the modern data center. A horizontally scaling piece of hardware created with redundancy over failing components, and with the advances in microprocessors, networking, and memory, that hardware is now more or less configured by software. I.e., the cloud, or infrastructure as a service.

Well here's catch one: I like to think of a data center as just another piece of hardware, so where's the bloody OS?

So I took OpenStack as an example 'OS,' though it's called a platform. And apparently the functionality of such a platform is scripted in the form of Python components which allow you to describe virtual networks connecting virtualized OSes running any number of applications.

Dammit. This is the modern world, of course. You glue vast amounts of functionality from the hardware to application stack together with a high-level scripting language to arrive at an 'OS' which can handle a data center. But I would prefer to bind to C. Because that's how it's done. Well, in the old days. Where's my Plan9? (I guess I should stick to clusters and Spark.)

Then there's the VM for my language. I set on reference counting things to keep the VM simple. And I would like to hot-swap code, though I am abandoning the thought a bit. The simplest manner is to simply garbage collect all definitions after they become garbage after you swapped some new code in. Which is prohibitively expensive under a reference counting scheme.

Moreover, mentally, I didn't find a nice trick yet how to deal with weak references in C++ which occur during a rewrite. Simply put, I don't want to reference count intermediaries, I only want to reference count thunks as they are rewritten. C++'s reference counting scheme has it reverse for some reason. Which is idiotic, since I assume more people have ran into the same problem as I did. It's starting to look like I'ld need to roll out my own reference counting.

Then there's the language. R moved towards OO, for good reason, and I probably, even if it's a mostly pure combinator language, should do the same. Tada, I would arrive at a bad, even more lousily slow, Python implementation.

And, lastly, there's the use of the language. (Future) Support for complex numbers and tensors I somewhat calculated in but will be a lot of work. But more troubling is the support for the ability to specify a DSL inside my language which I started wondering about after reading up on Haxl and Tensorflow.

Haxl is something like a bad DSL for the specification of out-of-order execution of queries, which is probably something I would need to think about. And if I would like to support something like machine learning then, TensorFlow, you'ld like a language which can support not only numerical and symbolic methods, but also things like automatic differentiation. And somehow the library approach as it is implemented in both cases doesn't quite cut it.

So, after some reading, my little language aimed at doing some trivial mathy stuff through sprinkling combinator expression over data centers now blew up to a project you need four PhDs for.

Great.

Tuesday, November 10, 2015

Google Dataflow A Unified Model for Batch and Streaming Data Processing



Not programming. Thinking. Google seems to be way ahead anyway, and as I concluded earlier, it simply seems to make more sense to base any new data center language on Java. Which they did. Above a really good presentation of Google Dataflow, which is probably the leading tech at the moment.

So, at Google, they already have a combinator/dataflow language which analyzes expressions in point-free style and pushes an 'optimized' computation to the back-end.

Haxl starts to look more and more like a one-off tool with out-dated bad technology which can't be lifted. I wouldn't invest in it.

Still. some stuff to be done here. Which is mostly in the back-end it seems. You don't want to bring up thousands of VMs and send jars around. You want them running an then have the ability to hot-swap in some code. A reported, if I got it right, two minutes wait-time for a batch where you're simply mostly waiting for VMs to come alive doesn't cut it in the long run.

Recommendation: Invest in Java VMs and hot-swapping of code.

Thursday, November 5, 2015

Log 110515

So. R, Erlang, Hadoop/Spark. A bit of Haxl. Java/Scala vs. Python. Code migration and hot-code swapping. Sockets. Evaluation trees. Combinator expressions. Module boundaries. Mobile code. Services. Monads and Applicative. Push and pull models. Reactive programming.

I would bet on Scala.

Saturday, October 31, 2015

Log 110115

Read up on R. Okay.. Read the sources of the Hugs interpreter, again. Did some programming. Man, I am slow.

How useful is Hot-Swapping?

So, turns out WhatsApp is build using FreeBSD and Erlang. And one of the discerning qualities of Erlang is that you can hot-swap modules. So that prompted the question: Do I want to be able to support something like this, in principle if not in practice?

I have a construct in the language which I want to compile out but which crosses module boundaries. Now what? If you cannot, or shouldn't, compile it out; you probably need to support it, possibly all the way to the runtime. Which is annoying.

How useful is Erlang hot-swapping anyway?

Friday, October 30, 2015

Log 103015

Well I didn't do a lot today except think. Compiling out try/catch/throw will go down the book as just another silly idea I once had. This looks like it was the last possible show-stopper in the interpreter. There is only work left, and today I certainly didn't feel like that.

I posted an idea on LtU about back-compiling a half-way evaluated expression to a program; that looks to be feasible and interesting. Not sure it is that useful though.

Thursday, October 29, 2015

It is that simple

I am leaning towards: If you can't compile it out (desugar it) trivially, then you probably shouldn't.

It ain't that simple: Exceptional Control Flow

Story short: The current scheme for translating try/catch to a set of combinators is (mostly) correct, but leaks since curried combinators might escape a try/catch block with exception handlers still referring that block. Seems I made a bad call here.

So, I need to rethink this. And since I am now provably 'Not that Smart'™, I'll do it in this blog. First, a number of observations:

There is a translation scheme of try/catch to an 'unsugared' term. (I use the term 'unsugared' loosely for a source code to source code translation.)
This is trivial, any calculation can be translated to SK only if need be.

If we can unsugar try/catch, we should.
It greatly simplifies the runtime if I only need to think about combinator applications. There is the possibility that I simply shouldn't try to unsugar this, to keep the operational model clean, but that's unlikely.

The translation of try/catch to an unsugared term is a 'must have.'
I want to do combinatorial rewriting, mostly to see where I'll end up, but also to experiment with what it enables. If anything, if this is to be a simple slow toy, it should enable certain functionality trivially usually not found in other languages. If all computations are simple terms being rewritten we might enable:
  1. Trivial low overhead concurrency. Concurrency deflates to interleaving evaluation steps amongst any number of terms.
  2. Trivial suspension of computations. A term could, for instance, be stopped, written to disk, and resumed at any given point.
  3. Trivial mobile code. Any term can be serialized, send to another place, and be resumed there.
The last one is a must have. I want to see whether I could create a better R, or Python, for data centre applications, and the ability to move code to where the data is is a must have. In Spark, this is achieved by either serializing Java bytecode for Java/Scala, or pickling Python objects, and sending the abstract calculations to their destinations. I want this.

The unsugared term must allow for efficient evaluation.
Okay, here's the real catch. The most trivial source to source translation is simply having any calculation return a value from a disjoint union; i.e., any computation returns either a normal value or an exceptional result and either proceeds normally or passes on the exception. But the extra checks are wasteful, blow up all terms, and are inefficient in that they also need to 'pass up' exceptional results instead of simply 'tearing down a stack'.

The design space with magical combinators is limited.
Since we don't want the above translation, that warrants the idea that we should introduce 'magical' combinators which can modify the control flow.
In a term rewriting system, modifying control flow boils down to being able to rewrite the term in unexpected manners (and continue evaluation somewhere else).
For instance, a 'magical' combinator could have a value pop up somewhere else in the term, or erase and substitute entire terms in a non-normal fashion and then force evaluation elsewhere.

And we have two rough solutions for these 'magical' combinators: either do it more or less 'purely', by the creation of control flow objects which are passed around and hold information about what part of the term to rewrite, or do it 'impurely' with combinators which through side effects manage and query external tables referencing the term.

Of course, I prefer the first solution but fell flat on my face there. In the case of try/catch, since control flow objects might escape the local context through returning curried applications, those control flow object cannot be that 'pure' as I imagined (this is solvable, however, by making them impure; i.e., the control object is an impure shared stack of contexts) Moreover, passing around control flow objects is again wasteful as it usually passes an extra argument around.

So there you have it. I want 'pure' terms but for performance reasons I should give up on that idea and use side-effecting impure combinators; i.e., all terms are evaluated within a term referencing context modified and queried by 'magical' combinators. Which should imply a simple scheme where I introduce combinators for try/catch and throw which impurely modify, or extract information from, some tables. Tada, this bites again with low overhead concurrency or mobile code.

So, now I wrote it down. Unsure what to do. It ain't that simple.

Wednesday, October 28, 2015

Log 102815

Various. Thinking about a simplistic runtime. Since I won't be compiling but interpreting, the size of various things are now run time variables, not compile time constants. And C++ doesn't have variable sized (array) structs. Doesn't exist, period. Which implies I cannot implement natively reference counted thunks without doubling the amount of allocations it does, one for the thunk, one for the variable sized array of arguments. Bad luck I guess.

I'll need to extend the try/catch transform and strengthen an invariant. Bloody currying... 

Tuesday, October 27, 2015

Log 102715

I implemented the transform which compiles out try/catch and throw blocks. The aim for the interpreter is to become a useful 'toy', very minimalist with an interactive mode for implementing small mathematical scripts.

The interpreter should be implemented excruciatingly simplistic.

The catch/throw compilation step is not entirely trivial and bites with a possible interactive mode. I need to think over how to implement the interactive mode upfront. I am too used to batch compiling things.

Thursday, October 22, 2015

Compiling out Try/Catch/Throw

Below a piece of code I need to understand again. Screw me, not the foggiest.

namespace lambda (

    using system
    using list

    def reset: lambda = lambda_comb "_reset"

    /** The untry transform compiles out throw/catch statements by 
     *  supplying the 'exception handler' as the first argument to all
     *  functions. It also introduces a reset combinator for try/catch
     *  blocks.
     *
     *  - A definition gets one extra argument.
     *    .. F _ (def f = e) = def f = \x -> F x e
     *  - A any combinator is passed the handler.
     *    .. F x (X) = (X x)
     *  - A throw becomes an application of the exception handler to 
     *    the supplied value.
     *    .. F x (throw e) = x (F x e)
     *  - A try/catch block introduces the reset combinator
     *    .. F x (try e0 catch e1) = R (\s. (\y. F y e0) (\z. s (F x e1 z)))
     */
    def untry_transform: lambda_transform lambda =
        abs = [ v, l -> lambda_abs (list.single v) (lambda_alts (list.single l)) ];
        [ ef, lambda_def f l ->
            x = lambda_fresh_name nop;
            (_, l) = untry_transform x l;
            l = abs x l;
            l = lambda_def f l;
                (ef, l)
        | ef, lambda_comb t ->
            l = lambda_app (lambda_comb t) ef;
                (ef, l)
        | ef, lambda_try l0 l1 -> 
            y   = lambda_fresh_name nop;
            s   = lambda_fresh_name nop;
            z   = lambda_fresh_name nop;
            (_, l0) = untry_transform y  l0;
            (_, l1) = untry_transform ef l1;
            l0  = abs y l0;
            l1  = abs z (lambda_app s (lambda_app l1 z));
            l2  = abs s (lambda_app l0 l1);
            l   = lambda_app reset l2;
                (ef, l)
        | ef, lambda_throw l -> 
            (_, l) = untry_transform ef l;
            l = lambda_app ef l;
                (ef, l)
        | ef, l -> 
            lambda_child untry_transform ef l ]

    def untry: lambda -> lambda =
        [ l -> (_, l) = untry_transform (lambda_comb "nop") l; l ]

)

Catanastrophe: The Catagorical Mistake

I'll admit it, I am pleased with myself. I predicted on LtU that Haskell's fixation on category theory would lead nowhere, and some people now seem to agree.

The problem: software engineering concentrates on analyzing and describing rich (algebraic) structures whereas category theory concentrates on the trivial, the superficial, structure of theories. There's nothing there to lift, except for the trivial, which you probably could code in half an hour anyway. So you end up with incomprehensible theories on elaborate types providing all kinds of annotations describing structure and functionality you'll probably never use, where you could code the functionality you need yourself, but are forced to comprehend all the trivial observations anyway.

Of course, Haskell is a research language, a moving target, and I've no problem with being proven wrong somewhere in the future. But for the moment, I'll assume I was right and now other people start to find out and agree on that.

So maybe it's not such a bad idea to have an untyped functional language?

Log 102215

So. Progress is slow but steady. Progress so far: provisional Unicode lexer (doesn't handle all cases), provisional parser (doesn't handle operator precedence right), provisional semantics checker (doesn't check all invariants), basic support for desugaring (I am now implementing all cases).

Desugaring is simple. I am going to desugar guarded rewrites, which will make them very expensive but I don't think it's really worth supporting them at this stage, or ever. The motto will be: You can use guards but please don't if you can avoid them.

I am going to drop LLVM support and will write a rationale about that in the next post.

I am implementing in C++, not a bad choice but I somewhat regret having dropped Java (bytecode) support.

First Post

First post for the Egel Language, an untyped eager scripting language.