Saturday, December 24, 2016

Log 122416

You don't want to know how sausages are made. Nor laws. Same goes for my code when I am thoroughly uninterested.
So majestic, so confident. Wow.
Halfway there. I patched up some data structure the C++ compiler accepts and now need to make use of during semantic analysis. The interpreter internally uses a lot of dumb passes, I guess I could 'tidy' up some code there while I am at it and make semantic analysis straightforward two-pass.

Tuesday, December 13, 2016

Rant

Just read another rant from someone who got disappointed about the state of some academic tooling.

Then don't take fricking advice from people who don't, or can't, code.

Meanwhile, I know what to do, probably even how to do it, but don't seem in the mood.

Sunday, December 11, 2016

Log 121116

Just writing some stuff down since I didn't find a scheme yet to implement. So, what the interpreter now does is that it keeps all declarations in a nested map. All combinator definitions live fully qualified in the most outward range/symbol table; local ranges are used to reference renamed imported combinators or for variable declarations. Lookup and insertion are more or less O(1), more precise O(n*log(d)) where n is a small number denoting the nesting depth and d the number of declarations.

But when a namespace is entered or a using directive is encountered the global namespace is searched fully for declarations which match a prefix. I.e., quadratic behavior. In the case of the synthetic example with 6k namespaces and a total of 10k declarations, you end up with 60M lookups. (I knew that when I coded it, just didn't know what price I would pay for it in practice.)

This is, of course, utterly ridiculous. Good enough for a very short first-order approximation, not good enough to publish to Github.

Solution: Nested linked symbol tables. A namespace is nothing more than a named symbol table or range. So, I know there is an O(1) solution^1 where entering a namespace is as simple as adding a reference to a scope. I.e, four orders speedup on the synthetic example. But I guess I would need to implement a lot of nesting and I want it to be neat.

I assume this is also where namespaces originate from. It's just a trivial extension to languages which already have complex scoping rules, thus named symbol tables, like most OO languages.

^1: There is also an O(n) solution where entering a namespace, or encountering a using directive, is equivalent to importing the symbols from another scope. This is somewhat nicer since programmers are then warned for overlapping declarations. But I'll go for the O(1) behavior, thank you. I want it to be fast.

Friday, December 9, 2016

Log 120916

Right. So, after I removed some small bugs it's time to overhaul the environment data structure I use. For one, I realized I wasn't being very smart about semantic analysis and I knew that the way I represented namespaces (basically a flat list of declarations) would give me back quadratic behavior.

I did some testing on a medium sized file (34k lines,  114k words, 608k characters) containing 6k namespace declarations. Profiling indeed gave me back around 30 seconds of wasted time.
Each sample counts as 0.01 seconds.
  %   cumulative   self                 
 time   seconds   seconds    calls     name    
 39.96      5.95     5.95     6000     RewriteIdentify::rewrite_decl_namespace(...
 24.48      9.60     3.65    42009     std::_Rb_tree...::_M_erase(...
 13.77     11.65     2.05     6001     std::_Rb_tree_node...::_M_copy...
 13.40     13.64     2.00 60883594     std::__shared_ptr..
  4.37     14.29     0.65 63833779     std::_Sp_counted_base...::_M_release()
  2.65     14.69     0.40 62664103     std::vector...::vector...

Which, well, won't do, two orders off. Time to refactor this for performance now, which probably means a bit of a dirty solution. I'll take a day or two for this.

uLisp in 2424 lines of C

The power of Lisp in 2424 readable lines of C code. I am jealous. +10KLoc of C++ now and an incredibility fat binary wasn't what I was aiming at.

Amazing!

Right. I've got rudimentary interactive mode support.

>> "hello world"
""hello world""

Amazing how many small bugs one can code.


>> if System.true then 1 else 2
if System.true then 1 else 2
Segmentation fault (core dumped)

Not to worry. The first is because I haven't completely implemented escape/unescape routines, the second probably because I massage terms in the wrong order in interactive mode.

Log 120916

I really hate it when stuff is conceptually foggy. And I don't see a direct manner of implementing the interactive mode into the interpreter. Something with that I need to redesign the module manager and implement a module which handles interactive mode. Or something...

Nah. That's a bad idea. The module manager handles the loading of modules. And interactive mode isn't a module. Not there.

Wednesday, December 7, 2016

Log 120716

Some more tinkering and the interpreter now can calculate "fib 5"; i.e., it has rudimentary operator support now hacked directly into the language. I need to factor out some of it later again.

My operator support scheme is ridiculously slow. I knew that when I designed it but can't say I am happy about it.

I wanted to benchmark the interpreter but that seems difficult since all trivial benchmarks seem to be gone from here. Moreover, the interpreter lacks basic IO routines. Ah well.

Monday, December 5, 2016

Log 120516

I am tinkering along to get operator support into the language. I once decided that the syntax to code generation part and the VM should be logically separate modules. That now leads to the slightly puzzling situation that I need to inform both parts of predefined operators separately, which isn't very satisfying.

Moreover, I today realized that a simple scheme I wanted to support, just toss in an async combinator through an external library, won't work. That doesn't imply that there isn't another trivial solution but it probably won't be as pretty as, for example, Javascript does it. Ah well.

Ah. I looked that up. Ascync/await will probably work. A bit error prone but, ah well, I am on the edge with an untyped language anyway.

Saturday, December 3, 2016

Log 120316

A lot of times writing code is just tinkering along until you find some solution which fits. Well, for me it is. With inheritance and a macro I got primitive combinator definitions down to some reasonable readable size.

Below, a definition of a dyadic combinator.

class Min: public Dyadic {
public:
    DYADIC_PREAMBLE(Min, "System", "min");

    VMObjectPtr apply(const VMObjectPtr& arg0, 
               const VMObjectPtr& arg1) const override {
        if ( (arg0->tag() == VM_OBJECT_INTEGER) &&
             (arg1->tag() == VM_OBJECT_INTEGER) ) {
            auto i0 = VM_OBJECT_INTEGER_VALUE(arg0);
            auto i1 = VM_OBJECT_INTEGER_VALUE(arg1);
            return VMObjectInteger(i0-i1).clone();
        } else if ( (arg0->tag() == VM_OBJECT_FLOAT) &&
             (arg1->tag() == VM_OBJECT_FLOAT) ) {
            auto f0 = VM_OBJECT_FLOAT_VALUE(arg0);
            auto f1 = VM_OBJECT_FLOAT_VALUE(arg1);
            return VMObjectFloat(f0-f1).clone();
        } else {
            return nullptr;
        }
    }
};

I can't say I am happy about the size of the binary it produces though. I guess it's necessary, but well, it won't run on a PIC.

Friday, December 2, 2016

Log 120216

The interpreter can do symbolic evaluation but it can't add two numbers yet. Reason, it has a pluggable architecture for combinators and I also need to implement the simple operator overloading scheme.

This is a big no-no for compiler writers, of course, but I can say I care much. But still, here's a definition for dyadic minus.

class IntegerMinus: public VMObjectCombinator {
public:
    IntegerMinus(VM* m): 
           VMObjectCombinator(VM_OBJECT_FLAG_INTERNAL, m, 
                                    "System", "intminus") {
    }

    IntegerMinus(const IntegerMinus& d)
        : IntegerMinus(d.machine()) {
    }

    VMObjectPtr clone() const {
        return VMObjectPtr(new IntegerMinus(*this));
    }

    VMObjectPtr reduce(const VMObjectPtr& thunk) const override {
        auto tt  = VM_OBJECT_ARRAY_VALUE(thunk);
        auto rt  = tt[0];
        auto rti = tt[1];
        auto k   = tt[2];

        VMObjectPtr r;
        if (tt.size() > 6) {
            auto arg0 = tt[5];
            auto arg1 = tt[6];

            auto i0 = VM_OBJECT_INTEGER_VALUE(arg0);
            auto i1 = VM_OBJECT_INTEGER_VALUE(arg1);
            auto ir = i0 + i1;
            r  = VMObjectInteger(ir).clone();
        } else {
            VMObjectPtrs rr;
            for (uint i = 4; i<tt.size(); i++) {
                rr.push_back(tt[i]);
            }
            r = VMObjectArray(rr).clone();
        }

        auto index = VM_OBJECT_INTEGER_VALUE(rti);
        auto rta   = VM_OBJECT_ARRAY_CAST(rt);
        rta->set(index, r);

        return k;
    }
};
At the moment I don't feel like copying this code for every operator, can't solve it reasonably with a macro or template (or I don't know how), and don't want to solve it with a general class and a function pointer. I'll wait a day, something will come up.

Much Weirdity

After more debugging it seems to evaluate fine. Weird stuff you can do too, shown below.
namespace Test (

    def f = [ X, Y -> X ]

    def g = [ f X -> f X X ]

)

using Test

def main = g (f 1)

Which gives


[marco@stallion src]$ ./egel ../tests/match.eg 
1


Okay, for those who assume this is a feature, this is fully intentional. Egel has eager combinator rewriting semantics so when evaluating g (f 1) it will try to reduce f 1, which fails since f is a dyadic function, therefor remains constant. After that g matches on f 1 and f 1 1 is reduced to 1. Completely according to semantics.

I hope to lift that behavior such that I can implement Mathematica style symbolic manipulation, though that has a far more complex operational semantics. But who knows?

Thursday, December 1, 2016

Hello 42!

Well. I passed a major milestone tonight. The egel interpreter evaluated a hello world application.

Given input

def main = 42

It produces bytecode

SYMBOLS: 
       0:main
DATA: 
       0:42
       1:main (4)
begin
000000 takex r1 r5 r0 i0
0x0009 fail 0x001f
0x000e data r6 0   ; 42
0x0015 set r1 r2 r6
0x001c return r3
0x001f array r7 r2 r1
0x0026 concatx r8 r7 r0 i4
0x002f set r1 r2 r8
0x0036 return r3
end

Which when run

[marco@stallion src]$ ./egel 42.eg 
42


It core dumps on more complex expressions, of course. I hand checked generated code, which looks okay-ish for small examples, so I'll be stuck debugging the bytecode interpreter for a while.



Guess I'll commit to github after that.

Wednesday, November 30, 2016

Log 301116

Well, after a long time of neglecting the sources on my computer I somehow got in the mood to code again. Somehow I needed to get rid of all the wild ideas I wanted to implement. So, I've been coding on-and-off for some time now.

Stuff I did.
  • Simplified the syntax even further.
  • Combinator lift routines.
  • Rewrote the module loader such that it can handle interactive mode.
  • Got rid of methods in favor of 'overloaded' operators.
  • Started on a bytecode interpreter.
  • Registered a github page.
It now generates very faulty bytecode. But I am happy that at least I got to the point that C++ accepts it.

I rushed the bytecode generator too much. This needs some re-thinking how to maintain several invariants when traversing the AST. And that for a trivial language which will never be used much. Ah well.

Sunday, April 10, 2016

Progress

So, I finally solved a major part of the puzzle. Now the rest.

Saturday, March 26, 2016

Wednesday, January 20, 2016

Apache Data Flow Proposal


Dataflow is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific Languages (DSLs). Dataflow pipelines simplify the mechanics of large-scale batch and streaming data processing and can run on a number of runtimes like Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud service). Dataflow also brings DSL in different languages, allowing users to easily implement their data integration processes.

Yeah, you can do it this way. Inevitably, people are moving away from map-reduce and one of the best manners of designing around more functionality is mapping stream processing functions onto cluster hardware.

That sounds simple enough for a language, but isn't. The two major hurdles are: For one, you're dealing with mobile code - and only languages like Java and Python seem to provide that (and, of course, Erlang.) Second, you're dealing with mobile data which means you're going to fall back to a form of explicit memory management since it's very costly to move around large objects.

Mobile code, mobile data. It's all solvable given current techniques but all-in-all it makes the most sense to target an impure imperative OO language with support for mobile code and combinator based programming. Preferable with primitives to turn an abstract data type into a service, or the reverse, when needed to prevent major refactoring and seamless scalability.

So, I am stuck designing a language for this since a Java library with a Scala interactive front-end will probably be so much better.

I should get back to my language, but it ain't easy. Mobile code isn't trivial, mobile data and seamless scalability is even further away.