Wednesday, November 25, 2015

That thing about Applicative

That thing about applicative is that I predicted the idea years ago as well. Central to the (IO) monad is that you can inject a value, compute along with it, and never observe what comes out (the latter is what makes the IO monad safe).

And I predicted that any algebraic structure with those qualities is as good as, or equivalent to, a monad. Moreover, that function composition seems to be a neater fit if you want to compose things outright.
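A minimal Haskell sketch of that claim (my own illustration, not from the post): both the Applicative and the Monad interface let you inject a value into IO and compute with it, and neither hands you a way to observe the result outside the structure.

    -- Injecting and computing with the Applicative interface: pure and <*>.
    half :: IO Int
    half = pure 21                          -- inject a value

    answer :: IO Int
    answer = pure (*) <*> half <*> pure 2   -- compute along with it, still inside IO

    -- The same with the Monad interface: return and >>=.
    answer' :: IO Int
    answer' = half >>= \n -> return (n * 2)

    -- What keeps IO safe: there is no function of type  IO a -> a
    -- that would let you observe the result directly.
    main :: IO ()
    main = answer >>= print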

So, now some people want to change from monad to applicative. Which is somewhat neater, but also falls into the category of mindless hogwash scientists amuse themselves with.

Ah well. Another prediction done right, I guess.

Friday, November 20, 2015

OS, VM, Language

So, I didn't do a lot because I don't know where to go from here. There's a somewhat clear division of where you can place functionality: in the OS, the VM, or the language. And I am not set on what to do, or where. But underlying it all is the hardware, of course.

Let's look at recent and upcoming hardware trends:

  1. Expect more cores since we're nearing the end of Moore's law.
  2. Better microprocessor design makes virtualization cheap.
  3. Switches and routers have (more or less under the radar) become orders of magnitude faster.
  4. Memory is (slowly) catching up.

These trends, and the drive towards services which can handle the traffic of enormous numbers of microdevices, have brought forth the modern data center: a horizontally scaling piece of hardware built with redundancy over failing components. And with the advances in microprocessors, networking, and memory, that hardware is now more or less configured by software. I.e., the cloud, or infrastructure as a service.

Well here's catch one: I like to think of a data center as just another piece of hardware, so where's the bloody OS?

So I took OpenStack as an example 'OS,' though it's called a platform. And apparently the functionality of such a platform is scripted in the form of Python components which allow you to describe virtual networks connecting virtualized OSes running any number of applications.

Dammit. This is the modern world, of course. You glue vast amounts of functionality, from the hardware to the application stack, together with a high-level scripting language to arrive at an 'OS' which can handle a data center. But I would prefer to bind to C. Because that's how it's done. Well, in the old days. Where's my Plan9? (I guess I should stick to clusters and Spark.)

Then there's the VM for my language. I settled on reference counting to keep the VM simple. And I would like to hot-swap code, though I am abandoning the thought a bit. The simplest manner would be to garbage collect all definitions once they become garbage after you swap some new code in. Which is prohibitively expensive under a reference counting scheme.

Moreover, mentally, I haven't found a nice trick yet for dealing with the weak references in C++ which occur during a rewrite. Simply put, I don't want to reference count intermediaries, I only want to reference count thunks as they are rewritten. C++'s reference counting scheme has it reversed for some reason. Which is idiotic, since I assume more people have run into the same problem as I did. It's starting to look like I'd need to roll my own reference counting.

Then there's the language. R moved towards OO, for good reason, and even though mine is a mostly pure combinator language, I probably should do the same. Tada, I would arrive at a bad, even more lousily slow, Python implementation.

And, lastly, there's the use of the language. (Future) support for complex numbers and tensors I had somewhat factored in, but it will be a lot of work. More troubling is support for the ability to specify a DSL inside my language, which I started wondering about after reading up on Haxl and TensorFlow.

Haxl is something like a bad DSL for specifying out-of-order execution of queries, which is probably something I would need to think about. And if I would like to support something like machine learning, then, as with TensorFlow, you'd want a language which supports not only numerical and symbolic methods but also things like automatic differentiation. And somehow the library approach, as implemented in both cases, doesn't quite cut it.
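To make the out-of-order idea concrete, here is a small Haskell sketch of my own (a hypothetical Fetch type, not Haxl's actual API): the applicative <*> collects the requests of both sides before anything runs, so independent queries end up in one batch that a runner may reorder or execute together.

    -- A computation is either finished, or blocked on a batch of
    -- pending requests plus a continuation over their results.
    data Fetch a = Done a | Blocked [Request] ([Result] -> Fetch a)

    data Request = GetUser Int | GetPost Int   -- hypothetical queries
    type Result  = String                      -- hypothetical result payload

    instance Functor Fetch where
      fmap f (Done a)       = Done (f a)
      fmap f (Blocked rs k) = Blocked rs (fmap f . k)

    instance Applicative Fetch where
      pure = Done
      Done f       <*> x              = fmap f x
      Blocked rs k <*> Done a         = Blocked rs (\res -> k res <*> Done a)
      Blocked rs k <*> Blocked rs' k' =
        Blocked (rs ++ rs')                      -- both sides' requests, one batch
                (\res -> let (r1, r2) = splitAt (length rs) res
                         in  k r1 <*> k' r2)

    request :: Request -> Fetch Result
    request r = Blocked [r] (Done . head)

    -- Two independent queries; both requests land in the same batch.
    page :: Fetch (Result, Result)
    page = (,) <$> request (GetUser 1) <*> request (GetPost 7)

And the automatic differentiation side can likewise be sketched in a few lines with the classic forward-mode trick (again a toy of my own, not how TensorFlow does it): carry each value together with its derivative.

    -- Dual numbers: a value paired with its derivative.
    data Dual = Dual Double Double

    instance Num Dual where
      Dual x dx + Dual y dy = Dual (x + y) (dx + dy)
      Dual x dx * Dual y dy = Dual (x * y) (x * dy + dx * y)
      negate (Dual x dx)    = Dual (negate x) (negate dx)
      abs    (Dual x dx)    = Dual (abs x) (dx * signum x)
      signum (Dual x _)     = Dual (signum x) 0
      fromInteger n         = Dual (fromInteger n) 0

    -- derivAt f x gives f'(x); e.g. derivAt (\x -> x*x + 3*x) 2 == 7.
    derivAt :: (Dual -> Dual) -> Double -> Double
    derivAt f x = let Dual _ d = f (Dual x 1) in d

Neither sketch is what Haxl or TensorFlow actually does internally, but they show the two language features I would have to accommodate.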

So, after some reading, my little language aimed at doing some trivial mathy stuff by sprinkling combinator expressions over data centers has now blown up to a project you need four PhDs for.

Great.

Tuesday, November 10, 2015

Google Dataflow: A Unified Model for Batch and Streaming Data Processing

[Embedded video: Google Dataflow presentation]

Not programming. Thinking. Google seems to be way ahead anyway, and as I concluded earlier, it simply seems to make more sense to base any new data center language on Java. Which they did. Above is a really good presentation of Google Dataflow, which is probably the leading tech at the moment.

So, at Google, they already have a combinator/dataflow language which analyzes expressions in point-free style and pushes an 'optimized' computation to the back-end.
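A toy Haskell version of what such a back-end can do with point-free expressions (my own sketch, unrelated to Dataflow's actual API): because the pipeline is just a composition of combinators, the system can apply rewrite rules such as map fusion before shipping the computation off.

    -- Unfused pipeline: two passes over the data.
    pipeline :: [Int] -> [Int]
    pipeline = map (* 2) . map (+ 1)

    -- The rewrite a combinator back-end can apply automatically,
    --   map f . map g  ==  map (f . g),
    -- fuses the two passes into one.
    pipeline' :: [Int] -> [Int]
    pipeline' = map ((* 2) . (+ 1))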

Haxl starts to look more and more like a one-off tool with outdated, bad technology which can't be lifted. I wouldn't invest in it.

Still, some stuff to be done here. Which is mostly in the back-end, it seems. You don't want to bring up thousands of VMs and send jars around. You want them running and then have the ability to hot-swap in some code. A reported (if I got it right) two-minute wait time for a batch, where you're mostly just waiting for VMs to come alive, doesn't cut it in the long run.

Recommendation: Invest in Java VMs and hot-swapping of code.

Thursday, November 5, 2015

Log 110515

So. R, Erlang, Hadoop/Spark. A bit of Haxl. Java/Scala vs. Python. Code migration and hot-code swapping. Sockets. Evaluation trees. Combinator expressions. Module boundaries. Mobile code. Services. Monads and Applicative. Push and pull models. Reactive programming.

I would bet on Scala.