Friday, November 20, 2015

OS, VM, Language

So, I didn't do a lot because I don't know where to go from here. There's a somewhat clear division of places where you can put functionality: in the OS, the VM, or the language. And I haven't settled on what to put where. But underlying it all is the hardware, of course.

Let's look at recent and coming hardware trends:

  1. Expect more cores since we're nearing the end of Moore's law.
  2. Better microprocessor design makes virtualization cheap.
  3. Switches and routers have (more or less under the radar) become orders of magnitude faster.
  4. Memory is (slowly) catching up.

These trends, and the drive towards services which can handle the traffic of enormous numbers of microdevices, have brought forth the modern data center: a horizontally scaling piece of hardware built with redundancy to survive failing components. And with the advances in microprocessors, networking, and memory, that hardware is now more or less configured by software. I.e., the cloud, or infrastructure as a service.

Well here's catch one: I like to think of a data center as just another piece of hardware, so where's the bloody OS?

So I took OpenStack as an example 'OS,' though it's called a platform. And apparently the functionality of such a platform is scripted in the form of Python components which allow you to describe virtual networks connecting virtualized OSes running any number of applications.

Dammit. This is the modern world, of course. You glue together vast amounts of functionality, from the hardware up to the application stack, with a high-level scripting language to arrive at an 'OS' which can handle a data center. But I would prefer to bind to C. Because that's how it's done. Well, in the old days. Where's my Plan 9? (I guess I should stick to clusters and Spark.)

Then there's the VM for my language. I settled on reference counting to keep the VM simple. And I would like to hot-swap code, though I am abandoning the thought a bit. The simplest approach is to garbage collect old definitions once they have become garbage after you've swapped some new code in. Which is prohibitively expensive under a reference counting scheme.

Moreover, I haven't yet found a nice trick for dealing with the weak references in C++ which occur during a rewrite. Simply put, I don't want to reference count intermediaries, I only want to reference count thunks as they are rewritten. C++'s reference counting scheme has it the other way around for some reason. Which is idiotic, since I assume more people have run into the same problem as I did. It's starting to look like I'd need to roll my own reference counting.

Then there's the language. R moved towards OO, for good reason, and I probably, even if it's a mostly pure combinator language, should do the same. Tada, I would arrive at a bad, even more lousily slow, Python implementation.

And, lastly, there's the use of the language. I have more or less planned for (future) support for complex numbers and tensors, but that will be a lot of work. More troubling is support for specifying a DSL inside my language, which I started wondering about after reading up on Haxl and TensorFlow.

Haxl is something like a bad DSL for the specification of out-of-order execution of queries, which is probably something I would need to think about. And if I wanted to support something like machine learning, as TensorFlow does, I'd want a language which can support not only numerical and symbolic methods, but also things like automatic differentiation. And somehow the library approach as it is implemented in both cases doesn't quite cut it.
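To make 'automatic differentiation' concrete for myself, here is a minimal sketch of its simplest form, forward mode with dual numbers, written in Python. This is not how TensorFlow does it (that works on a dataflow graph), and the names Dual, lift, sin, derivative, and f are all made up for the illustration. Note that it is itself the library approach: it only works because the host language lets me overload the operators.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Dual:
    """A number val + eps*der with eps**2 == 0; der carries the derivative."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def lift(x):
    """Treat plain numbers as constants, i.e. with derivative zero."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def sin(x):
    x = lift(x)
    # Chain rule: sin(u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)


def derivative(f, x):
    """Evaluate f at x with the derivative seeded to 1, then read it back."""
    return f(Dual(float(x), 1.0)).der


def f(x):
    # Hypothetical example function: x^3 + 2x + sin(x)
    return x * x * x + 2 * x + sin(x)
```

Calling derivative(f, 2.0) should agree with the hand-derived 3*2**2 + 2 + cos(2) up to floating point error. Every primitive (here only +, * and sin) has to be taught its own derivative rule, which is exactly the kind of thing I would rather have the language know about.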

So, after some reading, my little language aimed at doing some trivial mathy stuff by sprinkling combinator expressions over data centers has now blown up into a project you'd need four PhDs for.

Great.
