Tuesday, November 10, 2015

Google Dataflow: A Unified Model for Batch and Streaming Data Processing

Not programming. Thinking. Google seems to be way ahead anyway, and as I concluded earlier, it simply seems to make more sense to base any new data center language on Java. Which they did. Above is a really good presentation of Google Dataflow, which is probably the leading tech at the moment.

So, at Google, they already have a combinator/dataflow language which analyzes expressions in point-free style and pushes an 'optimized' computation to the back-end.
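To make that idea concrete for myself, here is a toy sketch in Java. It is not the Dataflow SDK: `FusionSketch`, `map`, `fuse` and `run` are names I made up, and the only "optimization" shown is fusing adjacent element-wise stages into one function, which is my assumption about the kind of rewrite meant here. The point is just that the pipeline is built as a description first and nothing runs until the fused computation is handed over.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical toy pipeline; none of these names come from the Dataflow SDK.
final class FusionSketch {
    // A pipeline is only a description: an ordered list of element-wise stages.
    private final List<Function<Object, Object>> stages = new ArrayList<>();

    // Combinator: append a stage without running anything yet.
    @SuppressWarnings("unchecked")
    <A, B> FusionSketch map(Function<A, B> f) {
        stages.add(x -> f.apply((A) x));
        return this;
    }

    int stageCount() {
        return stages.size();
    }

    // "Optimizer": compose all stages into a single function, so a back-end
    // would make one pass over the data instead of one pass per stage.
    Function<Object, Object> fuse() {
        Function<Object, Object> fused = Function.identity();
        for (Function<Object, Object> stage : stages) {
            fused = fused.andThen(stage);
        }
        return fused;
    }

    // Stand-in for the back-end: apply the fused function to each element.
    List<Object> run(List<?> input) {
        Function<Object, Object> fused = fuse();
        return input.stream().map(fused).collect(Collectors.toList());
    }
}
```

Chaining `map` calls reads like point-free composition: you never name the data, only the transformations, and the library is free to rearrange them before anything executes.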

Haxl starts to look more and more like a one-off tool built on outdated, bad technology which can't be lifted. I wouldn't invest in it.

Still, some stuff to be done here, mostly in the back-end it seems. You don't want to bring up thousands of VMs and send jars around. You want them running and then have the ability to hot-swap in some code. A reported two-minute wait (if I got it right) for a batch, where you're mostly just waiting for VMs to come alive, doesn't cut it in the long run.
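A minimal sketch of what I mean by keeping workers warm, under loud assumptions: `WarmWorker`, `swap` and `process` are hypothetical names, and this is not real JVM class reloading. In a real system the new code would presumably arrive as classes loaded at runtime, for instance through a fresh class loader; here a plain function stands in for it. What it does show is the shape: the worker stays alive and only the code it runs gets replaced.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical warm worker: it stays alive between jobs and only its task is replaced.
final class WarmWorker {
    // The currently installed code; an AtomicReference makes the swap safe
    // even if records are being processed on other threads.
    private final AtomicReference<Function<String, String>> task;

    WarmWorker(Function<String, String> initial) {
        this.task = new AtomicReference<>(initial);
    }

    // Swap in new code atomically; returns the code that was replaced.
    Function<String, String> swap(Function<String, String> next) {
        return task.getAndSet(next);
    }

    // Process one record with whatever code is installed right now.
    String process(String record) {
        return task.get().apply(record);
    }
}
```

With something like this, starting a new batch costs one `swap` call per worker rather than a VM boot.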

Recommendation: Invest in Java VMs and hot-swapping of code.
