Right, so I would really like to write a typed Egel. But I decided I can't do that without a better-performing back-end; otherwise I would just be adding types to a language that is too slow. It seems I can't avoid rewriting the Egel interpreter for a while yet.
I studied various solutions and hoped for a drop-in concurrent reference-counting garbage collector, but none exist. So now I am writing a back-end based on Daan Leijen's excellent mimalloc, which I'll use as the concurrent slab allocator. For the moment the code does seem to write itself, which is great.
I can only hope it'll give me the order-of-magnitude increase in performance I need; otherwise, it'll all be for nought. But I'll do some extensive testing on it (since that needs to be done anyway), and that will include performance metrics.