Right, so because code can have side effects, a range of optimizations is out of reach. Even beta reduction is unsafe under most circumstances. Take the term below.
(\f -> f . f) (print 'hello'; (\x -> x + 1)) 0
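Reduced strictly right-to-left, that gives one side effect: the argument is evaluated once, printing 'hello', before it is bound to f. 'Optimized' by beta-reducing the term first, the unevaluated argument is copied into both occurrences of f, and you get two.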
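To make the count concrete, here is a rough Python sketch of the same situation. It is not the interpreter's own code: `arg`, `compose`, `strict`, and `beta_first` are made-up names, and the print is modeled as an append to a list so the effects can be counted.

```python
# Illustration only: these names are hypothetical, not part of the interpreter.
log = []  # every "print" lands here so the side effects can be counted


def arg():
    # Stands in for (print 'hello'; (\x -> x + 1)): an effect, then a function.
    log.append("hello")
    return lambda x: x + 1


def compose(f, g):
    # Function composition, the '.' in the term.
    return lambda x: f(g(x))


def strict():
    # Strict evaluation: the argument is evaluated once, then bound to f.
    f = arg()
    return compose(f, f)(0)


def beta_first():
    # Beta reduction first: the unevaluated argument replaces each f.
    return compose(arg(), arg())(0)


log.clear()
strict_result = strict()
strict_effects = len(log)  # 1 effect

log.clear()
beta_result = beta_first()
beta_effects = len(log)  # 2 effects
```

Both versions compute the same value, 2. Only the number of effects differs, and that difference is exactly what makes the rewrite unsafe.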
HOWEVER,
It should be safe to just observe which bytecode instructions were emitted during normal evaluation and glue those together.
At some point, I am going to optimize with some kind of bytecode player/recorder.
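A recorder/player along those lines might look something like the sketch below. Everything in it is an assumption made for illustration: the instruction set (`PUSH`, `ADD`, `PRINT`, `JMP`), the tuple encoding, and the `run` function are invented here and say nothing about the real bytecode. The recorder logs every instruction that actually executes and drops the jumps, so the trace is one straight-line sequence that the same machine can play back.

```python
# Hypothetical mini-VM: opcodes and encoding are invented for this sketch.
def run(program, out, trace=None):
    """Run a list of (op, arg) pairs on a small stack machine.

    If trace is a list, every executed non-jump instruction is appended
    to it, which is the 'recorder' half. Playing a trace back is just
    calling run on the recorded list.
    """
    stack = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "JMP":
            pc = arg  # control flow is resolved while recording, not stored
            continue
        if trace is not None:
            trace.append((op, arg))
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "PRINT":
            out.append(arg)  # the side effect
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()


program = [
    ("PRINT", "hello"),  # 0: the one side effect
    ("PUSH", 0),         # 1
    ("JMP", 4),          # 2: skip the next instruction
    ("PRINT", "never"),  # 3: never executed
    ("PUSH", 1),         # 4
    ("ADD", None),       # 5
    ("PUSH", 1),         # 6
    ("ADD", None),       # 7
]

# Normal evaluation, with the recorder switched on.
recorded_out, trace = [], []
recorded_result = run(program, recorded_out, trace)

# Playback of the glued-together trace.
replayed_out = []
replayed_result = run(trace, replayed_out)
```

The replay yields the same result and the same single 'hello', because the trace holds only instructions that really ran, in the order they ran. The sketch sidesteps data-dependent branches: a trace is only valid for the path that was recorded, so a real recorder would have to deal with that somehow.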