TBH, the complexity of this step grew over time, and the overhead snuck up on us. The prep step does useful work (eg. determine stack space requirements). It's just that we don't have to do it again.
Something I should have mentioned is that we could have avoided the new APIs if only there was space in the ffi_cif to stash a plan pointer. And I didn't want to break ABIs for this.
Yes, that's part of what was done here. So, create a plan, and then for some subset of plans, create AOT-compiled templates. The analogies are:
a) original implementation is like interpreting via walking a syntax tree
b) building/caching an execution plan is like interpreting by executing bytecode generated from the syntax tree
c) using an AOT-compiled template is like execution from qemu's old TCG template system
But we only do (c) for a popular subset of function signatures. The biggest win was (b), but (c) is still an improvement over (b).
Oh, I thought he does this already. Why was there a prepare, when it doesnt prepare the arg decoding.
TBH, the complexity of this step grew over time, and the overhead snuck up on us. The prep step does useful work (eg. determine stack space requirements). It's just that we don't have to do it again.
Something I should have mentioned is that we could have avoided the new APIs if only there was space in the ffi_cif to stash a plan pointer. And I didn't want to break ABIs for this.
Can we AOT-compile stubs instead of interpreting or JIT-compiling? I feel like most FFI users would call static, well-defined functions.
Yes, that's part of what was done here. So, create a plan, and then for some subset of plans, create AOT-compiled templates. The analogies are: a) original implementation is like interpreting via walking a syntax tree b) building/caching an execution plan is like interpreting by executing bytecode generated from the syntax tree c) using an AOT-compiled template is like execution from qemu's old TCG template system But we only do (c) for a popular subset of function signatures. The biggest win was (b), but (c) is still an improvement over (b).