Performance Improvements in Libffi

(atgreen.github.io)

13 points | by atgreen a day ago

4 comments

rurban a day ago

Oh, I thought it already did this. Why was there a prepare step when it doesn't prepare the arg decoding?
- atgreen 19 hours ago
  
  TBH, the complexity of this step grew over time, and the overhead snuck up on us. The prep step does useful work (e.g., determining stack space requirements). It's just that we don't have to do it again.
  Something I should have mentioned is that we could have avoided the new APIs if only there had been space in the ffi_cif to stash a plan pointer. And I didn't want to break ABIs for this.
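
  A minimal sketch of the trade-off described above, not libffi's actual code. Every name here (`toy_cif`, `toy_plan`, `toy_call`, `toy_call_plan`) is a hypothetical stand-in. It assumes a descriptor whose layout is frozen, so derived data such as the stack size cannot be cached inside it and has to live in a separate handle created by a new entry point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a call descriptor with an ABI-frozen layout:
   there is no spare field in which to cache derived data. */
typedef struct {
    size_t nargs;
    const size_t *arg_sizes;   /* per-argument sizes, powers of two here */
} toy_cif;

/* Hypothetical separate handle that holds the result of the one-time work. */
typedef struct {
    const toy_cif *cif;
    size_t stack_bytes;
} toy_plan;

static unsigned long layout_runs;  /* how many times the layout was computed */

/* The "useful work": naturally align each argument, then round the total
   up to a 16-byte boundary. */
static size_t toy_layout(const toy_cif *cif) {
    size_t off = 0;
    layout_runs++;
    for (size_t i = 0; i < cif->nargs; i++) {
        size_t sz = cif->arg_sizes[i];
        off = (off + sz - 1) & ~(sz - 1);
        off += sz;
    }
    return (off + 15) & ~(size_t)15;
}

/* Old-style path: the layout is recomputed on every call. */
size_t toy_call(const toy_cif *cif) {
    return toy_layout(cif);
}

/* New-style path: compute once, keep the result next to the descriptor. */
toy_plan toy_plan_create(const toy_cif *cif) {
    assert(cif != NULL);
    toy_plan plan = { cif, toy_layout(cif) };
    return plan;
}

/* Calls through the plan reuse the cached value. */
size_t toy_call_plan(const toy_plan *plan) {
    assert(plan != NULL);
    return plan->stack_bytes;
}

unsigned long toy_layout_runs(void) {
    return layout_runs;
}
```

  In this toy, three calls through `toy_call` run the layout three times, while creating one `toy_plan` and calling through it three times runs it once. The cost is the extra handle and the extra API surface, which is the price of leaving the descriptor's layout untouched.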
tadfisher 21 hours ago

Can we AOT-compile stubs instead of interpreting or JIT-compiling? I feel like most FFI users would call static, well-defined functions.
- atgreen 19 hours ago
  
  Yes, that's part of what was done here. So, create a plan, and then for some subset of plans, create AOT-compiled templates. The analogies are:
  a) original implementation is like interpreting via walking a syntax tree
  b) building/caching an execution plan is like interpreting by executing bytecode generated from the syntax tree
  c) using an AOT-compiled template is like execution from qemu's old TCG template system
  But we only do (c) for a popular subset of function signatures. The biggest win was (b), but (c) is still an improvement over (b).
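
  The three tiers can be illustrated with a toy argument marshaller. This is a sketch under stated assumptions, not libffi's implementation: the types and functions (`toy_type`, `toy_plan`, `marshal_interpreted`, `toy_plan_build`, `toy_plan_run`) are hypothetical, only two argument types exist, and "calling" is reduced to packing arguments into a caller-zeroed frame buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy type system: 4-byte ints and 8-byte doubles only. */
typedef enum { T_I32, T_F64 } toy_type;
#define TOY_MAX_ARGS 8

static size_t toy_size(toy_type t) { return t == T_I32 ? 4 : 8; }

/* (a) "Tree walking": classify and lay out every argument on every call. */
size_t marshal_interpreted(const toy_type *types, size_t n,
                           void *const *args, unsigned char *frame) {
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        size_t sz = toy_size(types[i]);
        off = (off + sz - 1) & ~(sz - 1);   /* natural alignment */
        memcpy(frame + off, args[i], sz);
        off += sz;
    }
    return off;
}

/* (b) "Bytecode": offsets and sizes are resolved once into a flat op list. */
typedef struct { size_t off, size; } toy_op;
typedef struct {
    size_t nops;
    toy_op ops[TOY_MAX_ARGS];
    size_t frame_bytes;
    void (*tmpl)(void *const *args, unsigned char *frame); /* (c), may be NULL */
} toy_plan;

/* (c) "Template": straight-line code for one popular signature, (i32, i32). */
static void tmpl_i32_i32(void *const *args, unsigned char *frame) {
    memcpy(frame, args[0], 4);
    memcpy(frame + 4, args[1], 4);
}

/* Build the plan once per signature; returns 0 on success, -1 if too many args. */
int toy_plan_build(toy_plan *plan, const toy_type *types, size_t n) {
    assert(plan != NULL);
    if (n > TOY_MAX_ARGS) return -1;
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        size_t sz = toy_size(types[i]);
        off = (off + sz - 1) & ~(sz - 1);
        plan->ops[i].off = off;
        plan->ops[i].size = sz;
        off += sz;
    }
    plan->nops = n;
    plan->frame_bytes = off;
    /* Only a small set of signatures gets a precompiled template. */
    plan->tmpl = (n == 2 && types[0] == T_I32 && types[1] == T_I32)
                     ? tmpl_i32_i32 : NULL;
    return 0;
}

/* Per call: use the template if there is one, otherwise run the op list. */
size_t toy_plan_run(const toy_plan *plan, void *const *args,
                    unsigned char *frame) {
    if (plan->tmpl != NULL) {
        plan->tmpl(args, frame);
    } else {
        for (size_t i = 0; i < plan->nops; i++)
            memcpy(frame + plan->ops[i].off, args[i], plan->ops[i].size);
    }
    return plan->frame_bytes;
}
```

  All three paths produce the same frame. What changes is how much decision-making is left for call time: (a) decides everything per call, (b) only copies according to precomputed ops, and (c) has no loop or lookup at all for the signatures it covers.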