> Rayforce is a pure C17 zero-dependency embeddable engine where columnar analytics and graph traversals share a single operation DAG, pass through a multi-pass optimizer, and execute as fused morsel-driven bytecode. No malloc.
Sounds great. I hadn't seen an (explicitly) C17 project before. I wonder which features of it they use. I can only find very scant references in the repo (e.g. https://github.com/RayforceDB/rayforce/blob/6c4b1eddad0ea728...).
Anyone know?
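Not an answer about what Rayforce itself relies on, but as far as I know C17 (ISO/IEC 9899:2018) is a defect-fix release of C11 with no new language features, so "C17" in practice tends to mean "C11 features, compiled with `-std=c17`". The visible difference is the value of `__STDC_VERSION__` (`201710L`). A minimal sketch of that, where `c_standard_name` and `cache_line_counter` are made-up names for illustration and nothing here is taken from the Rayforce source:

```c
#include <assert.h>   /* static_assert macro (C11 and later) */
#include <stdio.h>

/* Map a __STDC_VERSION__ value to a readable standard name. */
static const char *c_standard_name(long version) {
    if (version >= 202311L) return "C23";
    if (version >= 201710L) return "C17";
    if (version >= 201112L) return "C11";
    if (version >= 199901L) return "C99";
    return "pre-C99";
}

/* Two C11 features that a C17 build can use unchanged. */
static_assert(sizeof(long long) >= 8, "long long must be at least 64 bits");

/* Hypothetical example type: 64 is an assumed cache-line size. */
struct cache_line_counter {
    _Alignas(64) long value;
};

/* Print which standard the current translation unit was compiled as. */
static void print_c_standard(void) {
    printf("compiled as %s (__STDC_VERSION__ = %ldL)\n",
           c_standard_name((long)__STDC_VERSION__), (long)__STDC_VERSION__);
}
```

Calling `print_c_standard()` from a `main` built with `-std=c17` should report C17. The `static_assert` and `_Alignas` lines would compile the same way under `-std=c11`.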
Any relation to the smash hit SHMUP of the same name?
No
Bummer
> Rayforce is a library you link, not a server you deploy. The C API is small enough to wrap from any language with an FFI.
I'm familiar with large-scale, commercial, client-server use cases for columnar analytics and graph traversal, but what is the use case for an embedded engine like this?
Perhaps the fact that 99+% of today's workloads could be running on the client if it were as easy as shipping Rayforce and the data directly to the client.
Besides that, pure C that you can embed into your app is, for some, much easier to deploy (and likely 100x more performant) than stuff that comes via a Helm chart [cries in JVM 'big'-data solutions]
I thought "morsel-driven" was AI slop, but it turns out to be in common usage in the HPC world. So I learned something from this post!
afaiu morsel-driven means the workload gets turned into 'smallish' chunks (morsels)
instead of having to split the work up front (e.g. 4 workers get 1/4 each) it is more granular and dynamic
a worker that's "done" can request another morsel
pragmatic approach because workers might not all be equally fast (cache, cpu frequency, throttling, …) and also some morsels take longer than others depending on the values they contain and what kind of work needs to get done
so this approach tends to balance out nicely
I'm sure someone else can explain it better / correct me (please do!)
When I read up, it sounded like the same idea as work-stealing to me. Not surprising that different fields come up with the same idea under different terminology.
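To make the "a worker that's done requests another morsel" part concrete, here is a rough sketch in C, assuming POSIX threads and C11 atomics. It is only an illustration of the general idea, not how Rayforce (or DuckDB) does it; `morsel_job`, `morsel_worker` and `morsel_sum` are made-up names. The difference from classic work-stealing, at least in this sketch, is that there is one shared cursor everybody pulls from, rather than per-worker queues that idle workers steal from.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_WORKERS = 64 };  /* arbitrary cap for this sketch */

/* Shared job state: one atomic cursor hands out morsels on demand. */
typedef struct {
    const int64_t *data;
    size_t n_rows;
    size_t morsel_rows;          /* rows per morsel */
    atomic_size_t next_row;      /* start of the next unclaimed morsel */
    atomic_int_fast64_t total;   /* combined result */
} morsel_job;

/* Each worker keeps claiming morsels until none are left. */
static void *morsel_worker(void *arg) {
    morsel_job *job = arg;
    int64_t local = 0;
    for (;;) {
        size_t begin = atomic_fetch_add(&job->next_row, job->morsel_rows);
        if (begin >= job->n_rows) break;               /* nothing left to claim */
        size_t left = job->n_rows - begin;
        size_t end = begin + (left < job->morsel_rows ? left : job->morsel_rows);
        for (size_t i = begin; i < end; i++) local += job->data[i];
    }
    atomic_fetch_add(&job->total, local);              /* publish once per worker */
    return NULL;
}

/* Sum `data` using up to `n_workers` threads pulling morsels dynamically. */
int64_t morsel_sum(const int64_t *data, size_t n_rows,
                   size_t morsel_rows, int n_workers) {
    assert(data != NULL || n_rows == 0);
    if (morsel_rows == 0) morsel_rows = 1;
    if (n_workers < 1) n_workers = 1;
    if (n_workers > MAX_WORKERS) n_workers = MAX_WORKERS;

    morsel_job job = { .data = data, .n_rows = n_rows, .morsel_rows = morsel_rows };
    atomic_init(&job.next_row, 0);
    atomic_init(&job.total, 0);

    pthread_t threads[MAX_WORKERS];
    int started = 0;
    for (int i = 0; i < n_workers; i++) {
        if (pthread_create(&threads[started], NULL, morsel_worker, &job) == 0)
            started++;
    }
    if (started == 0) morsel_worker(&job);  /* fall back to the calling thread */
    for (int i = 0; i < started; i++) pthread_join(threads[i], NULL);
    return (int64_t)atomic_load(&job.total);
}
```

A slow or throttled worker simply claims fewer morsels, which is where the balancing comes from. Smaller morsels balance better but hit the shared cursor more often, so the morsel size is a tuning knob.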
DuckDB and LadybugDB use the same terminology to describe internals.
Exactly!
Is this competing with e.g. an embedded DuckDB?