This looks cool, but I wonder how well their trained compiler generalizes to new task families. They trained on 29 specific task types, covering 800 subtasks with many rephrasings of each one (the specs). They hold out some specs for validation, but they don’t seem to have held out a full task family, and maybe not even full subtasks?
If the compiler can’t generalize well to unseen tasks, then it’s effectively acting as a fancy router to one of 29 (or 800) predefined LoRAs.
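For concreteness, the kind of split I have in mind looks roughly like the sketch below: hold out whole task families, so that nothing from a held-out family (no subtask, no rephrasing) is ever seen in training. The record layout (`family`, `subtask`, `spec`) and the `split_by_family` helper are my own hypothetical names, and the toy data is made up; none of this is taken from their code or dataset.

```python
import random


def split_by_family(specs, n_holdout_families, seed=0):
    """Hold out entire task families so train and test share no family."""
    families = sorted({s["family"] for s in specs})
    rng = random.Random(seed)
    held_out = set(rng.sample(families, n_holdout_families))
    train = [s for s in specs if s["family"] not in held_out]
    test = [s for s in specs if s["family"] in held_out]
    return train, test, held_out


# Toy stand-in data (hypothetical): 29 families, 3 subtasks each,
# 4 rephrasings (specs) per subtask.
specs = [
    {"family": f"family_{f}", "subtask": f"family_{f}/sub_{s}", "spec": f"rephrasing {r}"}
    for f in range(29)
    for s in range(3)
    for r in range(4)
]

train, test, held_out = split_by_family(specs, n_holdout_families=5)
print(len(train), len(test), sorted(held_out))
```

Splitting on `subtask` instead of `family` would give the weaker check (unseen subtasks within known families). A split on individual specs only measures robustness to rephrasing, which is what the validation set seems to cover.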