this is interesting. would be cool to explore something like integrating a vlm to add a "semantic" term to the loss function. looking through the comparisons, some of the baseline codecs create meaningfully different details (as could be described by text) in the images.
this is interesting. would be cool to explore something like integrating a vlm to add a "semantic" term to the loss function. looking through the comparisons, some of the baseline codecs create meaningfully different details (as could be described by text) in the images.