Video Understanding with Interleaved Visual-Textual Tokens