Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models Paper • 2501.01423 • Published 13 days ago • 35
Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis Paper • 2412.15322 • Published 27 days ago • 18
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 88
Putting the Object Back into Video Object Segmentation Paper • 2310.12982 • Published Oct 19, 2023