Sana Collection ⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer • 19 items • Updated 7 days ago • 79
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution Paper • 2501.02976 • Published 9 days ago • 48
SigLIP Collection Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 10 items • Updated Dec 13, 2024 • 50
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Paper • 2501.01427 • Published 13 days ago • 47
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 14 days ago • 93
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Paper • 2412.15213 • Published 27 days ago • 25
FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models Paper • 2412.08629 • Published Dec 11, 2024 • 12
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis Paper • 2412.01819 • Published Dec 2, 2024 • 34
Llama 3.3 Collection This collection hosts the transformers and original repos of the Llama 3.3 • 1 item • Updated Dec 6, 2024 • 110
OminiControl: Minimal and Universal Control for Diffusion Transformer Paper • 2411.15098 • Published Nov 22, 2024 • 54
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 123
Art-Free Generative Models: Art Creation Without Graphic Art Knowledge Paper • 2412.00176 • Published Nov 29, 2024 • 8
Training-free Regional Prompting for Diffusion Transformers Paper • 2411.02395 • Published Nov 4, 2024 • 25