Collections
Discover the best community collections!
Collections including paper arxiv:2309.16588
-
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
Paper • 2412.15213 • Published • 25 -
No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper • 2412.11768 • Published • 41 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 123 -
Autoregressive Video Generation without Vector Quantization
Paper • 2412.14169 • Published • 14
-
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
Paper • 2312.06134 • Published • 2 -
Efficient Monotonic Multihead Attention
Paper • 2312.04515 • Published • 6 -
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 37 -
Exploring Format Consistency for Instruction Tuning
Paper • 2307.15504 • Published • 7
-
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Paper • 2311.17049 • Published • 1 -
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper • 2405.04434 • Published • 14 -
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Paper • 2303.17376 • Published -
Sigmoid Loss for Language Image Pre-Training
Paper • 2303.15343 • Published • 6
-
Scalable Diffusion Models with Transformers
Paper • 2212.09748 • Published • 17 -
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
Paper • 2311.15127 • Published • 12 -
Learning Transferable Visual Models From Natural Language Supervision
Paper • 2103.00020 • Published • 11 -
U-Net: Convolutional Networks for Biomedical Image Segmentation
Paper • 1505.04597 • Published • 9
-
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
Paper • 2310.00535 • Published • 2 -
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Paper • 2211.00593 • Published • 2 -
Rethinking Interpretability in the Era of Large Language Models
Paper • 2402.01761 • Published • 23 -
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Paper • 2307.09458 • Published • 10
-
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Paper • 2403.06775 • Published • 3 -
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper • 2010.11929 • Published • 7 -
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
Paper • 2110.07040 • Published • 2 -
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
Paper • 1811.00056 • Published • 2