CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data Paper • 2404.15653 • Published Apr 24, 2024 • 27
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published May 20, 2024 • 47
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published May 21, 2024 • 28
LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models Paper • 2405.14477 • Published May 23, 2024 • 17
Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras Paper • 2405.14866 • Published May 23, 2024 • 6
Transformers Can Do Arithmetic with the Right Embeddings Paper • 2405.17399 • Published May 27, 2024 • 52
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections Paper • 2405.17991 • Published May 28, 2024 • 12
Jina CLIP: Your CLIP Model Is Also Your Text Retriever Paper • 2405.20204 • Published May 30, 2024 • 35
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning Paper • 2406.00392 • Published Jun 1, 2024 • 13
Block Transformer: Global-to-Local Language Modeling for Fast Inference Paper • 2406.02657 • Published Jun 4, 2024 • 38
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs Paper • 2406.02886 • Published Jun 5, 2024 • 9
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms Paper • 2406.02900 • Published Jun 5, 2024 • 12
GenAI Arena: An Open Evaluation Platform for Generative Models Paper • 2406.04485 • Published Jun 6, 2024 • 21
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild Paper • 2406.04770 • Published Jun 7, 2024 • 28
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? Paper • 2406.04391 • Published Jun 6, 2024 • 8
Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach Paper • 2406.04594 • Published Jun 7, 2024 • 6
The Prompt Report: A Systematic Survey of Prompting Techniques Paper • 2406.06608 • Published Jun 6, 2024 • 58
DiTFastAttn: Attention Compression for Diffusion Transformer Models Paper • 2406.08552 • Published Jun 12, 2024 • 23
Interpreting the Weight Space of Customized Diffusion Models Paper • 2406.09413 • Published Jun 13, 2024 • 18
MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding Paper • 2406.09297 • Published Jun 13, 2024 • 4
A Simple and Effective L_2 Norm-Based Strategy for KV Cache Compression Paper • 2406.11430 • Published Jun 17, 2024 • 23
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Paper • 2406.14515 • Published Jun 20, 2024 • 33
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch Paper • 2406.14563 • Published Jun 20, 2024 • 30
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges Paper • 2406.12624 • Published Jun 18, 2024 • 37
Efficient World Models with Context-Aware Tokenization Paper • 2406.19320 • Published Jun 27, 2024 • 8
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published Jul 1, 2024 • 75
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation Paper • 2407.00468 • Published Jun 29, 2024 • 34
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding Paper • 2407.01791 • Published Jul 1, 2024 • 5
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models Paper • 2407.02687 • Published Jul 2, 2024 • 22
Learning to (Learn at Test Time): RNNs with Expressive Hidden States Paper • 2407.04620 • Published Jul 5, 2024 • 28
HEMM: Holistic Evaluation of Multimodal Foundation Models Paper • 2407.03418 • Published Jul 3, 2024 • 9
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5, 2024 • 53
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation Paper • 2407.06135 • Published Jul 8, 2024 • 21
An accurate detection is not all you need to combat label noise in web-noisy datasets Paper • 2407.05528 • Published Jul 8, 2024 • 3
CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging Paper • 2407.07315 • Published Jul 10, 2024 • 6
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning Paper • 2407.07523 • Published Jul 10, 2024 • 5
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? Paper • 2407.11963 • Published Jul 16, 2024 • 44
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients Paper • 2407.11239 • Published Jul 15, 2024 • 8
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models Paper • 2407.11062 • Published Jul 10, 2024 • 8
NNsight and NDIF: Democratizing Access to Foundation Model Internals Paper • 2407.14561 • Published Jul 18, 2024 • 34
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding Paper • 2407.15754 • Published Jul 22, 2024 • 20
PERSONA: A Reproducible Testbed for Pluralistic Alignment Paper • 2407.17387 • Published Jul 24, 2024 • 19
Longhorn: State Space Models are Amortized Online Learners Paper • 2407.14207 • Published Jul 19, 2024 • 18
VSSD: Vision Mamba with Non-Casual State Space Duality Paper • 2407.18559 • Published Jul 26, 2024 • 19
The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines Paper • 2408.01050 • Published Aug 2, 2024 • 8
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling Paper • 2408.04810 • Published Aug 9, 2024 • 23
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents Paper • 2408.06327 • Published Aug 12, 2024 • 16
Heavy Labels Out! Dataset Distillation with Label Space Lightening Paper • 2408.08201 • Published Aug 15, 2024 • 19
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Paper • 2408.13257 • Published Aug 23, 2024 • 26
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences Paper • 2408.14468 • Published Aug 26, 2024 • 36
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models Paper • 2408.15518 • Published Aug 28, 2024 • 43
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation Paper • 2409.06633 • Published Sep 10, 2024 • 15
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Paper • 2409.10516 • Published Sep 16, 2024 • 41
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19, 2024 • 48
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution Paper • 2409.12961 • Published Sep 19, 2024 • 25
Prithvi WxC: Foundation Model for Weather and Climate Paper • 2409.13598 • Published Sep 20, 2024 • 40
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts Paper • 2409.16040 • Published Sep 24, 2024 • 13
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models Paper • 2409.17066 • Published Sep 25, 2024 • 28
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1, 2024 • 145
A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond Paper • 2410.02362 • Published Oct 3, 2024 • 18
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark Paper • 2410.03051 • Published Oct 4, 2024 • 5
MLP-KAN: Unifying Deep Representation and Function Learning Paper • 2410.03027 • Published Oct 3, 2024 • 29
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation Paper • 2410.05363 • Published Oct 7, 2024 • 45
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning Paper • 2410.06373 • Published Oct 8, 2024 • 36
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation Paper • 2410.07170 • Published Oct 9, 2024 • 15
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration Paper • 2410.02367 • Published Oct 3, 2024 • 47
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling Paper • 2409.19291 • Published Sep 28, 2024 • 19
PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs Paper • 2410.05265 • Published Oct 7, 2024 • 30
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models Paper • 2410.09732 • Published Oct 13, 2024 • 54
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free Paper • 2410.10814 • Published Oct 14, 2024 • 49
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17, 2024 • 75
Harnessing Webpage UIs for Text-Rich Visual Understanding Paper • 2410.13824 • Published Oct 17, 2024 • 30
LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning Paper • 2410.13618 • Published Oct 17, 2024 • 6
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs Paper • 2410.13276 • Published Oct 17, 2024 • 26
AutoTrain: No-code training for state-of-the-art models Paper • 2410.15735 • Published Oct 21, 2024 • 59
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction Paper • 2410.17247 • Published Oct 22, 2024 • 45
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 89
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark Paper • 2410.19168 • Published Oct 24, 2024 • 19
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training Paper • 2410.19313 • Published Oct 25, 2024 • 19
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior Paper • 2410.21264 • Published Oct 28, 2024 • 9
Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction Paper • 2410.18481 • Published Oct 24, 2024 • 5
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA Paper • 2410.20672 • Published Oct 28, 2024 • 6
CLEAR: Character Unlearning in Textual and Visual Modalities Paper • 2410.18057 • Published Oct 23, 2024 • 200
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference Paper • 2410.21465 • Published Oct 28, 2024 • 11
Accelerating Direct Preference Optimization with Prefix Sharing Paper • 2410.20305 • Published Oct 27, 2024 • 6
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos Paper • 2410.23287 • Published Oct 30, 2024 • 19
A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents Paper • 2410.22476 • Published Oct 29, 2024 • 25
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks Paper • 2410.20650 • Published Oct 28, 2024 • 16
M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation Paper • 2410.21157 • Published Oct 28, 2024 • 6
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published Nov 4, 2024 • 47
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published Nov 4, 2024 • 33
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity Paper • 2411.02335 • Published Nov 4, 2024 • 11
Controlling Language and Diffusion Models by Transporting Activations Paper • 2410.23054 • Published Oct 30, 2024 • 16
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Paper • 2411.04997 • Published Nov 7, 2024 • 37
Can sparse autoencoders be used to decompose and interpret steering vectors? Paper • 2411.08790 • Published Nov 13, 2024 • 8
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering Paper • 2411.11504 • Published Nov 18, 2024 • 20
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers Paper • 2411.10510 • Published Nov 15, 2024 • 8
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration Paper • 2411.10958 • Published Nov 17, 2024 • 52
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training Paper • 2411.13476 • Published Nov 20, 2024 • 15
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published Nov 21, 2024 • 43
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Paper • 2411.14405 • Published Nov 21, 2024 • 58
Cautious Optimizers: Improving Training with One Line of Code Paper • 2411.16085 • Published Nov 25, 2024 • 15
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published Nov 26, 2024 • 49
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation Paper • 2411.18499 • Published Nov 27, 2024 • 18
The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning Paper • 2412.00568 • Published Nov 30, 2024 • 14
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model Paper • 2411.17459 • Published Nov 26, 2024 • 10
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations Paper • 2412.07626 • Published Dec 10, 2024 • 22
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer Paper • 2412.07720 • Published Dec 10, 2024 • 30
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models Paper • 2412.06071 • Published Dec 8, 2024 • 7
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens Paper • 2412.10208 • Published Dec 13, 2024 • 19
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 88
No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published 30 days ago • 41
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching Paper • 2412.17153 • Published 24 days ago • 34
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization Paper • 2412.17739 • Published 23 days ago • 39