BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published 2 days ago
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published 2 days ago
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding Paper • 2501.05452 • Published 6 days ago
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? Paper • 2501.05510 • Published 6 days ago
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution Paper • 2501.05040 • Published 6 days ago
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives Paper • 2501.04003 • Published 8 days ago
On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis Paper • 2501.04377 • Published 7 days ago
The GAN is dead; long live the GAN! A Modern GAN Baseline Paper • 2501.05441 • Published 6 days ago
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published 8 days ago
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 7 days ago
Cosmos World Foundation Model Platform for Physical AI Paper • 2501.03575 • Published 8 days ago
Scaling Laws for Floating Point Quantization Training Paper • 2501.02423 • Published 10 days ago
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation Paper • 2501.03225 • Published 9 days ago
Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration Paper • 2412.13180 • Published 29 days ago
BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery Paper • 2501.01540 • Published 13 days ago
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation Paper • 2412.21059 • Published 16 days ago
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published 12 days ago
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation Paper • 2501.01895 • Published 12 days ago
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs Paper • 2412.21187 • Published 16 days ago