ississssi's Collections
Interestings
Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement
Paper • 2411.06558 • Published • 34
SlimLM: An Efficient Small Language Model for On-Device Document Assistance
Paper • 2411.09944 • Published • 12
Look Every Frame All at Once: Video-Ma²mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing
Paper • 2411.19460 • Published • 10
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
Paper • 2412.05237 • Published • 47
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
Paper • 2412.04814 • Published • 45
Moto: Latent Motion Token as the Bridging Language for Robot Manipulation
Paper • 2412.04445 • Published • 21
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
Paper • 2412.10345 • Published • 2
Learning Universal Policies via Text-Guided Video Generation
Paper • 2302.00111 • Published
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 88
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
Paper • 2412.11100 • Published • 6
RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning
Paper • 2412.09858 • Published • 1
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
Paper • 2412.15084 • Published • 13
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Paper • 2412.18619 • Published • 53
Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers
Paper • 2501.02393 • Published • 6