Solving math word problems with process- and outcome-based feedback Paper • 2211.14275 • Published Nov 25, 2022 • 8
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 14 days ago • 93
Direct Preference Optimization: Your Language Model is Secretly a Reward Model Paper • 2305.18290 • Published May 29, 2023 • 52
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024 • 64
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only Paper • 2306.01116 • Published Jun 1, 2023 • 33