O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? Paper • 2411.16489 • Published Nov 25, 2024 • 42
Adaptive Decoding via Latent Preference Optimization Paper • 2411.09661 • Published Nov 14, 2024 • 10
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge Paper • 2407.19594 • Published Jul 28, 2024 • 20