- Scaling Instruction-Finetuned Language Models
  Paper • 2210.11416 • Published • 7
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 139
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 62
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 62
Collections including paper arxiv:2402.14904
- Watermarking Makes Language Models Radioactive
  Paper • 2402.14904 • Published • 23
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 19
- GPTVQ: The Blessing of Dimensionality for LLM Quantization
  Paper • 2402.15319 • Published • 19
- DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation
  Paper • 2402.11929 • Published • 10

- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 17
- Transforming and Combining Rewards for Aligning Large Language Models
  Paper • 2402.00742 • Published • 11
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 78
- Specialized Language Models with Cheap Inference from Limited Domain Data
  Paper • 2402.01093 • Published • 46

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 146
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 12
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 53
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 45

- bigscience/bloom
  Text Generation • Updated • 1.39M • 4.81k
- StableDiffusionBiasExplorer

- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 105
- Watermarking Makes Language Models Radioactive
  Paper • 2402.14904 • Published • 23