Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization Paper • 2404.09956 • Published Apr 15, 2024 • 12
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model Paper • 2404.09967 • Published Apr 15, 2024 • 21
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing Paper • 2404.09990 • Published Apr 15, 2024 • 13
Taming Latent Diffusion Model for Neural Radiance Field Inpainting Paper • 2404.09995 • Published Apr 15, 2024 • 7
TransformerFAM: Feedback attention is working memory Paper • 2404.09173 • Published Apr 14, 2024 • 44
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video Paper • 2404.09833 • Published Apr 15, 2024 • 30
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12, 2024 • 65
CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting Paper • 2404.09458 • Published Apr 15, 2024 • 7
On Speculative Decoding for Multimodal Large Language Models Paper • 2404.08856 • Published Apr 13, 2024 • 14
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models Paper • 2404.09204 • Published Apr 14, 2024 • 11
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries Paper • 2406.12824 • Published Jun 18, 2024 • 21
Tokenization Falling Short: The Curse of Tokenization Paper • 2406.11687 • Published Jun 17, 2024 • 16
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning Paper • 2406.12742 • Published Jun 18, 2024 • 15
HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors Paper • 2406.12459 • Published Jun 18, 2024 • 12
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content Paper • 2406.11811 • Published Jun 17, 2024 • 17
AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology Paper • 2406.11912 • Published Jun 16, 2024 • 27
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools Paper • 2406.12793 • Published Jun 18, 2024 • 32
VoCo-LLaMA: Towards Vision Compression with Large Language Models Paper • 2406.12275 • Published Jun 18, 2024 • 30
TroL: Traversal of Layers for Large Language and Vision Models Paper • 2406.12246 • Published Jun 18, 2024 • 35