LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer Paper • 2412.13871 • Published 28 days ago • 18
FasterViT: Fast Vision Transformers with Hierarchical Attention Paper • 2306.06189 • Published Jun 9, 2023 • 30
AM-RADIO: Agglomerative Model -- Reduce All Domains Into One Paper • 2312.06709 • Published Dec 10, 2023 • 1
PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation Paper • 2410.01680 • Published Oct 2, 2024 • 33
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation Paper • 2410.01731 • Published Oct 2, 2024 • 16
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models Paper • 2409.17481 • Published Sep 26, 2024 • 47
Make It Count: Text-to-Image Generation with an Accurate Number of Objects Paper • 2406.10210 • Published Jun 14, 2024 • 77
Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models Paper • 2307.06925 • Published Jul 13, 2023 • 10
Point-Cloud Completion with Pretrained Text-to-image Diffusion Models Paper • 2306.10533 • Published Jun 18, 2023 • 8
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models Paper • 2306.11698 • Published Jun 20, 2023 • 12
nvidia/stt_en_fastconformer_transducer_xlarge Automatic Speech Recognition • Updated Jun 12, 2023 • 1 • 21