matlok's Collections
Papers - Image - Clip
Paper • 2309.16671 • Published • 20

Model Stock: All we need is just a few fine-tuned models
Paper • 2403.19522 • Published • 10

Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Paper • 2404.01367 • Published • 21

On the Scalability of Diffusion-based Text-to-Image Generation
Paper • 2404.02883 • Published • 18

Learning Transferable Visual Models From Natural Language Supervision
Paper • 2103.00020 • Published • 11

Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Paper • 2310.03502 • Published • 78

Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Paper • 2404.07448 • Published • 11

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 30

RegionGPT: Towards Region Understanding Vision Language Model
Paper • 2403.02330 • Published • 2

On Speculative Decoding for Multimodal Large Language Models
Paper • 2404.08856 • Published • 14

GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 3

A Multimodal Automated Interpretability Agent
Paper • 2404.14394 • Published • 21

MultiBooth: Towards Generating All Your Concepts in an Image from Text
Paper • 2404.14239 • Published • 9

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper • 2404.15653 • Published • 27

MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 13

DOCCI: Descriptions of Connected and Contrasting Images
Paper • 2404.19753 • Published • 12

InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation
Paper • 2404.19427 • Published • 72

AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
Paper • 2406.06911 • Published • 11

DataComp: In search of the next generation of multimodal datasets
Paper • 2304.14108 • Published • 2

Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity
Paper • 2406.17720 • Published • 8

SLIP: Self-supervision meets Language-Image Pre-training
Paper • 2112.12750 • Published • 1

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey
Paper • 2407.21794 • Published • 5

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
Paper • 2403.06976 • Published • 2