- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 27
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 13
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 47
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 29
Collections
Collections including paper arxiv:2405.20204
- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 25
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 12
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 41
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 22
- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
  Paper • 2406.06525 • Published • 67
- Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
  Paper • 2406.06469 • Published • 25
- Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
  Paper • 2406.04271 • Published • 29
- Block Transformer: Global-to-Local Language Modeling for Fast Inference
  Paper • 2406.02657 • Published • 38
- Jina CLIP: Your CLIP Model Is Also Your Text Retriever
  Paper • 2405.20204 • Published • 35
- jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images
  Paper • 2412.08802 • Published • 4
- jinaai/jina-clip-v2
  Zero-Shot Image Classification • Updated • 27.6k • 160
- jinaai/jina-clip-v1
  Feature Extraction • Updated • 50.4k • 235
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 79
- Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training
  Paper • 2405.06932 • Published • 17
- Gecko: Versatile Text Embeddings Distilled from Large Language Models
  Paper • 2403.20327 • Published • 48
- Multilingual E5 Text Embeddings: A Technical Report
  Paper • 2402.05672 • Published • 21
- Scalable Diffusion Models with Transformers
  Paper • 2212.09748 • Published • 17
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
  Paper • 2311.15127 • Published • 12
- Learning Transferable Visual Models From Natural Language Supervision
  Paper • 2103.00020 • Published • 11
- U-Net: Convolutional Networks for Biomedical Image Segmentation
  Paper • 1505.04597 • Published • 9
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 125
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 51
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 13
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 65
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
  Paper • 2401.13313 • Published • 5
- BAAI/Bunny-v1_0-4B
  Text Generation • Updated • 267 • 9
- What matters when building vision-language models?
  Paper • 2405.02246 • Published • 102
- Jina CLIP: Your CLIP Model Is Also Your Text Retriever
  Paper • 2405.20204 • Published • 35
- YOLO-World: Real-Time Open-Vocabulary Object Detection
  Paper • 2401.17270 • Published • 35
- Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
  Paper • 2401.14405 • Published • 12
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 16
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 27