matlok's Collections
Papers - Image - Encoders
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Paper • 2107.00652 • Published • 2
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Paper • 2403.09622 • Published • 16
Veagle: Advancements in Multimodal Representation Learning
Paper • 2403.08773 • Published • 7
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Paper • 2304.14178 • Published • 3
ViTAR: Vision Transformer with Any Resolution
Paper • 2403.18361 • Published • 53
TextCraftor: Your Text Encoder Can be Image Quality Controller
Paper • 2403.18978 • Published • 14
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Paper • 2404.01331 • Published • 25
PointInfinity: Resolution-Invariant Point Diffusion Models
Paper • 2404.03566 • Published • 14
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
Paper • 2109.10282 • Published • 6
Text Role Classification in Scientific Charts Using Multimodal Transformers
Paper • 2402.14579 • Published • 1