poonyZ's Collections
Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant
Paper
•
2410.13360
•
Published
•
8
Note
Worth watching
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Paper
•
2411.18203
•
Published
•
32
Towards Interpreting Visual Information Processing in Vision-Language Models
Paper
•
2410.07149
•
Published
•
1
Understanding Alignment in Multimodal LLMs: A Comprehensive Study
Paper
•
2407.02477
•
Published
•
22
Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy
Paper
•
2411.15453
•
Published
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
Paper
•
2411.14982
•
Published
•
16
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token
Paper
•
2412.06676
•
Published
•
9
Note
Decent
From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding
Paper
•
2412.06474
•
Published
Note
Hard to say
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation
Paper
•
2412.09585
•
Published
•
10
Note
Worth watching
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
Paper
•
2412.09604
•
Published
•
35
Note
Decent
Analyzing The Language of Visual Tokens
Paper
•
2411.05001
•
Published
•
23
Note
Worth watching
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
Paper
•
2412.13871
•
Published
•
18
FastVLM: Efficient Vision Encoding for Vision Language Models
Paper
•
2412.13303
•
Published
•
13
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Paper
•
2412.18619
•
Published
•
53
Note
Keep following
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
Paper
•
2412.19326
•
Published
•
18
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization
Paper
•
2412.18525
•
Published
•
68
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Paper
•
2501.01904
•
Published
•
31