arxiv:2311.06242

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Published on Nov 10, 2023
· Submitted by akhaliq on Nov 13, 2023
#1 Paper of the day

Abstract

We introduce Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks. While existing large vision models excel in transfer learning, they struggle to perform a diversity of tasks with simple instructions, a capability that implies handling the complexity of various spatial hierarchies and semantic granularities. Florence-2 was designed to take text prompts as task instructions and generate the desired results in text form, whether for captioning, object detection, grounding, or segmentation. This multi-task learning setup demands large-scale, high-quality annotated data. To this end, we co-developed FLD-5B, which consists of 5.4 billion comprehensive visual annotations on 126 million images, using an iterative strategy of automated image annotation and model refinement. We adopted a sequence-to-sequence structure to train Florence-2 to perform versatile and comprehensive vision tasks. Extensive evaluations on numerous tasks demonstrated Florence-2 to be a strong vision foundation model contender with unprecedented zero-shot and fine-tuning capabilities.
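
To make "results in text form" concrete, here is a minimal sketch of how a detection result might be serialized as a token string and parsed back. It is an illustration only, not the authors' implementation: the `<loc_N>` token spelling, the 1,000-bin quantization, and the helper names (`quantize`, `dequantize`, `encode_detection`, `decode_detections`) are all assumptions made for this example.

```python
import re
from typing import List, Tuple

NUM_BINS = 1000  # assumption: number of quantization bins per image axis

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def quantize(value: float, size: float, bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, size] to a bin index in [0, bins - 1]."""
    idx = int(value * bins / size)
    return min(max(idx, 0), bins - 1)


def dequantize(idx: int, size: float, bins: int = NUM_BINS) -> float:
    """Map a bin index back to the pixel coordinate at the bin centre."""
    return (idx + 0.5) / bins * size


def encode_detection(label: str, box: Box, width: int, height: int) -> str:
    """Serialize one labelled box as text: label followed by four location tokens."""
    x1, y1, x2, y2 = box
    ids = [
        quantize(x1, width),
        quantize(y1, height),
        quantize(x2, width),
        quantize(y2, height),
    ]
    return label + "".join(f"<loc_{i}>" for i in ids)


# A label (any run of characters without '<') followed by exactly four tokens.
_DETECTION = re.compile(r"([^<]+)((?:<loc_\d+>){4})")


def decode_detections(text: str, width: int, height: int) -> List[Tuple[str, Box]]:
    """Parse a generated string back into (label, pixel box) pairs."""
    sizes = (width, height, width, height)
    results = []
    for label, locs in _DETECTION.findall(text):
        ids = [int(n) for n in re.findall(r"<loc_(\d+)>", locs)]
        box = tuple(dequantize(i, s) for i, s in zip(ids, sizes))
        results.append((label.strip(), box))
    return results


if __name__ == "__main__":
    text = encode_detection("cat", (64, 48, 320, 240), width=640, height=480)
    print(text)
    print(decode_detections(text, width=640, height=480))
```

Because coordinates are quantized, the round trip is lossy: a decoded coordinate can differ from the original by up to one bin width (image size divided by `NUM_BINS`).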

Community


Great work. Will the data be openly released?

Great work. Is there any model release open for external evaluation?

I appreciate the effort and the great work. Will FLD-5B be made publicly available?

Thanks in advance.

Florence-2: The Future of Unified Vision Tasks!

Links 🔗:

👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/

By Arxflix

For those coming in late! The MSFT team released the following checkpoints:

https://huggingface.co/collections/microsoft/florence-6669f44df0d87d9c3bfb76de

