FiftyOne Computer Vision Datasets Come to the Hugging Face Hub
Use and Share Cutting Edge Computer Vision Datasets with Ease
Today’s cutting-edge ML models, like transformers and diffusion models, are primarily designed for unstructured data like text, audio, images, and videos. High-quality datasets built out of these unstructured components are essential for benchmarking and training state-of-the-art models.
Connecting this data to our models has always been a pain, with inhomogeneous data schemas putting the onus of data wrangling, filtering, and processing on the end user. Until now!
Introducing the integration between FiftyOne Computer Vision Datasets and the Hugging Face Hub. With this integration, you can:
- Load visual datasets from the Hugging Face Hub directly into FiftyOne for streamlined data curation, visualization, and model inference/training.
- Share visual datasets to the Hugging Face Hub from FiftyOne for improved transparency and reproducibility.
In short:
- Hugging Face democratizes ML model distribution and application
- FiftyOne brings structure to unstructured visual data
- The FiftyOne 🤝 🤗 Hub integration bridges the gap between data and models
Before we dive into this integration, here’s some brief background:
What is FiftyOne?
FiftyOne is the leading open-source toolkit for curating, visualizing, and managing unstructured visual data. The library streamlines data-centric workflows, from finding low-confidence predictions to identifying poor-quality samples and uncovering hidden patterns in your data. The library supports all sorts of visual data, from images and videos to PDFs, point clouds, and meshes.
Whereas tabular data formats like a pandas DataFrame or a Parquet file consist of rows and columns, FiftyOne datasets are considerably more flexible. The atomic element of a fiftyone.Dataset is a sample, which contains all of the information related to a piece of visual data. This information is stored in fields, which can be elementary data types or fields with custom schemas (like object detections, keypoints, and polylines). FiftyOne datasets are efficient and flexible data structures for visual data for a few reasons:
- FiftyOne Datasets are logical datasets pointing to media files on disk rather than storing the media file contents directly.
- FiftyOne datasets are constructed from MongoDB documents, so they inherit the flexibility in MongoDB’s non-relational data model.
- FiftyOne natively integrates with vector databases for efficient retrieval and semantic search at scale.
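To make the "logical dataset" idea concrete, here is a toy sketch in plain Python. It is not FiftyOne's implementation and uses none of its API: ToySample and ToyDataset are hypothetical names invented for illustration, and the file paths are made up. The point is only that a sample stores a path plus schema-free fields, so you can filter on metadata without ever reading the media.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToySample:
    """One piece of visual data: a path to the media plus arbitrary fields."""

    filepath: str  # pointer to the media on disk; the pixels are never loaded here
    fields: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ToyDataset:
    """A logical dataset: a collection of samples, with no media contents inside."""

    name: str
    samples: List[ToySample] = field(default_factory=list)

    def add(self, sample: ToySample) -> None:
        self.samples.append(sample)

    def match(self, predicate: Callable[[ToySample], bool]) -> List[ToySample]:
        """Return the samples whose fields satisfy the predicate."""
        return [s for s in self.samples if predicate(s)]


dataset = ToyDataset(name="toy")
dataset.add(ToySample("/data/cat.jpg", {"label": "cat", "confidence": 0.42}))
dataset.add(
    ToySample(
        "/data/dog.jpg",
        {
            "label": "dog",
            "confidence": 0.97,
            # Nested fields need no predeclared schema, much like a document store
            "detections": [{"label": "dog", "bounding_box": [0.1, 0.2, 0.5, 0.6]}],
        },
    )
)

# Find low-confidence predictions without touching the image files
low_conf = dataset.match(lambda s: s.fields["confidence"] < 0.5)
print([s.filepath for s in low_conf])  # prints ['/data/cat.jpg']
```

In FiftyOne itself, the equivalent filtering is expressed through the library's own view and query interface rather than Python lambdas.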
When you put it all together, FiftyOne provides the most straightforward and intuitive API for filtering, indexing, evaluating, and aggregating over visual datasets.
🚀 This powerful data model allows you to apply Hugging Face transformer models directly to image or video datasets with a single line of code.
📢 The code blocks in this blog post require fiftyone>=0.24.0 and huggingface_hub>=0.24.0 (how convenient!).
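The single-line model application mentioned above might look like the sketch below. It assumes the transformers library is installed and that FiftyOne's Transformers integration accepts a Hugging Face model passed to apply_model(). The helper name, the checkpoint, and the label field name are illustrative choices, not something this post prescribes.

```python
def classify_with_transformers(
    dataset,
    checkpoint="google/vit-base-patch16-224",  # assumed example checkpoint
    label_field="vit_predictions",  # assumed field name for the results
):
    """Apply a Hugging Face image classifier to every sample in a FiftyOne dataset."""
    # Imported lazily so that defining this helper does not load any model weights
    from transformers import AutoModelForImageClassification

    model = AutoModelForImageClassification.from_pretrained(checkpoint)

    # The "single line": FiftyOne runs inference and stores the results in label_field
    dataset.apply_model(model, label_field=label_field)
    return dataset
```

Calling classify_with_transformers(dataset) on a loaded image dataset downloads the checkpoint on first use, so expect the first run to take longer than later ones.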
Loading Visual Datasets from the 🤗 Hub
With FiftyOne’s Hugging Face Hub integration, you can load any FiftyOne dataset uploaded to the hub (see the section below), as well as most image-based datasets stored in Parquet files, which is the standard for datasets uploaded to the hub via the datasets library. The load_from_hub() function in FiftyOne’s Hugging Face utils handles both of these cases!
Loading FiftyOne Datasets from the 🤗 Hub
Any dataset pushed to the hub in one of FiftyOne’s supported common formats should have all of the necessary configuration info in its dataset repo on the hub, so you can load the dataset by specifying its repo_id. As an example, to load the VisDrone detection dataset, all you need is:
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub
## load from the hub
dataset = load_from_hub("Voxel51/VisDrone2019-DET")
## visualize in app
session = fo.launch_app(dataset)
It’s as simple as that!
You can customize the download process, including the number of samples to download, the name of the created dataset object, whether or not it is persisted to disk, and more!
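As a hedged sketch of those options, the helper below passes max_samples, name, persistent, and overwrite to load_from_hub(). The helper itself and the dataset naming scheme are hypothetical, and the keyword names are written from memory of the FiftyOne Hugging Face utils, so check them against the version you have installed.

```python
def load_visdrone_subset(max_samples=100):
    """Load a small, persistent slice of VisDrone from the Hugging Face Hub."""
    # Imported lazily so nothing is downloaded until the function is called
    from fiftyone.utils.huggingface import load_from_hub

    return load_from_hub(
        "Voxel51/VisDrone2019-DET",
        max_samples=max_samples,  # download only the first N samples
        name=f"visdrone-{max_samples}",  # name of the created FiftyOne dataset
        persistent=True,  # keep the dataset in the database after the session ends
        overwrite=True,  # replace an existing dataset with the same name
    )
```

A small max_samples value is handy for a first look at a large dataset before committing to the full download.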
What Datasets Are in FiftyOne Format?
Every dataset uploaded via FiftyOne’s Hugging Face Hub integration will have a fiftyone tag. You can see all datasets with this tag online at this URL. You can also retrieve this list programmatically using the Hugging Face Hub’s API:
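from huggingface_hub import HfApi
api = HfApi()
## list_datasets() returns a lazy iterable, so iterate over it to see the repo ids
for info in api.list_datasets(tags="fiftyone"):
    print(info.id)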
In fact, this is how the list of loadable datasets is populated in FiftyOne’s Hugging Face Hub plugin!
Why load FiftyOne Datasets from the 🤗 Hub?
Don’t reinvent the wheel. If the ML community has formatted and processed a popular dataset for you, spend your time on other parts of the model development pipeline.
Loading Parquet Datasets from the 🤗 Hub with FiftyOne
You can also use the load_from_hub() function to load datasets from Parquet files, giving you access to an even wider range of computer vision and multimodal datasets. This function makes it easy to specify which features you want to convert into FiftyOne labels, which features point to media files, and which splits/subsets to download. FiftyOne will handle type conversions for you and download images from URLs if necessary.
With this functionality, you can load:
- Image classification datasets like Food101 and ImageNet-Sketch
- Object detection datasets like CPPE-5 and WIDER FACE
- Segmentation datasets like SceneParse150 and Sidewalk Semantic
- Image captioning datasets like COYO-700M and New Yorker Caption Contest
- Visual question-answering datasets like TextVQA and ScienceQA
And many more!
As a simple example, we can load the first 1,000 samples from the WikiArt dataset into FiftyOne with:
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub
dataset = load_from_hub(
    "huggan/wikiart",  # repo_id
    format="parquet",  # for Parquet format
    classification_fields=["artist", "style", "genre"],  # columns to treat as classification labels
    max_samples=1000,  # number of samples to load
    name="wikiart",  # name of the dataset in FiftyOne
)
This one-line command gives us access to FiftyOne's data visualization, analysis, and understanding capabilities, which we can use to analyze artistic styles!
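The same pattern should extend to other label types and to specific splits. The sketch below is an assumption-heavy illustration: the helper is hypothetical, the detection_fields and split keyword names are written from memory of the FiftyOne Hugging Face utils, and the default column name "objects" is only a common convention for detection datasets on the hub. Inspect the dataset's features on the hub and confirm the keywords against your installed FiftyOne version before relying on it.

```python
def load_detection_parquet(repo_id, detection_column="objects", split="train", max_samples=500):
    """Load one split of a Parquet-backed detection dataset into FiftyOne."""
    # Imported lazily so nothing is downloaded until the function is called
    from fiftyone.utils.huggingface import load_from_hub

    return load_from_hub(
        repo_id,
        format="parquet",  # the repo stores its data in Parquet files
        detection_fields=detection_column,  # assumed: column to convert into detections
        split=split,  # assumed: which split to download
        max_samples=max_samples,  # keep the first download small
    )
```

You would call it with the repo_id of a detection dataset whose annotation column you have already checked, for example one of the object detection datasets listed above.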
Why load Parquet Datasets from the 🤗 Hub in FiftyOne?
- Simplify your data processing pipelines.
- Bring structure to the data with clustering, semantic search, and dimensionality reduction techniques.
- Apply transformer models to your entire dataset with a single line of code.
📚 Documentation on loading from the hub
Pushing FiftyOne Datasets to the 🤗 Hub
There’s never been an easier way to share your visual datasets with the world. Whether you are developing the benchmark for a visual understanding task or creating a collection of AI-generated artwork, push_to_hub() from the FiftyOne Hugging Face utils allows you to upload image, video, or 3D datasets to the Hugging Face Hub in a single line of code.
Pushing a dataset to the hub is as simple as:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone.utils.huggingface import push_to_hub
## load example dataset
dataset = foz.load_zoo_dataset("quickstart")
## push to hub
push_to_hub(dataset, "my-hf-dataset")
The upload process is highly customizable: you can specify a license, tags, a description, the number of files to upload at a time, the format of the exported dataset, and more!
When you call push_to_hub(), the dataset will be uploaded to the repo with the specified repo name under your username, and the repo will be created if necessary. A Dataset Card will automatically be generated and populated with instructions for loading the dataset from the hub. You can even upload a thumbnail image/gif to appear on the Dataset Card with the preview_path argument.
Here’s an example using many of these arguments, which would upload the dataset to the private repo https://huggingface.co/datasets/username/my-action-recognition-dataset with tags, an MIT license, a description, and a preview image:
dataset = foz.load_zoo_dataset("quickstart-video", max_samples=3)
push_to_hub(
    dataset,
    "my-action-recognition-dataset",
    tags=["video", "tracking"],
    license="mit",
    description="A dataset of videos for action recognition tasks",
    private=True,
    preview_path="<path/to/preview.png>",
)
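Once a dataset is on the hub, loading it back is a matter of building the full repo_id from your username and the repo name. The sketch below is one way to do that. The two helpers are hypothetical, and the sketch assumes you are already logged in to the Hugging Face Hub so that whoami() can resolve your account and private repos are accessible.

```python
def full_repo_id(username, repo_name):
    """Build the '<username>/<repo_name>' identifier used by the hub."""
    return f"{username}/{repo_name}"


def reload_pushed_dataset(repo_name, username=None):
    """Load a previously pushed FiftyOne dataset back from the hub."""
    # Imported lazily so nothing touches the network until the function is called
    from fiftyone.utils.huggingface import load_from_hub
    from huggingface_hub import whoami

    if username is None:
        # Ask the hub which account the cached token belongs to
        username = whoami()["name"]

    return load_from_hub(full_repo_id(username, repo_name))
```

This round trip is also a quick sanity check that the upload contains everything needed to reconstruct the dataset.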
Why push FiftyOne Datasets to the 🤗 Hub?
- Share your data with your friends (gated access) or the broader ML community!
- Participate in our upcoming FiftyOne Dataset Curation Competition on Hugging Face!
📚 Documentation on pushing to the hub
Conclusion
FiftyOne’s Hugging Face Hub integration makes sharing, using, and structuring visual datasets easier than ever. By combining loading from the hub with pushing to it, you can create your own dataset from existing datasets, as my teammate Harpreet Sahota did with MashupVQA. You can also connect your models and data by loading your dataset from the hub into FiftyOne and using FiftyOne’s Hugging Face Transformers integration. Generate embeddings, test out augmentation techniques, and fine-tune models without worrying about data formats.
Go forth: build better datasets and train better models!