Fine-tune Flair Models on NER Dataset with 🤗 AutoTrain SpaceRunner
TLDR: In this blog post, we demonstrate how to fine-tune Flair models on the German MobIE NER dataset with the powerful 🤗 AutoTrain library and SpaceRunner. Additionally, we create visually appealing Model Cards using the 🤗 Hub client library.
Introduction
The Flair library is a straightforward framework for state-of-the-art NLP, developed by Humboldt University of Berlin and friends, and fully integrated into the Model Hub.
We utilize Flair in this blog post to fine-tune a model for named entity recognition (NER) on a German dataset in the Mobility Domain (MobIE dataset).
This blog post also outlines how to use this dataset in Flair and how to conduct a basic hyper-parameter search.
By leveraging the 🤗 AutoTrain library with the SpaceRunner feature, the entire fine-tuning process can be accomplished cost-effectively and efficiently within the Hugging Face ecosystem.
Furthermore, this blog post demonstrates how to automatically upload fine-tuned models and create informative, aesthetically pleasing model cards.
German MobIE Dataset
The German MobIE Dataset was introduced in the MobIE paper by Hennig, Truong and Gabryszak (2021).
This is a German-language dataset that has been human-annotated with 20 coarse- and fine-grained entity types, and it includes entity linking information for geographically linkable entities. The dataset comprises 3,232 social media texts and traffic reports, totaling 91K tokens, with 20.5K annotated entities, of which 13.1K are linked to a knowledge base.
To use this dataset in Flair, we must create our own dataset loader because the dataset has not yet been integrated into the Flair library:
from pathlib import Path
from typing import Optional, Union
from zipfile import ZipFile

import flair
from flair.datasets import ColumnCorpus
from flair.file_utils import cached_path


class NER_GERMAN_MOBIE(ColumnCorpus):
    def __init__(
        self,
        base_path: Optional[Union[str, Path]] = None,
        in_memory: bool = True,
        **corpusargs,
    ) -> None:
        base_path = flair.cache_root / "datasets" if not base_path else Path(base_path)
        dataset_name = self.__class__.__name__.lower()
        data_folder = base_path / dataset_name
        # Note: the archive is always extracted into the Flair cache,
        # independent of base_path
        data_path = flair.cache_root / "datasets" / dataset_name

        # Column 0 holds the token, column 3 the NER tag
        columns = {0: "text", 3: "ner"}

        # Download and extract the dataset on first use
        train_data_file = data_path / "train.conll2003"
        if not train_data_file.is_file():
            temp_file = cached_path(
                "https://github.com/DFKI-NLP/MobIE/raw/master/v1_20210811/ner_conll03_formatted.zip",
                Path("datasets") / dataset_name,
            )
            with ZipFile(temp_file, "r") as zip_file:
                zip_file.extractall(path=data_path)

        super().__init__(
            data_folder,
            columns,
            in_memory=in_memory,
            comment_symbol=None,
            document_separator_token="-DOCSTART-",
            **corpusargs,
        )
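To make the column mapping concrete, here is a minimal sketch in plain Python (no Flair required) of how a mapping like {0: "text", 3: "ner"} picks fields out of a CoNLL-2003-style line. The sample lines, the placeholder values in columns 1 and 2, and the tag name are invented for illustration and are not taken from the MobIE files. Flair's ColumnCorpus does the real parsing.

```python
from typing import Dict, List

# Same mapping as in the loader: column 0 is the token, column 3 the NER tag.
COLUMNS: Dict[int, str] = {0: "text", 3: "ner"}


def parse_conll_line(line: str, columns: Dict[int, str] = COLUMNS) -> Dict[str, str]:
    """Pick the mapped columns out of one whitespace-separated CoNLL line."""
    fields = line.split()
    return {name: fields[index] for index, name in columns.items()}


# Hypothetical sample lines; columns 1 and 2 are placeholders.
sample_lines: List[str] = [
    "Berlin _ _ B-LOC",
    "Hbf _ _ O",
]

parsed = [parse_conll_line(line) for line in sample_lines]
print(parsed)
```

Only the columns named in the mapping are kept, so anything stored in columns 1 and 2 is ignored by the loader.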
The following figure shows an annotated sentence (taken from the MobIE paper):
Fine-Tuning with Flair
We use the latest Flair version for fine-tuning. Additionally, the model is trained with the FLERT approach (Schweter and Akbik, 2020), because the MobIE dataset thankfully comes with document boundary markers. The GBERT Base model from Chan et al. (2020) is used as the backbone LM.
We define a very basic hyper-parameter search over the following parameters:
- Batch Sizes = [16]
- Learning Rates = [3e-05, 5e-05]
- Seeds = [1, 2, 3, 4, 5]
This means that 10 models are trained in total. The hyper-parameter search could be implemented like this:
import os

# Hyper-Parameter search definitions
batch_sizes = [16]
learning_rates = [3e-05, 5e-05]
seeds = [1, 2, 3, 4, 5]
epochs = [10]
context_sizes = [64]

# Backbone LM definitions
base_model = "deepset/gbert-base"
base_model_short = "gbert_base"

# Hugging Face Model Hub configuration
hf_token = os.environ.get("HF_TOKEN")
hf_hub_org_name = os.environ.get("HUB_ORG_NAME")

# ExperimentConfiguration and run_experiment() are defined in the fine-tuning script
for seed in seeds:
    for batch_size in batch_sizes:
        for epoch in epochs:
            for learning_rate in learning_rates:
                for context_size in context_sizes:
                    experiment_configuration = ExperimentConfiguration(
                        batch_size=batch_size,
                        learning_rate=learning_rate,
                        epoch=epoch,
                        context_size=context_size,
                        seed=seed,
                        base_model=base_model,
                        base_model_short=base_model_short,
                    )
                    output_path = run_experiment(experiment_configuration=experiment_configuration)
The implementation of the run_experiment() method (which holds the complete fine-tuning logic) can be found here.
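As a quick sanity check on the size of the search space, the nested loops can also be written with itertools.product. This is only a sketch: it enumerates the configurations without training anything, and the run-name pattern is an assumption modelled on the configuration labels used in the results table (the "-seed" suffix is hypothetical).

```python
from itertools import product

batch_sizes = [16]
learning_rates = [3e-05, 5e-05]
seeds = [1, 2, 3, 4, 5]
epochs = [10]
context_sizes = [64]

# One tuple per training run, in the same nesting order as the loops above.
grid = list(product(seeds, batch_sizes, epochs, learning_rates, context_sizes))

# Hypothetical run names; only the "bs…-e…-lr…" part mirrors the labels
# that appear in the results table.
run_names = [
    f"bs{bs}-e{epoch}-lr{lr}-seed{seed}"
    for seed, bs, epoch, lr, _context_size in grid
]

print(len(grid))
print(run_names[0])
```

With one batch size, two learning rates, five seeds, one epoch setting and one context size, the grid contains 1 x 2 x 5 x 1 x 1 = 10 runs.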
Start Fine-Tuning with 🤗 AutoTrain SpaceRunner
The fine-tuning process begins by using the remarkable 🤗 AutoTrain library with its SpaceRunner feature.
This initiates a Docker-based SpaceRunner, and the entire model fine-tuning process is carried out on hardware provided by Hugging Face.
We utilize a T4 Small instance for our experiments. To set up SpaceRunner, two essential files are required:
- script.py: This file manages the entire fine-tuning process, including the hyper-parameter search and the model upload. You can find an example of this here.
- requirements.txt: This file defines all the necessary dependencies that are installed in the AutoTrain Space.
Before commencing AutoTrain fine-tuning, the following environment variables need to be created:
- HF_TOKEN: This is the User Access Token, which can be obtained here.
- HUB_ORG_NAME: This is the username or organization under which the AutoTrain Space is created.
To start the fine-tuning process via the command line, use the following command:
$ autotrain spacerunner --project-name "flair-mobie" \
    --script-path $(pwd) \
    --username stefan-it \
    --token $HF_TOKEN \
    --backend spaces-t4s \
    --env "HF_TOKEN=$HF_TOKEN;HUB_ORG_NAME=stefan-it"
This command creates a Docker space where the entire fine-tuning process can be monitored. Additionally, it establishes a new dataset repository where all source files are stored.
The fine-tuning of all ten models in this tutorial required 4 hours and 34 minutes on the T4 small instance and had a total cost of $2.74.
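The reported runtime and cost can be broken down with a few lines of arithmetic. The sketch below uses only the figures stated above. The hourly rate it prints is derived from them and is not quoted from a price list, so treat it as an implied value.

```python
# Figures reported above.
runtime_minutes = 4 * 60 + 34  # 4 hours and 34 minutes
total_cost_usd = 2.74
num_models = 10

runtime_hours = runtime_minutes / 60
implied_hourly_rate = total_cost_usd / runtime_hours
minutes_per_model = runtime_minutes / num_models
cost_per_model = total_cost_usd / num_models

print(f"{implied_hourly_rate:.2f} USD per hour (implied)")
print(f"{minutes_per_model:.1f} minutes per model")
print(f"{cost_per_model:.3f} USD per model")
```

On average, each of the ten models therefore took a little under half an hour to fine-tune.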
Model Upload
After each model is fine-tuned, the following files and folder are uploaded to the Model Hub (one repository for every model):
- pytorch_model.bin: Flair internally tracks the best model over all epochs as best-model.pt. To be compatible with the Model Hub, best-model.pt is automatically renamed to pytorch_model.bin.
- training.log: Flair stores the training log in training.log. This file is later needed to parse the best F1-score on the development set.
- ./runs: This folder stores the TensorBoard logs, which enables a nice display of metrics on the Model Hub.
The repository creation and uploading of files/folders are done via the awesome 🤗 Hub client library:
from huggingface_hub import HfApi

# Hub client used by the calls below
api = HfApi()

# Creates repository
repo_url = api.create_repo(
    repo_id=f"{hf_hub_org_name}/{output_path}",
    token=hf_token,
    private=True,
    exist_ok=True,
)

# Upload TensorBoard logs
api.upload_folder(
    folder_path=f"{output_path}/runs",
    path_in_repo="./runs",
    repo_id=f"{hf_hub_org_name}/{output_path}",
    repo_type="model",
)

# Upload Flair's training log
api.upload_file(
    path_or_fileobj=f"{output_path}/training.log",
    path_in_repo="./training.log",
    repo_id=f"{hf_hub_org_name}/{output_path}",
    repo_type="model",
)

# Upload best model (renamed to pytorch_model.bin on the Hub)
api.upload_file(
    path_or_fileobj=f"{output_path}/best-model.pt",
    path_in_repo="./pytorch_model.bin",
    repo_id=f"{hf_hub_org_name}/{output_path}",
    repo_type="model",
)
Model Card Creation
After the automatic upload of all models to the Model Hub, we now want to create model cards for each model with the following features:
- Nice looking model card with metadata (important!);
- A working inference widget to try out other NER examples;
- A results overview.
In order to create model cards automatically, we extensively use the 🤗 Hub client library.
Metadata Section
We use the following template to define the metadata section of each model:
---
language: de
license: mit
tags:
- flair
- token-classification
- sequence-tagger-model
base_model: {{ base_model }}
widget:
- text: {{ widget_text }}
---
Later, we pass base_model and widget_text to this template.
Results Overview
The results overview section is very important and is built in the following steps:
- Iterating over all the fine-tuned models and parsing the training.log file to get the best F1-score on the development set
- Constructing a results table for all hyper-parameter configurations (in our example, batch size, number of epochs and learning rate) and their different F1-scores for every seed, including averaged F1-score and standard deviation
All these steps are shown in our example notebook.
After retrieving all results from the training.log files, a Pandas DataFrame shows the results averaged over all seeds and grouped by the hyper-parameter configuration:
Configuration | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 | Average | Std. |
---|---|---|---|---|---|---|---|
bs16-e10-lr5e-05 | 0.8446 | 0.8495 | 0.8455 | 0.8419 | 0.8476 | 0.8458 | 0.0029 |
bs16-e10-lr3e-05 | 0.8392 | 0.8445 | 0.8495 | 0.8381 | 0.8449 | 0.8432 | 0.0046 |
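The Average and Std. columns can be reproduced from the per-seed scores with the standard library alone. The sketch below is not the notebook code. It copies the scores from the table above and uses the sample standard deviation (n - 1 in the denominator, which is also the pandas default), since that is what matches the reported values.

```python
from statistics import mean, stdev

# Development F1-scores per seed, copied from the table above.
results = {
    "bs16-e10-lr5e-05": [0.8446, 0.8495, 0.8455, 0.8419, 0.8476],
    "bs16-e10-lr3e-05": [0.8392, 0.8445, 0.8495, 0.8381, 0.8449],
}

# Mean and sample standard deviation per configuration, rounded to 4 digits.
summary = {
    config: (round(mean(scores), 4), round(stdev(scores), 4))
    for config, scores in results.items()
}

for config, (average, std) in summary.items():
    print(f"{config}: {average:.4f} ± {std:.4f}")
```

The two configurations differ by less than the standard deviation of either, so the gap between the learning rates is small relative to the seed-to-seed variation.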
However, this is not enough! We want to link the corresponding models and also highlight the result of the currently viewed model on the Model Hub. A final results table will then look like:
Configuration | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 | Average |
---|---|---|---|---|---|---|
bs16-e10-lr5e-05 | 0.8446 | 0.8495 | 0.8455 | 0.8419 | 0.8476 | 0.8458 ± 0.0029 |
bs16-e10-lr3e-05 | 0.8392 | 0.8445 | 0.8495 | 0.8381 | 0.8449 | 0.8432 ± 0.0046 |
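One way to produce such a row is sketched below. This is a hypothetical helper, not the get_results_table() function from the notebook: the repository names are invented, and rendering the currently viewed model in bold is an assumption about how the highlighting could be done in Markdown.

```python
from typing import List


def format_cell(score: float, repo_id: str, current_repo_id: str) -> str:
    """Link a score to its model repository; bold it for the viewed model."""
    link = f"[{score:.4f}](https://hf.co/{repo_id})"
    return f"**{link}**" if repo_id == current_repo_id else link


def format_row(
    config: str,
    scores: List[float],
    repo_ids: List[str],
    average: float,
    std: float,
    current_repo_id: str,
) -> str:
    """Build one table row: configuration, one cell per seed, mean ± std."""
    cells = [format_cell(s, r, current_repo_id) for s, r in zip(scores, repo_ids)]
    return " | ".join([config, *cells, f"{average:.4f} ± {std:.4f}"]) + " |"


# Hypothetical repository names, for illustration only.
repo_ids = [f"my-org/flair-mobie-seed{seed}" for seed in range(1, 6)]

row = format_row(
    "bs16-e10-lr5e-05",
    [0.8446, 0.8495, 0.8455, 0.8419, 0.8476],
    repo_ids,
    0.8458,
    0.0029,
    current_repo_id="my-org/flair-mobie-seed2",
)
print(row)
```

Because the highlighted cell depends on the model being viewed, the table has to be rendered once per model repository.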
PR Creation
After constructing nice-looking model cards locally, we now want to push them for all of our fine-tuned models. Here's the final code snippet, which also allows you to define a good commit message and description:
from huggingface_hub import ModelCard, ModelCardData

commit_message = "readme: add initial version of model card"
commit_description = "Hey,\n\nthis PR adds the initial version of model card."

create_pr = True

# model_infos, final_df, get_results_table() and widget_text come from the example notebook
for model in model_infos:
    current_results_table = get_results_table(final_df, model_infos, model)

    card_data = ModelCardData()
    card = ModelCard.from_template(
        card_data,
        template_path="model_card_template.md",
        base_model=base_model,
        base_model_short=base_model_short,
        batch_sizes=f'[{", ".join([f"`{bs}`" for bs in batch_sizes])}]',
        learning_rates=f'[{", ".join([f"`{lr}`" for lr in learning_rates])}]',
        results=current_results_table,
        widget_text=widget_text.strip(),
    )

    commit_url = card.push_to_hub(
        repo_id=model.model_id,
        create_pr=create_pr,
        commit_message=commit_message,
        commit_description=commit_description,
    )

    print(commit_url + "\n")
An example PR can be seen here.
It is also possible to set the `create_pr` parameter to `False`. In that case no PR is opened and the model card is committed directly to the repository, without review!
Wait, where are my models?
Initially, the model repositories were created with the private=True option. This means the models are not yet publicly visible, but they can easily be set to public with:
from huggingface_hub import update_repo_visibility

# Now make repositories publicly visible
for model in model_infos:
    print(f"Update visibility to True for repo https://hf.co/{model.model_id}")
    update_repo_visibility(repo_id=model.model_id, private=False)
Model Card Showcase
Now it is time to showcase the uploaded model card!
Model Card Header
TensorBoard Metrics
Inference Widget
Results Section
Summary
In this blog post, we showed how to use Flair in combination with the 🤗 AutoTrain library and SpaceRunner to fine-tune models on the German MobIE NER dataset with a basic hyper-parameter search.
Additionally, we used the 🤗 Hub client library to automatically upload nice-looking model cards with useful information.
Additional Resources
- The autotrain-flair-mobie repository with all code
- Collection of fine-tuned models
- 🤗 AutoTrain library
- SpaceRunner feature by @abhishek
- 🤗 Hub client library documentation