mT5-small-sum-de-mit-v1

This is a German summarization model based on the multilingual T5 model google/mt5-small. Unlike many other models, it is released under a permissive open source license (MIT), which, among other things, allows commercial use.

This model is provided by the One Conversation team of Deutsche Telekom AG.

Training

The training was conducted with the following hyperparameters:

  • base model: google/mt5-small
  • source_prefix: "summarize: "
  • batch size: 3 (effective batch size 6 with gradient accumulation)
  • max_source_length: 800
  • max_target_length: 96
  • warmup_ratio: 0.3
  • number of train epochs: 10
  • gradient accumulation steps: 2
  • learning rate: 5e-5
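The way these values combine can be sketched in a few lines of plain Python. This is an illustrative sketch, not the original training script: the class `TrainConfig` and its method names are hypothetical, and reading "3 (6)" as a per-step batch of 3 with an effective batch of 6 is an assumption based on the gradient accumulation setting. The warmup is computed as a fraction of all optimizer steps, which is how a warmup ratio is commonly interpreted.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters from the list above (names are illustrative)."""

    per_step_batch_size: int = 3
    gradient_accumulation_steps: int = 2
    num_train_epochs: int = 10
    warmup_ratio: float = 0.3
    learning_rate: float = 5e-5
    max_source_length: int = 800
    max_target_length: int = 96
    source_prefix: str = "summarize: "

    @property
    def effective_batch_size(self) -> int:
        # Assumption: gradients from 2 batches of 3 are accumulated
        # before each optimizer update, giving 6 examples per update.
        return self.per_step_batch_size * self.gradient_accumulation_steps

    def optimizer_steps(self, num_examples: int) -> int:
        # Number of optimizer updates over the whole training run.
        steps_per_epoch = math.ceil(num_examples / self.effective_batch_size)
        return steps_per_epoch * self.num_train_epochs

    def warmup_steps(self, num_examples: int) -> int:
        # Learning-rate warmup covers the first warmup_ratio of all updates.
        return math.ceil(self.optimizer_steps(num_examples) * self.warmup_ratio)


if __name__ == "__main__":
    cfg = TrainConfig()
    # 600 is an arbitrary example size, not the size of the real dataset.
    print(cfg.effective_batch_size, cfg.optimizer_steps(600), cfg.warmup_steps(600))
```

Here `num_examples` stands for the number of training records that remain after preprocessing.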

Datasets and Preprocessing

The datasets were preprocessed as follows:

Each summary was tokenized with the google/mt5-small tokenizer. Only records whose summary had no more than 94 tokens were kept.
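The length filter described above might look roughly like the following sketch. The function names and the `"summary"` field name are hypothetical, and the card does not say whether special tokens (such as the end-of-sequence token) were counted toward the limit of 94, so the tokenizer-based counter below is only one possible reading.

```python
from typing import Callable, Dict, Iterable, List


def filter_by_summary_length(
    records: Iterable[Dict[str, str]],
    count_tokens: Callable[[str], int],
    max_summary_tokens: int = 94,
) -> List[Dict[str, str]]:
    """Keep only records whose summary has at most max_summary_tokens tokens."""
    return [r for r in records if count_tokens(r["summary"]) <= max_summary_tokens]


def make_mt5_token_counter(model_name: str = "google/mt5-small") -> Callable[[str], int]:
    """Build a token counter from the mT5 tokenizer (downloads the tokenizer)."""
    # Imported lazily so the filter itself works without transformers installed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Assumption: this count includes the end-of-sequence token that the
    # tokenizer appends; the original preprocessing may have counted differently.
    return lambda text: len(tokenizer(text)["input_ids"])


if __name__ == "__main__":
    # Stand-in counter so the example runs offline; swap in
    # make_mt5_token_counter() to count real mT5 tokens.
    def whitespace_count(text: str) -> int:
        return len(text.split())

    data = [
        {"text": "Langer Artikel ...", "summary": "Kurze Zusammenfassung."},
        {"text": "Noch ein Artikel ...", "summary": "Wort " * 200},
    ]
    kept = filter_by_summary_length(data, whitespace_count)
    print(len(kept))
```

Passing the counter in as a parameter keeps the filtering logic independent of any particular tokenizer.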

This model was trained on the following dataset:

| Name | Language | Size | License |
|---|---|---|---|
| SwissText 2019 - Train | de | 84,564 | Concrete license is unclear. The data was published in the German Text Summarization Challenge. |

We have permission to use the SwissText dataset and to release the resulting summarization model under the MIT license (see permission-declaration-swisstext.pdf).

Evaluation on MLSUM German Test Set (no beams)

| Model | rouge1 | rouge2 | rougeL | rougeLsum |
|---|---|---|---|---|
| deutsche-telekom/mt5-small-sum-de-mit-v1 (this) | 16.8023 | 3.5531 | 12.6884 | 14.7624 |
| ml6team/mt5-small-german-finetune-mlsum | 18.3607 | 5.3604 | 14.5456 | 16.1946 |
| deutsche-telekom/mt5-small-sum-de-en-01 | 21.7336 | 7.2614 | 17.1323 | 19.3977 |
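
Generating a summary without beam search could look like the sketch below, which uses the Hugging Face transformers library. It is not the evaluation script behind the scores above: the helper names are hypothetical, `num_beams=1` is one reading of "no beams", and applying the training prefix and length limits at inference time is an assumption.

```python
MODEL_ID = "deutsche-telekom/mt5-small-sum-de-mit-v1"
SOURCE_PREFIX = "summarize: "  # same prefix as listed under Training


def build_input(text: str) -> str:
    """Prepend the training prefix to the text to be summarized."""
    return SOURCE_PREFIX + text.strip()


def summarize(text: str, max_source_length: int = 800, max_target_length: int = 96) -> str:
    """Summarize German text with greedy decoding (no beam search)."""
    # Imported lazily; requires transformers, torch and sentencepiece.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Truncate long inputs to the source length used during training.
    inputs = tokenizer(
        build_input(text),
        max_length=max_source_length,
        truncation=True,
        return_tensors="pt",
    )
    # num_beams=1 with sampling disabled (the default) means greedy decoding.
    output_ids = model.generate(**inputs, max_length=max_target_length, num_beams=1)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Downloads the model on first run.
    print(summarize("Hier steht ein längerer deutscher Text, der zusammengefasst werden soll."))
```

For repeated calls, loading the tokenizer and model once outside the function avoids reloading them for every text.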

License

Copyright (c) 2021 Philip May, Deutsche Telekom AG

Licensed under the MIT License (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License by reviewing the file LICENSE in the repository.
