ModernBERT-embed-large-unsupervised
modernbert-embed-large-unsupervised is the unsupervised checkpoint trained with the contrastors library for 1 epoch over the 235M weakly-supervised contrastive pairs curated in Nomic Embed. For embedding tasks, we suggest using the final modernbert-embed-large model instead.
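If you want to embed text with this checkpoint directly, here is a minimal Sentence Transformers sketch. It assumes the checkpoint loads through Sentence Transformers and follows the Nomic Embed task-prefix convention (`search_query:` / `search_document:`); the example query and document are illustrative only.

```python
from sentence_transformers import SentenceTransformer

# Usage sketch: assumes the checkpoint follows the Nomic Embed
# task-prefix convention used by the rest of the recipe.
model = SentenceTransformer("lightonai/modernbert-embed-large-unsupervised")

query_embeddings = model.encode(["search_query: What is TSNE?"])
doc_embeddings = model.encode([
    "search_document: t-SNE is a technique for visualising high-dimensional data."
])

# Similarity between the query and the document embeddings.
print(model.similarity(query_embeddings, doc_embeddings))
```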
Performance
Model | Average (56) | Classification (12) | Clustering (11) | Pair Classification (3) | Reranking (4) | Retrieval (15) | STS (10) | Summarization (1) |
---|---|---|---|---|---|---|---|---|
nomic-embed-text-v1_unsup | 59.9 | 71.2 | 42.5 | 83.7 | 55.0 | 48.0 | 80.8 | 30.7 |
modernbert-embed-base-unsupervised | 60.03 | 72.11 | 44.34 | 82.78 | 55.0 | 47.05 | 80.33 | 31.2 |
modernbert-embed-large-unsupervised | 60.71 | 72.90 | 44.96 | 83.44 | 55.54 | 47.90 | 80.95 | 29.86 |
Acknowledgment
We want to thank Zach Nussbaum from Nomic AI for building and sharing the Nomic Embed recipe and tools, and for his support during the training of this model!
The training was run on Orange Business Cloud Avenue infrastructure.
Citation
If you find the model, dataset, or training code useful, please consider citing ModernBERT as well as Nomic Embed:
@misc{modernbert,
title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference},
author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
year={2024},
eprint={2412.13663},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.13663},
}
@misc{nussbaum2024nomic,
title={Nomic Embed: Training a Reproducible Long Context Text Embedder},
author={Zach Nussbaum and John X. Morris and Brandon Duderstadt and Andriy Mulyar},
year={2024},
eprint={2402.01613},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2402.01613},
}
And if you want to cite this fine-tuning in particular, please use:
@misc{ModernBERT-embed-large,
title={ModernBERT-embed-large},
author={Chaffin, Antoine},
year={2025},
url={https://huggingface.co/lightonai/modernbert-embed-large},
}