ModernBERT-embed-large-unsupervised

modernbert-embed-large-unsupervised is the unsupervised checkpoint of modernbert-embed-large, trained with the contrastors library for 1 epoch over the 235M weakly-supervised contrastive pairs curated for Nomic Embed.

We suggest using modernbert-embed-large for embedding tasks.
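Like other models built on the Nomic Embed recipe, the checkpoint is typically used with task prefixes such as `search_query: ` and `search_document: ` on the input texts (an assumption carried over from Nomic Embed, not stated above), and retrieval then reduces to cosine similarity between the embedding vectors. A minimal sketch, with placeholder vectors standing in for the model's `encode(...)` outputs:

```python
import numpy as np

def with_prefix(texts, prefix):
    # Nomic Embed-style task prefixes, e.g. "search_query: " / "search_document: ".
    return [f"{prefix}{t}" for t in texts]

def cosine_sim(a, b):
    # Row-normalize both matrices, then a plain dot product gives cosine scores.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

queries = with_prefix(["What is TSNE?"], "search_query: ")
docs = with_prefix(["t-SNE is a dimensionality reduction method", "Paris is in France"],
                   "search_document: ")

# Placeholder embeddings standing in for model.encode(queries) / model.encode(docs).
query_emb = np.array([[1.0, 0.0, 0.0]])
doc_embs = np.array([[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]])

scores = cosine_sim(query_emb, doc_embs)   # shape (n_queries, n_docs)
best = int(np.argmax(scores))              # index of the most similar document
```

In practice the placeholder arrays would be replaced by the embeddings returned for the prefixed texts; the prefix strings themselves should be checked against the modernbert-embed-large card.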

Performance

| Model | Average (56) | Classification (12) | Clustering (11) | Pair Classification (3) | Reranking (4) | Retrieval (15) | STS (10) | Summarization (1) |
|---|---|---|---|---|---|---|---|---|
| nomic-embed-text-v1_unsup | 59.9 | 71.2 | 42.5 | 83.7 | 55.0 | 48.0 | 80.8 | 30.7 |
| modernbert-embed-base-unsupervised | 60.03 | 72.11 | 44.34 | 82.78 | 55.0 | 47.05 | 80.33 | 31.2 |
| modernbert-embed-large-unsupervised | 60.71 | 72.90 | 44.96 | 83.44 | 55.54 | 47.90 | 80.95 | 29.86 |
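Assuming the standard MTEB category sizes (12 + 11 + 3 + 4 + 15 + 10 + 1 = 56 datasets, with the final ~30-point column being Summarization over a single dataset), the Average (56) column is the dataset-count-weighted mean of the per-category scores. A quick check on the modernbert-embed-large-unsupervised row:

```python
# MTEB "Average (56)" as a dataset-count-weighted mean of per-category scores.
# Scores are the modernbert-embed-large-unsupervised row of the table above.
counts = {"Classification": 12, "Clustering": 11, "Pair Classification": 3,
          "Reranking": 4, "Retrieval": 15, "STS": 10, "Summarization": 1}
scores = {"Classification": 72.90, "Clustering": 44.96, "Pair Classification": 83.44,
          "Reranking": 55.54, "Retrieval": 47.90, "STS": 80.95, "Summarization": 29.86}

total = sum(counts.values())  # 56 datasets
average = sum(counts[k] * scores[k] for k in counts) / total
print(round(average, 2))  # 60.71, matching the table
```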

Acknowledgment

We want to thank Zach Nussbaum from Nomic AI for building and sharing the Nomic Embed recipe and tools, and for his support during the training of this model!

The training has been run on Orange Business Cloud Avenue infrastructure.

Citation

If you find the model, dataset, or training code useful, please consider citing ModernBERT as well as Nomic Embed:

@misc{modernbert,
      title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference}, 
      author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
      year={2024},
      eprint={2412.13663},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.13663}, 
}
@misc{nussbaum2024nomic,
      title={Nomic Embed: Training a Reproducible Long Context Text Embedder}, 
      author={Zach Nussbaum and John X. Morris and Brandon Duderstadt and Andriy Mulyar},
      year={2024},
      eprint={2402.01613},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

And if you want to cite this fine-tuning in particular, please use:

@misc{ModernBERT-embed-large,
  title={ModernBERT-embed-large},
  author={Chaffin, Antoine},
  url={https://huggingface.co/lightonai/modernbert-embed-large},
  year={2025}
}