ModernBERT-embed-large-unsupervised

modernbert-embed-large-unsupervised is the unsupervised checkpoint of modernbert-embed-large, trained with the contrastors library for 1 epoch over the 235M weakly-supervised contrastive pairs curated for Nomic Embed.

We suggest using modernbert-embed-large for embedding tasks.
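Like other models built on the Nomic Embed recipe, the checkpoint is typically used with task prefixes such as `search_query: ` and `search_document: ` on the input texts (an assumption carried over from Nomic Embed, not stated above), and retrieval then reduces to cosine similarity between the embedding vectors. A minimal sketch, with placeholder vectors standing in for the model's `encode(...)` outputs:

```python
import numpy as np

def with_prefix(texts, prefix):
    # Nomic Embed-style task prefixes, e.g. "search_query: " / "search_document: ".
    return [f"{prefix}{t}" for t in texts]

def cosine_sim(a, b):
    # Row-normalize both matrices, then a plain dot product gives cosine scores.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

queries = with_prefix(["What is TSNE?"], "search_query: ")
docs = with_prefix(["t-SNE is a dimensionality reduction method", "Paris is in France"],
                   "search_document: ")

# Placeholder embeddings standing in for model.encode(queries) / model.encode(docs).
query_emb = np.array([[1.0, 0.0, 0.0]])
doc_embs = np.array([[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]])

scores = cosine_sim(query_emb, doc_embs)   # shape (n_queries, n_docs)
best = int(np.argmax(scores))              # index of the most similar document
```

In practice the placeholder arrays would be replaced by the embeddings returned for the prefixed texts; the prefix strings themselves should be checked against the modernbert-embed-large card.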

Performance

| Model | Average (56) | Classification (12) | Clustering (11) | Pair Classification (3) | Reranking (4) | Retrieval (15) | STS (10) | Summarization (1) |
|---|---|---|---|---|---|---|---|---|
| nomic-embed-text-v1_unsup | 59.9 | 71.2 | 42.5 | 83.7 | 55.0 | 48.0 | 80.8 | 30.7 |
| modernbert-embed-base-unsupervised | 60.03 | 72.11 | 44.34 | 82.78 | 55.0 | 47.05 | 80.33 | 31.2 |
| modernbert-embed-large-unsupervised | 60.71 | 72.90 | 44.96 | 83.44 | 55.54 | 47.90 | 80.95 | 29.86 |
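Assuming the standard MTEB category sizes (12 + 11 + 3 + 4 + 15 + 10 + 1 = 56 datasets, with the final ~30-point column being Summarization over a single dataset), the Average (56) column is the dataset-count-weighted mean of the per-category scores. A quick check on the modernbert-embed-large-unsupervised row:

```python
# MTEB "Average (56)" as a dataset-count-weighted mean of per-category scores.
# Scores are the modernbert-embed-large-unsupervised row of the table above.
counts = {"Classification": 12, "Clustering": 11, "Pair Classification": 3,
          "Reranking": 4, "Retrieval": 15, "STS": 10, "Summarization": 1}
scores = {"Classification": 72.90, "Clustering": 44.96, "Pair Classification": 83.44,
          "Reranking": 55.54, "Retrieval": 47.90, "STS": 80.95, "Summarization": 29.86}

total = sum(counts.values())  # 56 datasets
average = sum(counts[k] * scores[k] for k in counts) / total
print(round(average, 2))  # 60.71, matching the table
```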

Acknowledgment

We want to thank Zach Nussbaum from Nomic AI for building and sharing the Nomic Embed recipe and tools, and for his support during the training of this model!

The training has been run on Orange Business Cloud Avenue infrastructure.

Citation

If you find the model, dataset, or training code useful, please consider citing ModernBERT as well as Nomic Embed:

@misc{modernbert,
      title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference}, 
      author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
      year={2024},
      eprint={2412.13663},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.13663}, 
}
@misc{nussbaum2024nomic,
      title={Nomic Embed: Training a Reproducible Long Context Text Embedder}, 
      author={Zach Nussbaum and John X. Morris and Brandon Duderstadt and Andriy Mulyar},
      year={2024},
      eprint={2402.01613},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

And if you want to cite this fine-tuning in particular, please use:

@misc{ModernBERT-embed-large,
  title={ModernBERT-embed-large},
  author={Chaffin, Antoine},
  url={https://huggingface.co/lightonai/modernbert-embed-large},
  year={2025}
}