SentenceTransformer based on microsoft/mpnet-base

This is a sentence-transformers model finetuned from microsoft/mpnet-base on the all-nli-triplets-turkish dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/mpnet-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
  • Languages: en, tr

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("mertcobanov/mpnet-base-all-nli-triplet-turkish-v4-dgx")
# Run inference
sentences = [
    'Böyle şeyler görmek ve eğer yapabileceğiniz en küçük bir şey varsa, bu yardımcı olur.',
    'Böyle bir şeyi gözlemlemek ve yapıp yapamayacağınızı bilmek için.',
    'Böyle bir şeyi görmek kötü, eğer yapabiliyorsanız buna hiç katkıda bulunmayın.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Triplet

  • Datasets: all-nli-dev-turkish and all-nli-test-turkish
  • Evaluated with TripletEvaluator
Metric all-nli-dev-turkish all-nli-test-turkish
cosine_accuracy 0.7764 0.7741

Training Details

Training Dataset

all-nli-triplets-turkish

  • Dataset: all-nli-triplets-turkish at 13554fd
  • Size: 120,781 training samples
  • Columns: anchor_translated, positive_translated, and negative_translated
  • Approximate statistics based on the first 1000 samples:
    anchor_translated positive_translated negative_translated
    type string string string
    details
    • min: 3 tokens
    • mean: 11.77 tokens
    • max: 40 tokens
    • min: 3 tokens
    • mean: 11.1 tokens
    • max: 46 tokens
    • min: 2 tokens
    • mean: 12.41 tokens
    • max: 44 tokens
  • Samples:
    anchor_translated positive_translated negative_translated
    Bir kişi, bir atın üzerinde, bozulmuş bir uçağın üzerinden atlıyor. Bir kişi dışarıda, bir atın üzerinde. Bir kişi bir lokantada omlet siparişi veriyor.
    Bir Küçük Lig takımı, bir oyuncunun bir üsse kayarak girmeye çalıştığı sırada onu yakalamaya çalışıyor. Bir takım bir koşucuyu dışarı atmaya çalışıyor. Bir takım Satürn'de beyzbol oynuyor.
    Kadın beyaz giyiyor. Beyaz bir ceket giymiş bir kadın bir tekerlekli sandalyeyi itiyor. Siyah giyinmiş bir adam, siyah giyinmiş bir kadını kucaklıyor.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Evaluation Dataset

all-nli-triplets-turkish

  • Dataset: all-nli-triplets-turkish at 13554fd
  • Size: 6,584 evaluation samples
  • Columns: anchor_translated, positive_translated, and negative_translated
  • Approximate statistics based on the first 1000 samples:
    anchor_translated positive_translated negative_translated
    type string string string
    details
    • min: 2 tokens
    • mean: 22.3 tokens
    • max: 135 tokens
    • min: 1 tokens
    • mean: 10.92 tokens
    • max: 41 tokens
    • min: 3 tokens
    • mean: 10.81 tokens
    • max: 34 tokens
  • Samples:
    anchor_translated positive_translated negative_translated
    Ayrıca, bu özel tüketim vergileri, diğer vergiler gibi, hükümetin ödeme zorunluluğunu sağlama yetkisini kullanarak belirlenir. Hükümetin ödeme zorlaması, özel tüketim vergilerinin nasıl hesaplandığını belirler. Özel tüketim vergileri genel kuralın bir istisnasıdır ve aslında GSYİH payına dayalı olarak belirlenir.
    Gri bir sweatshirt giymiş bir sanatçı, canlı renklerde bir kasaba tablosu üzerinde çalışıyor. Bir ressam gri giysiler içinde bir kasabanın resmini yapıyor. Bir kişi bir beyzbol sopası tutuyor ve gelen bir atış için planda bekliyor.
    İmkansız. Yapılamaz. Tamamen mümkün.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • learning_rate: 2e-05
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss all-nli-dev-turkish_cosine_accuracy all-nli-test-turkish_cosine_accuracy
0 0 - - 0.5729 -
0.2119 100 6.6103 4.5154 0.6970 -
0.4237 200 5.1602 3.7328 0.7195 -
0.6356 300 4.4533 3.3389 0.7372 -
0.8475 400 3.4465 3.6044 0.7187 -
1.0572 500 2.6977 3.3043 0.7418 -
1.2691 600 3.8142 3.2066 0.7512 -
1.4809 700 3.4333 3.0716 0.7508 -
1.6928 800 3.1488 2.9590 0.7553 -
1.9047 900 1.8677 3.2416 0.7442 -
2.1144 1000 2.2034 2.9323 0.7634 -
2.3263 1100 2.9834 2.9406 0.7669 -
2.5381 1200 2.6785 2.8607 0.7672 -
2.75 1300 2.5096 2.8939 0.7684 -
2.9619 1400 0.876 3.2539 0.7416 -
3.1716 1500 2.3355 2.7503 0.7758 -
3.3835 1600 2.4666 2.7920 0.7707 -
3.5953 1700 2.2691 2.7860 0.7729 -
3.8072 1800 1.8024 2.9899 0.7571 -
4.0169 1900 0.6443 3.0993 0.7456 -
4.2288 2000 2.3976 2.7792 0.7811 -
4.4407 2100 2.1145 2.7968 0.7728 -
4.6525 2200 1.9788 2.7243 0.7751 -
4.8644 2300 1.1676 2.9885 0.7567 -
5.0742 2400 1.0009 2.7374 0.7767 -
5.2860 2500 2.1276 2.7822 0.7767 -
5.4979 2600 1.8459 2.7822 0.7760 -
5.7097 2700 1.7659 2.7322 0.7766 -
5.9216 2800 0.5916 3.0191 0.7596 -
6.1314 2900 1.3908 2.6973 0.7772 -
6.3432 3000 1.9257 2.7585 0.7763 -
6.5551 3100 1.6558 2.7350 0.7760 -
6.7669 3200 1.5368 2.7903 0.7722 -
6.9788 3300 0.1968 3.0849 0.7479 -
7.1886 3400 1.8044 2.6626 0.7825 -
7.4004 3500 1.7048 2.7380 0.7790 -
7.6123 3600 1.5666 2.7250 0.7796 -
7.8242 3700 1.0954 2.9620 0.7629 -
8.0339 3800 0.487 2.8900 0.7641 -
8.2458 3900 1.8398 2.7186 0.7796 -
8.4576 4000 1.5659 2.7259 0.7778 -
8.6695 4100 1.4825 2.7007 0.7760 -
8.8814 4200 0.7019 2.9050 0.7675 -
9.0911 4300 0.9278 2.7606 0.7731 -
9.3030 4400 1.766 2.6978 0.7787 -
9.5148 4500 1.4699 2.7114 0.7801 -
9.7267 4600 1.4647 2.7096 0.7799 -
9.9386 4700 0.3321 2.7418 0.7764 -
9.9809 4720 - - - 0.7741

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.3.1
  • Transformers: 4.46.3
  • PyTorch: 2.4.0
  • Accelerate: 0.27.2
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Downloads last month
23
Safetensors
Model size
109M params
Tensor type
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for mertcobanov/mpnet-base-all-nli-triplet-turkish-v4-dgx

Finetuned
(49)
this model

Dataset used to train mertcobanov/mpnet-base-all-nli-triplet-turkish-v4-dgx

Evaluation results