We introduce BERTurk-Legal, a transformer-based language model for retrieving prior legal cases. BERTurk-Legal is pre-trained on a dataset from the Turkish legal domain that contains no labels related to the prior court case retrieval task. It is trained in a self-supervised manner with masked language modeling. With zero-shot classification, BERTurk-Legal achieves state-of-the-art results on a dataset of legal cases from the Court of Cassation of Turkey. The experimental results show the need for language models specific to the Turkish legal domain. Details of BERTurk-Legal can be found in the paper listed in the Citation section below.

The test dataset can be accessed at the following link: https://github.com/koc-lab/yargitay_retrieval_dataset

The model can be loaded and used to create document embeddings as follows. Then, the document embeddings can be utilized for retrieval.

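from transformers import AutoModelForSequenceClassification, AutoTokenizer

bert_model = "KocLab-Bilkent/BERTurk-Legal"

# output_hidden_states=True makes the model return the hidden states of every layer
model = AutoModelForSequenceClassification.from_pretrained(bert_model, output_hidden_states=True)
tokenizer = AutoTokenizer.from_pretrained(bert_model)

# A dummy text is provided as input; return_tensors="pt" yields PyTorch tensors
tokens = tokenizer("Örnek metin", return_tensors="pt")

# Unpack the encoding so input_ids and attention_mask are passed as keyword arguments
output = model(**tokens)
# Last-layer hidden states, shape (batch_size, sequence_length, hidden_size)
docEmbeddings = output.hidden_states[-1]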
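
The last-layer hidden states hold one vector per token, so they need to be reduced to a single vector per document before retrieval. The sketch below shows one common way to do this: mean pooling over non-padding tokens, followed by ranking with cosine similarity. Both choices are assumptions made for illustration and are not confirmed as the method used in the paper. The helper names `mean_pool`, `cosine_similarity`, and `rank_documents` are hypothetical. Plain Python lists are used so that the example runs without downloading the model.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, document_vectors):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [
        (index, cosine_similarity(query_vector, doc))
        for index, doc in enumerate(document_vectors)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With real model output, the inputs to `mean_pool` could be obtained by converting tensors to nested lists, for example `docEmbeddings[0].detach().tolist()` for the token vectors and the first row of the tokenizer's attention mask for the mask. For large collections, a batched tensor implementation would be considerably faster than this list-based version.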

Citation

If you use the model, please cite the following conference paper; the BibTeX entry for the related master's thesis is also provided.

  @inproceedings{ozturk23berturkLegal,
    author={\"{O}zt\"{u}rk, Ceyhun E. and \"{O}z\c{c}elik, {\c{S}}. Bar{\i}\c{s} and Ko\c{c}, Aykut},
    booktitle={2023 31st Signal Processing and Communications Applications Conference (SIU)},
    title={{A Transformer-Based Prior Legal Case Retrieval Method}},
    year={2023},
    pages={1--4}
  }

  @mastersthesis{ozturk23legalNlp,
    author={\"{O}zt\"{u}rk, Ceyhun E.},
    title={Retrieving Turkish Prior Legal Cases with Deep Learning},
    school={Bilkent University},
    year={2023}
  }