This model is a fine-tuned version of the BERT language model, adapted for multi-label classification in the financial regulatory domain. It builds on the pre-trained ProsusAI/finbert model, which has been further fine-tuned on a diverse dataset of financial regulatory texts, so that a single text can be assigned several relevant categories at once.

Model Architecture

  • Base Model: BERT
  • Pre-trained Model: ProsusAI/finbert
  • Task: Multi-label classification
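
In a multi-label setup, each label usually gets its own sigmoid output instead of sharing one softmax, so any number of labels can be active for the same text. The sketch below illustrates only that decision step. The label names and the 0.5 threshold are assumptions for illustration; this card does not list the real label set or the threshold used.

```python
import math

# Hypothetical label names; the real label set is not listed in this card.
LABELS = ["reporting", "capital_requirements", "consumer_protection"]

def sigmoid(x: float) -> float:
    """Map a raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logits_to_labels(logits, labels=LABELS, threshold=0.5):
    """Return every label whose sigmoid probability reaches the threshold."""
    return [name for name, z in zip(labels, logits) if sigmoid(z) >= threshold]

# Example logits for one text: the first and third labels pass the
# (assumed) 0.5 threshold, the second does not.
predicted = logits_to_labels([2.1, -1.3, 0.4])
print(predicted)
```

With the transformers library, passing problem_type="multi_label_classification" when loading a sequence classification model selects the matching per-label binary cross-entropy loss.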

Performance

Performance metrics on the validation set:

  • F1 Score: 0.8637
  • ROC AUC: 0.9044
  • Accuracy: 0.6155
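
The gap between the F1 score and the accuracy is common in multi-label evaluation. The card does not say how either metric was computed. The sketch below assumes a micro-averaged F1 and an exact-match (subset) accuracy, which is how scikit-learn's accuracy_score treats multi-label indicator inputs. Under that assumption, a single wrong label makes the whole sample count as incorrect, so accuracy can sit well below F1. The data is a toy example, not taken from this model.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all samples and labels."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)

# Toy data: 4 samples, 3 labels. Two samples each have one wrong label.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

f1 = micro_f1(y_true, y_pred)
acc = subset_accuracy(y_true, y_pred)
print(f"micro F1 = {f1:.4f}, subset accuracy = {acc:.4f}")
```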

Limitations and Ethical Considerations

  • This model's performance may vary depending on the specific nature of the text data and the label distribution.
  • The dataset is class-imbalanced, which may reduce performance on under-represented labels.
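
One common way to inspect and partly offset class imbalance in multi-label data is to compute a per-label ratio of negatives to positives, which can then be passed as pos_weight to a binary cross-entropy loss such as PyTorch's BCEWithLogitsLoss. The card does not say whether anything like this was used for this model. The sketch below is an illustration on toy data, not the author's method.

```python
def label_pos_weights(y_true):
    """Per-label negatives/positives ratio for a list of 0/1 label vectors.

    Labels with no positive examples fall back to 1.0 (an arbitrary
    choice made for this sketch).
    """
    n = len(y_true)
    num_labels = len(y_true[0])
    weights = []
    for j in range(num_labels):
        pos = sum(row[j] for row in y_true)
        neg = n - pos
        weights.append(neg / pos if pos else 1.0)
    return weights

# Toy data: label 0 is frequent (3 of 4 samples), label 1 is rare (1 of 4).
toy_labels = [[1, 0], [1, 0], [1, 0], [0, 1]]
weights = label_pos_weights(toy_labels)
print(weights)  # the rare label gets the larger weight
```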

Dataset Information

  • Training Dataset: 6,562 samples
  • Validation Dataset: 929 samples
  • Test Dataset: 1,884 samples
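
As a quick check on the split sizes above, the following snippet computes each split's share of the total. Only the three sample counts come from this card.

```python
# Sample counts as listed in this card.
splits = {"train": 6562, "validation": 929, "test": 1884}

total = sum(splits.values())
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(total)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The counts work out to roughly a 70/10/20 split.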

Training Details

  • Training Strategy: Fine-tuning BERT with a randomly initialized classification head.
  • Optimizer: Adam
  • Learning Rate: 1e-4
  • Batch Size: 16
  • Number of Epochs: 2
  • Evaluation Strategy: Epoch
  • Weight Decay: 0.01
  • Metric for Best Model: F1 Score
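
The settings above map fairly directly onto the Hugging Face TrainingArguments class. The sketch below shows one way they might be expressed, not the author's actual training script. The output directory, the save strategy, load_best_model_at_end, and the metric key "f1" are assumptions, as is training on a single device with batch size 16 and no gradient accumulation.

```python
import math

# Hyperparameters from this card, plus clearly marked assumptions.
HPARAMS = {
    "output_dir": "finbert-multilabel",   # hypothetical placeholder path
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 16,    # assumes a single device
    "num_train_epochs": 2,
    "weight_decay": 0.01,
    "save_strategy": "epoch",             # assumption: match the evaluation schedule
    "load_best_model_at_end": True,       # assumption
    "metric_for_best_model": "f1",        # assumption about the metric key name
}

def optimizer_steps(num_samples: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps, assuming no gradient accumulation."""
    return math.ceil(num_samples / batch_size) * epochs

def build_training_args(eval_key: str = "eval_strategy"):
    """Build TrainingArguments; transformers is imported only when called.

    Older transformers releases name the argument "evaluation_strategy",
    so the key is left configurable here.
    """
    from transformers import TrainingArguments
    return TrainingArguments(**HPARAMS, **{eval_key: "epoch"})

# 6562 training samples, batch size 16, 2 epochs.
total_steps = optimizer_steps(6562, 16, 2)
print(total_steps)
```

Under these assumptions training covers 411 steps per epoch, with evaluation on the validation set after each of the two epochs.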

Model Files

  • Format: Safetensors
  • Model Size: 110M parameters
  • Tensor Type: F32