sahajBERT News Article Classification

Model description

sahajBERT fine-tuned for news article classification using the sna.bn split of IndicGlue.

The model is trained to classify articles into 6 classes:

Label ID   Label
0          kolkata
1          state
2          national
3          sports
4          entertainment
5          international

Intended uses & limitations

How to use

You can use this model directly with a text classification pipeline:

from transformers import AlbertForSequenceClassification, TextClassificationPipeline, PreTrainedTokenizerFast

# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NCC")

# Initialize model
model = AlbertForSequenceClassification.from_pretrained("neuropark/sahajBERT-NCC")

# Initialize pipeline (named `classifier` so it does not shadow transformers.pipeline)
classifier = TextClassificationPipeline(tokenizer=tokenizer, model=model)

# Bengali input meaning "This union has 3 mouzas and 10 villages."
raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।"  # Change me
output = classifier(raw_text)
print(output)  # a list of {"label": ..., "score": ...} dicts
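TextClassificationPipeline returns one dict with `label` and `score` keys per prediction. Whether `label` comes back as a class name or as a generic `LABEL_<id>` string depends on the `id2label` mapping stored in the checkpoint's config, which this card does not show. The sketch below assumes the generic `LABEL_<id>` form and maps it back to the names in the label table. `ID2LABEL`, `decode_prediction` and the example score are hypothetical, for illustration only.

```python
# Label ids and names, copied from the label table in this model card.
ID2LABEL = {
    0: "kolkata",
    1: "state",
    2: "national",
    3: "sports",
    4: "entertainment",
    5: "international",
}


def decode_prediction(prediction: dict) -> tuple[str, float]:
    """Map one pipeline prediction to (label name, score).

    Assumption: the checkpoint config uses default "LABEL_<id>" names.
    If the label is already a readable name, it is returned unchanged.
    """
    label = prediction["label"]
    if label.startswith("LABEL_"):
        label = ID2LABEL[int(label[len("LABEL_"):])]
    return label, prediction["score"]


# Hypothetical output in the shape the pipeline returns for one input string.
example_output = [{"label": "LABEL_3", "score": 0.97}]
print(decode_prediction(example_output[0]))  # ('sports', 0.97)
```

If the config already carries the names from the table, the `startswith` check simply leaves them as they are.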

Limitations and bias

WIP

Training data

The model was initialized with pre-trained weights of sahajBERT at step 19519 and trained on the sna.bn split of IndicGlue.

Training procedure

Coming soon!

Eval results

Loss: 0.2477145493030548
Accuracy: 0.926293408929837
Recall: 0.926293408929837

Macro Precision: 0.9109938492260489
Macro Recall: 0.9069095007692186
Macro F1: 0.9079785326650756

Micro Precision: 0.926293408929837
Micro Recall: 0.926293408929837

Weighted Precision: 0.9288535478995414
Weighted Recall: 0.926293408929837
Weighted F1: 0.9266428029354202
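The card does not say which tool produced these metrics. The plain-Python sketch below re-implements the standard single-label multi-class definitions, so no specific library is implied. It also shows why Accuracy, Recall, Micro Precision, Micro Recall and Weighted Recall all report the same value. The function name and the toy labels are hypothetical and are not the evaluation data.

```python
from collections import Counter


def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy plus micro, macro and weighted precision/recall/F1
    for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # true examples per class
    pred_count = Counter(y_pred)   # predicted examples per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    def safe_div(a: float, b: float) -> float:
        return a / b if b else 0.0

    # Per-class (precision, recall, F1).
    per_class = {}
    for c in labels:
        prec = safe_div(tp[c], pred_count[c])
        rec = safe_div(tp[c], support[c])
        per_class[c] = (prec, rec, safe_div(2 * prec * rec, prec + rec))

    n = len(y_true)
    accuracy = safe_div(sum(tp.values()), n)
    # Each wrong prediction is one false positive and one false negative,
    # so micro precision == micro recall == accuracy.
    micro = accuracy
    # Macro: unweighted mean over classes.
    macro = [sum(v[i] for v in per_class.values()) / len(labels) for i in range(3)]
    # Weighted: mean weighted by class support. Weighted recall reduces to
    # sum(tp) / n, which is again the accuracy.
    weighted = [sum(per_class[c][i] * support[c] for c in labels) / n for i in range(3)]
    return {
        "accuracy": accuracy,
        "micro_precision": micro,
        "micro_recall": micro,
        "macro_precision": macro[0],
        "macro_recall": macro[1],
        "macro_f1": macro[2],
        "weighted_precision": weighted[0],
        "weighted_recall": weighted[1],
        "weighted_f1": weighted[2],
    }


# Toy labels, for illustration only.
m = classification_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print(round(m["accuracy"], 4), round(m["weighted_recall"], 4))  # 0.6667 0.6667
```

Macro averages treat every class equally, so they fall below the micro figures when smaller classes are harder. That is consistent with Macro F1 being lower than Weighted F1 above.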

BibTeX entry and citation info

Coming soon!
