5CD-AI/viso-twhin-bert-large

Overview

We reduce TwHIN-BERT's vocabulary size to 20k on the UIT dataset and continue pretraining for 10 epochs.

Results on four downstream tasks over Vietnamese social media text: Emotion Recognition (UIT-VSMEC), Hate Speech Detection (UIT-HSD), Spam Reviews Detection (ViSpamReviews), and Hate Speech Spans Detection (ViHOS):

Model                 | Avg   | Emotion Recognition | Hate Speech Detection | Spam Reviews Detection | Hate Speech Spans Detection
                      |       | Acc    WF1    MF1   | Acc    WF1    MF1     | Acc    WF1    MF1      | Acc    WF1    MF1
viBERT                | 78.16 | 61.91  61.98  59.7  | 85.34  85.01  62.07   | 89.93  89.79  76.8     | 90.42  90.45  84.55
vELECTRA              | 79.23 | 64.79  64.71  61.95 | 86.96  86.37  63.95   | 89.83  89.68  76.23    | 90.59  90.58  85.12
PhoBERT-Base          | 79.3  | 63.49  63.36  61.41 | 87.12  86.81  65.01   | 89.83  89.75  76.18    | 91.32  91.38  85.92
PhoBERT-Large         | 79.82 | 64.71  64.66  62.55 | 87.32  86.98  65.14   | 90.12  90.03  76.88    | 91.44  91.46  86.56
ViSoBERT              | 81.58 | 68.1   68.37  65.88 | 88.51  88.31  68.77   | 90.99  90.92  79.06    | 91.62  91.57  86.8
visobert-14gb-corpus  | 82.2  | 68.69  68.75  66.03 | 88.79  88.6   69.57   | 91.02  90.88  77.13    | 93.69  93.63  89.66
viso-twhin-bert-large | 83.87 | 73.45  73.14  70.99 | 88.86  88.8   70.81   | 91.6   91.47  79.07    | 94.08  93.96  90.22

Acc = accuracy, WF1 = weighted F1, MF1 = macro F1. The Avg column (headed "Avg MF1" in the source table) equals the mean of all 12 scores in each row.
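
The card does not say how the Acc, WF1 and MF1 scores were computed. Assuming the standard definitions (accuracy, support-weighted F1 and unweighted macro F1 over the classes), a minimal dependency-free sketch looks like this; `classification_scores` is a hypothetical helper name, not part of the model's release.

```python
from collections import Counter


def classification_scores(y_true, y_pred):
    """Return accuracy, weighted F1 and macro F1 as percentages.

    Assumes single-label classification and the standard metric
    definitions; the model card does not specify its exact procedure.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of gold examples per label
    pairs = list(zip(y_true, y_pred))

    f1_per_label = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        f1_per_label[label] = 2 * tp / denom if denom else 0.0

    n = len(y_true)
    acc = sum(1 for t, p in pairs if t == p) / n
    mf1 = sum(f1_per_label.values()) / len(labels)
    wf1 = sum(f1_per_label[label] * support[label] for label in labels) / n
    return {"acc": 100 * acc, "wf1": 100 * wf1, "mf1": 100 * mf1}


scores = classification_scores([0, 0, 0, 1], [0, 0, 1, 1])
print(scores)
```

On this toy input the weighted and macro F1 differ because the classes are imbalanced, which is why the table reports both.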

Usage (HuggingFace Transformers)

Install the transformers package:

pip install transformers

You can then use this model for the fill-mask task like this:

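from transformers import pipeline

# Load the checkpoint from the Hugging Face Hub into a fill-mask pipeline.
model_path = "5CD-AI/viso-twhin-bert-large"
mask_filler = pipeline("fill-mask", model=model_path)

# Return the 10 most likely fillers for the <mask> token.
mask_filler("đúng nhận sai <mask>", top_k=10)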

Fine-tune Configuration

We fine-tune 5CD-AI/viso-twhin-bert-large on the four downstream tasks with the transformers library, using the following shared configuration:

  • train_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 4
  • weight_decay: 0.01
  • optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • training_epochs: 30
  • model_max_length: 128
  • metric_for_best_model: wf1
  • strategy: epoch
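
As a rough sketch, the shared settings above could be collected into keyword arguments for `transformers.TrainingArguments`. The mapping is an assumption rather than the authors' script: `train_batch_size` is read as the per-device batch size, `strategy: epoch` as saving (and evaluating) once per epoch, and `build_training_kwargs`, `effective_batch_size` and `output_dir` are hypothetical names. `model_max_length` is a tokenizer setting, so it is kept outside the training arguments.

```python
# Tokenizer-side setting from the list above (not a TrainingArguments field).
MODEL_MAX_LENGTH = 128


def build_training_kwargs(learning_rate, output_dir="finetune-output"):
    """Collect the shared fine-tuning settings as TrainingArguments kwargs.

    The learning rate is a parameter because it differs per task.
    """
    return {
        "output_dir": output_dir,              # hypothetical path
        "per_device_train_batch_size": 16,     # assumes batch size is per device
        "gradient_accumulation_steps": 4,
        "seed": 42,
        "weight_decay": 0.01,
        "adam_beta1": 0.9,
        "adam_beta2": 0.999,
        "adam_epsilon": 1e-8,
        "lr_scheduler_type": "cosine",
        "num_train_epochs": 30,
        "learning_rate": learning_rate,
        "save_strategy": "epoch",              # assumed meaning of "strategy: epoch"
        "metric_for_best_model": "wf1",
        # The per-epoch evaluation setting is omitted on purpose: its keyword
        # name differs between transformers releases, so check your version.
    }


def effective_batch_size(kwargs, num_devices=1):
    """Examples consumed per optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


kwargs = build_training_kwargs(learning_rate=1e-5)
print(effective_batch_size(kwargs))
```

With gradient accumulation, each optimizer step on a single device covers 16 × 4 = 64 examples. The dictionary can be unpacked as `TrainingArguments(**kwargs)` once the evaluation setting for your installed version is added.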

Each task additionally uses its own learning rate:

  • Emotion Recognition: learning_rate = 1e-5
  • Hate Speech Detection: learning_rate = 5e-6
  • Spam Reviews Detection: learning_rate = 1e-5
  • Hate Speech Spans Detection: learning_rate = 5e-6
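
To illustrate how the per-task learning rates interact with the cosine scheduler, here is a minimal sketch of a half-cosine decay from the peak rate to zero. It mirrors the usual shape of a cosine schedule with optional linear warmup; the card does not state any warmup, so `warmup_steps=0` is an assumption, and `cosine_lr` and `TASK_LEARNING_RATES` are hypothetical names.

```python
import math

# Peak learning rates per task, copied from the per-task settings above.
TASK_LEARNING_RATES = {
    "Emotion Recognition": 1e-5,
    "Hate Speech Detection": 5e-6,
    "Spam Reviews Detection": 1e-5,
    "Hate Speech Spans Detection": 5e-6,
}


def cosine_lr(peak_lr, step, total_steps, warmup_steps=0):
    """Learning rate at `step` under linear warmup followed by cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak rate.
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half cosine: factor is 1 at progress 0 and 0 at progress 1.
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for task, peak in TASK_LEARNING_RATES.items():
    midpoint = cosine_lr(peak, step=50, total_steps=100)
    print(f"{task}: peak={peak:g}, midpoint={midpoint:g}")
```

Halfway through training the rate is half of the peak value, and it reaches zero at the final step.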

Model size: 326M params (Safetensors, tensor type F32)