Taiwan Words Translator 繁體中文台灣化翻譯器 by LLMs

https://github.com/SuJiaKuan/llm_tw_word

The model rewrites Chinese text so that mainland China vocabulary is replaced with the equivalent Taiwan vocabulary, leaving the rest of the text unchanged. Example:

  • Input: 這個軟件的質量真高啊
  • Output: 這個軟體的品質真高啊

Here 軟件 becomes 軟體 ("software") and 質量 becomes 品質 ("quality").

This Model

This model is fine-tuned from TinyLlama/TinyLlama-1.1B-Chat-v1.0 using instruction fine-tuning. The training data is drawn from MBZUAI/Bactrian-X and labeled automatically by 繁化姬 (Fanhuaji).
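
As an illustration of what one instruction fine-tuning example could look like, the sketch below turns a source sentence and its automatically labeled counterpart into chat messages. The function name `build_training_messages` and the Input/Output layout are assumptions borrowed from the usage example in this card; the actual training format used by the project is not documented here.

```python
# Hypothetical sketch: the function name and prompt layout are assumptions
# borrowed from the usage example, not the project's confirmed training format.
FENCE = "`" * 3  # three backticks, built here so the block is easy to embed


def build_training_messages(system_prompt, source_text, labeled_text):
    """Turn one (original, auto-labeled) sentence pair into chat messages."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Input: " + FENCE + source_text + FENCE},
        {"role": "assistant", "content": "Output: " + FENCE + labeled_text + FENCE},
    ]
```

Under this assumption, each record pairs the original sentence (user turn) with the converted sentence (assistant turn), which is the behavior the fine-tuned model is expected to reproduce.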

How to use

You can follow the example usage below, or see here to learn how to integrate the model into a Python class.

import torch
from transformers import pipeline

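# English gloss of the prompt: "For the Chinese text in the input, convert
# mainland China terms into Taiwan terms; leave all non-Chinese text and
# non-China terms unchanged." It is followed by one Input/Output example.
SYSTEM_PROMPT = """\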
對於輸入內容的中文文字,請將中國用語轉成台灣的用語,其他非中文文字或非中國用語都維持不變。

範例:
Input: ```這個視頻的質量真高啊```
Output: ```這個影片的品質真高啊```\
"""

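# Sentence containing mainland China terms (軟件, 質量) to be converted.
text_trad = "這個軟件的質量真高啊"

# Name the instance `pipe` so it does not shadow the imported `pipeline` function.
pipe = pipeline(
    "text-generation",
    model="feabries/TaiwanWordTranslator-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Wrap the input in the same Input layout that the system prompt demonstrates.
prompt = "Input: ```{}```".format(text_trad)
messages = [{
    "role": "system",
    "content": SYSTEM_PROMPT,
}, {
    "role": "user",
    "content": prompt,
}]
# Render the chat messages into one prompt string using the model's chat template.
input_text = pipe.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
# Greedy decoding (do_sample=False) keeps the output deterministic.
outputs = pipe(
    input_text,
    do_sample=False,
    max_new_tokens=2048,
)
# The generated text contains the full prompt followed by the model's reply.
print(outputs[0]["generated_text"])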
# <|system|>
# 對於輸入內容的中文文字,請將中國用語轉成台灣的用語,其他非中文文字或非中國用語都維持不變。
# 
# 範例:
# Input: ```這個視頻的質量真高啊```
# Output: ```這個影片的品質真高啊```</s>
# <|user|>
# Input: ```這個軟件的質量真高啊```</s>
# <|assistant|>
# Output: ```這個軟體的品質真高啊```
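
Because the generated text echoes the whole prompt, a caller usually wants only the converted sentence. The sketch below shows one way to pull it out, assuming the reply keeps the Output layout shown above. `extract_translation` is a hypothetical helper and not part of the project.

```python
import re

# Hypothetical helper (not from the original project): extracts the converted
# sentence from the full generated string printed in the example above.
FENCE = "`" * 3  # three backticks, built here so the block is easy to embed
ASSISTANT_TAG = "<|assistant|>"
OUTPUT_PATTERN = re.compile(
    r"Output:\s*" + re.escape(FENCE) + r"(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def extract_translation(generated_text):
    """Return the text inside the last Output fence of the reply, or None."""
    # Keep only what follows the final assistant tag, so the example Output
    # line inside the system prompt is skipped whenever the tag is present.
    reply = generated_text.rpartition(ASSISTANT_TAG)[2]
    matches = OUTPUT_PATTERN.findall(reply)
    # None signals a reply that did not follow the expected format,
    # for example one that was cut off before the closing fence.
    return matches[-1].strip() if matches else None
```

With the pipeline result from the example, this would be called as `extract_translation(outputs[0]["generated_text"])`. Returning `None` for an unexpected reply lets the caller decide whether to retry or fall back to the original text.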

Model size: 1.1B params (Safetensors, tensor type F32)