Gemma 2 9B Neogenesis ITA
Fine-tuned version of VAGOsolutions/SauerkrautLM-gemma-2-9b-it optimized for better performance in Italian.
- 9.24 billion parameters
- Supports 8k context length
Usage
Text generation with Transformers
```python
import torch
from transformers import pipeline

model_id = "anakin87/gemma-2-9b-neogenesis-ita"

# Load the model in bfloat16 on a CUDA device
pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

# Prompt (Italian): "What is compound interest? Explain it in a simple and clear way."
messages = [{"role": "user", "content": "Cos'è l'interesse composto? Spiega in maniera semplice e chiara."}]

outputs = pipe(messages, max_new_tokens=500)
# The pipeline returns the full conversation; the assistant reply is at index 1, after the user message
print(outputs[0]["generated_text"][1]["content"])
```
Evaluation Results
The model was submitted to and evaluated on the Open Ita LLM Leaderboard, the most popular leaderboard for Italian language models.
| Model | MMLU_IT | ARC_IT | HELLASWAG_IT | Average |
|---|---|---|---|---|
| google/gemma-2-9b-it | 65.67 | 55.60 | 68.95 | 63.41 |
| VAGOsolutions/SauerkrautLM-gemma-2-9b-it | 65.76 | 61.25 | 72.10 | 66.37 |
| anakin87/gemma-2-9b-neogenesis-ita | 65.82 | 61.25 | 73.29 | 66.79 |
These results establish this model as a strong 9B model for Italian, outperforming 13-14B models and even surpassing some in the 30-70B range.
Training details
The model was fine-tuned with Hugging Face TRL, applying Direct Preference Optimization (DPO).
I adopted Spectrum, a relatively new technique for parameter-efficient learning. The idea is to train only the layers of the model with a high Signal-to-Noise Ratio (SNR) and freeze the rest. Specifically, training focused on the top 20% most informative layers.
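To make the idea concrete, here is a minimal sketch of the freezing step, assuming a Spectrum SNR scan has already produced a list of module-name patterns to keep trainable. The patterns below are placeholders for illustration, not the ones actually selected for this model.

```python
import re
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "VAGOsolutions/SauerkrautLM-gemma-2-9b-it",
    torch_dtype=torch.bfloat16,
)

# Hypothetical output of a Spectrum SNR scan: regex patterns naming the
# top-20% highest-SNR modules (placeholders, for illustration only).
high_snr_patterns = [
    r"model\.layers\.\d+\.mlp\.down_proj",
    r"model\.layers\.\d+\.self_attn\.o_proj",
]

# Freeze everything, then unfreeze only parameters whose names match a pattern.
for name, param in model.named_parameters():
    param.requires_grad = any(re.search(p, name) for p in high_snr_patterns)

n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
n_total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {n_trainable:,} / {n_total:,}")
```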
Batch size: 16; learning rate: 1e-6; epochs: 1.
The training process took approximately 12 hours on a single NVIDIA A100 GPU (80GB VRAM).
For the training code, see the DPO section of this Kaggle notebook, modified to use a different base model, different hyperparameters, and no on-policy data.
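As a rough outline (not the exact notebook code), a DPO run with TRL using the hyperparameters above might look like the sketch below. The dataset name is a placeholder for any preference dataset with prompt/chosen/rejected columns, and the split of the effective batch size across gradient accumulation steps is an assumption.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "VAGOsolutions/SauerkrautLM-gemma-2-9b-it"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
# (Spectrum-style freezing would be applied to `model` here, as sketched above.)

# Placeholder name: any preference dataset with "prompt", "chosen", "rejected" columns
train_dataset = load_dataset("your-org/your-dpo-dataset", split="train")

# Effective batch size 16 (2 per device x 8 accumulation steps), lr 1e-6, 1 epoch
training_args = DPOConfig(
    output_dir="gemma-2-9b-neogenesis-ita-dpo",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=1e-6,
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```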
Training data
The model was trained primarily on Italian data, with a small portion of English data included.
For Direct Preference Optimization:
- Italian data
- English data
Thanks to the authors for providing these datasets.
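The concrete datasets are not reproduced here, but as a format reference, DPO training in TRL expects preference pairs shaped roughly like the illustrative record below (an invented example, not drawn from the actual training data).

```python
example = {
    "prompt": "Spiega in una frase cos'è la fotosintesi.",  # "Explain photosynthesis in one sentence."
    "chosen": "La fotosintesi è il processo con cui le piante trasformano luce, acqua e CO2 in glucosio e ossigeno.",
    "rejected": "La fotosintesi è quando le piante dormono di notte.",
}
```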
Safety
While this model was not specifically fine-tuned for safety, the selective nature of Spectrum training (most of the original layers remain frozen) helps preserve the safety behavior of the base model.