# A deep and narrow Mistral model (950M params)
This checkpoint (`cckm/tinymistral_950m`) is a small (950M-parameter), deep and narrow (40 layers, hidden size 1440) Mistral model, as described in this [blog post]. It is intended for edge applications.

It was trained on ~400B tokens from RefinedWeb and ~400B tokens from FineWeb (snapshots up to 2024-18). It is a base model and has not undergone instruction or chat fine-tuning.
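The 950M figure can be sanity-checked from the published depth and width. The exact configuration (vocabulary size, head counts, FFN width, GQA ratio) is not stated in this card, so the values below other than the layer count and hidden size are illustrative assumptions in a minimal sketch:

```python
def mistral_param_count(vocab, hidden, layers, inter,
                        n_heads, n_kv_heads, tie_embeddings=True):
    """Rough parameter count for a Mistral-style decoder (no biases)."""
    head_dim = hidden // n_heads
    # q and o projections are hidden x hidden; k and v are shrunk by GQA.
    attn = 2 * hidden * hidden + 2 * hidden * (n_kv_heads * head_dim)
    mlp = 3 * hidden * inter      # gate, up, down projections (SwiGLU)
    norms = 2 * hidden            # two RMSNorms per layer
    total = vocab * hidden        # token embeddings
    total += layers * (attn + mlp + norms)
    total += hidden               # final RMSNorm
    if not tie_embeddings:
        total += vocab * hidden   # separate LM head
    return total

# 40 layers and hidden size 1440 come from the card; every other value
# (vocab, FFN width, head counts) is an assumption for illustration.
total = mistral_param_count(vocab=32_000, hidden=1440, layers=40,
                            inter=4000, n_heads=12, n_kv_heads=4)
print(f"{total / 1e9:.2f}B parameters")  # → 0.96B parameters
```

Under these assumptions the count lands near the stated 950M, and it shows where deep-narrow designs spend their budget: the 40 repeated blocks dominate, while the embedding table stays small relative to wider models.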
LM Harness numbers:

| Benchmark | Result |
|---|---|
| arc_c | 0.2884 |
| arc_e | 0.5139 |
| boolq | 0.6089 |
| hellaswag | 0.5888 |
| obqa | 0.3280 |
| piqa | 0.7388 |
| siqa | 0.4038 |
| wino | 0.5627 |
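For a quick single-number summary, the unweighted macro-average of the eight scores above works out to roughly 0.50:

```python
# Scores copied from the table above.
scores = {
    "arc_c": 0.2884, "arc_e": 0.5139, "boolq": 0.6089, "hellaswag": 0.5888,
    "obqa": 0.3280, "piqa": 0.7388, "siqa": 0.4038, "wino": 0.5627,
}
avg = sum(scores.values()) / len(scores)
print(f"macro-average: {avg:.4f}")  # → macro-average: 0.5042
```

Note this average weights all tasks equally regardless of their difficulty or random-chance baselines (0.25 for arc_c vs. 0.5 for boolq and wino), so it is only a rough gauge for comparing checkpoints.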