A deep and narrow Mistral model (950M params)

This checkpoint is for a small (950M params), deep and narrow (40 layers, hidden size=1440) Mistral model, as described in this [blog post]. It is meant for edge applications.
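For a sense of the shape, the configuration can be sketched with the standard `MistralConfig` from Transformers. Only the depth and hidden size below come from this card; the head count and feed-forward width are illustrative assumptions.

```python
from transformers import MistralConfig

# Deep-and-narrow shape from this card: 40 layers, hidden size 1440.
# num_attention_heads and intermediate_size are assumptions for illustration.
config = MistralConfig(
    hidden_size=1440,
    num_hidden_layers=40,
    num_attention_heads=16,   # assumption: 16 heads of dimension 90
    intermediate_size=5760,   # assumption: 4x the hidden size
)
```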

It was trained on ~400B tokens from RefinedWeb and ~400B tokens from FineWeb (up to the 2024-18 dump). It is a base model and has not gone through instruction or chat fine-tuning.
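Assuming the checkpoint loads with the standard Transformers APIs, a minimal usage sketch looks like this. Since it is a base model, the prompt is a plain text continuation rather than a chat turn.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cckm/tinymistral_950m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# BF16 matches the tensor type of the published checkpoint.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# A base model continues text; it will not reliably follow instructions.
inputs = tokenizer("The smallest prime number is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```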

LM Evaluation Harness results:

| Benchmark | Result |
|-----------|--------|
| arc_c     | 0.2884 |
| arc_e     | 0.5139 |
| boolq     | 0.6089 |
| hellaswag | 0.5888 |
| obqa      | 0.3280 |
| piqa      | 0.7388 |
| siqa      | 0.4038 |
| wino      | 0.5627 |
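The card does not state how these numbers were produced, but scores like these can in principle be reproduced with EleutherAI's lm-evaluation-harness. The sketch below uses its Python API; the task names follow v0.4 conventions and are assumptions mapped from the short labels in the table.

```python
import lm_eval

# Sketch only: task names assume lm-evaluation-harness v0.4 conventions
# (e.g. "arc_challenge" for the table's "arc_c").
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=cckm/tinymistral_950m,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "social_iqa", "winogrande"],
)
print(results["results"])
```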
The checkpoint is provided in Safetensors format (955M parameters, BF16 tensors).
