A deep and narrow Mistral model (950M params)

This checkpoint is for a small (950M params), deep and narrow (40 layers, hidden size=1440) Mistral model, as described in this [blog post]. It is meant for edge applications.
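For a sense of the shape, the configuration can be sketched with the standard `MistralConfig` from Transformers. Only the depth and hidden size below come from this card; the head count and feed-forward width are illustrative assumptions.

```python
from transformers import MistralConfig

# Deep-and-narrow shape from this card: 40 layers, hidden size 1440.
# num_attention_heads and intermediate_size are assumptions for illustration.
config = MistralConfig(
    hidden_size=1440,
    num_hidden_layers=40,
    num_attention_heads=16,   # assumption: 16 heads of dimension 90
    intermediate_size=5760,   # assumption: 4x the hidden size
)
```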

It was trained on ~400B tokens from RefinedWeb and ~400B tokens from FineWeb (up to the 2024-18 dump). It is a base model and has not gone through instruction or chat fine-tuning.
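Assuming the checkpoint loads with the standard Transformers APIs, a minimal usage sketch looks like this. Since it is a base model, the prompt is a plain text continuation rather than a chat turn.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cckm/tinymistral_950m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# BF16 matches the tensor type of the published checkpoint.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# A base model continues text; it will not reliably follow instructions.
inputs = tokenizer("The smallest prime number is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```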

LM Evaluation Harness results:

| Benchmark | Result |
|-----------|--------|
| arc_c     | 0.2884 |
| arc_e     | 0.5139 |
| boolq     | 0.6089 |
| hellaswag | 0.5888 |
| obqa      | 0.3280 |
| piqa      | 0.7388 |
| siqa      | 0.4038 |
| wino      | 0.5627 |
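The card does not state how these numbers were produced, but scores like these can in principle be reproduced with EleutherAI's lm-evaluation-harness. The sketch below uses its Python API; the task names follow v0.4 conventions and are assumptions mapped from the short labels in the table.

```python
import lm_eval

# Sketch only: task names assume lm-evaluation-harness v0.4 conventions
# (e.g. "arc_challenge" for the table's "arc_c").
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=cckm/tinymistral_950m,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "social_iqa", "winogrande"],
)
print(results["results"])
```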
The checkpoint is provided in Safetensors format (955M parameters, BF16 tensors).
