14B model detected as 7B
I've been working on merging a 14-billion-parameter model recently, but when it came time to evaluate it, the system reported that the model has only 7 billion parameters instead of the expected 14 billion. It's funny that the current top "7 billion" model is actually 14 billion.
When you filter for the 7-8B size range on the leaderboard, more than ten of the models are actually 14B.
There are quite a few models on the leaderboard where the indicated size is half the actual size:
- maldv/Qwentile2.5-32B-Instruct
- CultriX/Qwen2.5-14B-Wernickev3
...and many others, most of them Qwen-derived.
Hi! Thanks for the report!
We extract the number of parameters from the safetensors files automatically, in theory; @alozowski will be able to investigate why there is a mismatch when she comes back from vacation.
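For context, the extraction works roughly like the sketch below (an illustration of the general technique, not the leaderboard's actual code): a .safetensors file starts with an 8-byte little-endian integer giving the length of a JSON header, and that header maps each tensor name to its dtype and shape, so the parameter count is just the sum of the shape products.

```python
import json
import struct
from functools import reduce
from operator import mul

def safetensors_param_count(path: str) -> int:
    """Count the parameters in one .safetensors file by reading its header.

    The file begins with an 8-byte little-endian u64 (the JSON header length),
    followed by a JSON object mapping tensor names to {"dtype", "shape", ...}.
    No tensor data needs to be read.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    total = 0
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        total += reduce(mul, info["shape"], 1)  # product of dims; [] -> 1
    return total
```

For models hosted on the Hub, huggingface_hub also exposes get_safetensors_metadata(repo_id), which (if I recall its API correctly) aggregates this over sharded checkpoints without downloading any weights.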
As for the difference between the comparator and the leaderboard, make sure you compare either raw or normalised scores on both (we have two ways to compute scores; it should be explained in the FAQ).
There are some models by sometimesanotion; all of them are now deleted/unavailable. Some of the request files:
- Qwen2.5-14B-Vimarckoso-v3-model_stock
- Lamarck-14B-v0.6-model_stock
- Qwen2.5-14B-Vimarckoso-v3-Prose01
- Qwentinuum-14B-v5

...and there are a dozen or so more like these.
Would it help to automatically flag a model for closer/manual inspection when its model name and auto-detected size differ significantly?
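For illustration, a minimal sketch of what such a check could look like (the regex, threshold, and function names are my own assumptions, not anything from the leaderboard codebase):

```python
import re

def size_from_name(model_name: str) -> float | None:
    """Extract a size hint in billions from names like 'Qwen2.5-14B-...'.

    Returns None when the name carries no recognizable size hint.
    """
    match = re.search(r"(\d+(?:\.\d+)?)\s*[bB]\b", model_name)
    return float(match.group(1)) if match else None

def needs_manual_review(model_name: str, detected_b: float,
                        tolerance: float = 1.5) -> bool:
    """Flag a model whose name hint and auto-detected size (both in billions
    of parameters) disagree by more than `tolerance`x (arbitrary threshold)."""
    named_b = size_from_name(model_name)
    if named_b is None or detected_b <= 0:
        return False
    ratio = max(named_b, detected_b) / min(named_b, detected_b)
    return ratio > tolerance

# The mismatches reported in this thread would be caught:
assert needs_manual_review("CultriX/Qwen2.5-14B-Wernickev3", detected_b=7.0)
assert not needs_manual_review("tiiuae/Falcon3-7B-Instruct", detected_b=7.5)
```

A 2x gap like 14B vs 7B clears a 1.5x tolerance comfortably, while legitimate rounding (e.g. a "7B" model that actually has 7.6B parameters) would not be flagged.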
Hi everyone!
Thank you all for bringing this to our attention. The parameter-count issue was fixed last week. If a model was accessible on the Hub, its parameter count was recalculated to the correct value, and I hope there are no more such errors.
I'm closing this issue, but please feel free to ping me here if you have any other questions or open a new one!
@alozowski , unfortunately the following models (none of them available on the Hub) still have the wrong size:
[1] "sometimesanotion/Lamarck-14B-v0.6-002-model_stock" "sometimesanotion/Lamarck-14B-v0.6-model_stock"
[3] "sometimesanotion/Qwen-14B-ProseStock-v4" "sometimesanotion/Qwen2.5-14B-Vimarckoso-v2"
[5] "sometimesanotion/Qwen2.5-14B-Vimarckoso-v3-IF-Variant" "sometimesanotion/Qwen2.5-14B-Vimarckoso-v3-Prose01"
[7] "sometimesanotion/Qwen2.5-14B-Vimarckoso-v3-model_stock" "sometimesanotion/Qwentinuum-14B-v013"
[9] "sometimesanotion/Qwentinuum-14B-v1" "sometimesanotion/Qwentinuum-14B-v2"
[11] "sometimesanotion/Qwentinuum-14B-v3" "sometimesanotion/Qwentinuum-14B-v5"
[13] "sometimesanotion/Qwentinuum-14B-v6" "sometimesanotion/Qwentinuum-14B-v6-Prose"
[15] "sometimesanotion/Qwentinuum-14B-v7" "sometimesanotion/Qwentinuum-14B-v8"
[17] "sometimesanotion/Qwentinuum-14B-v9" "sometimesanotion/Qwenvergence-14B-qv256"
[19] "sometimesanotion/Qwenvergence-14B-v0.6-004-model_stock" "sometimesanotion/Qwenvergence-14B-v3"
[21] "sometimesanotion/Qwenvergence-14B-v3-Reason" "sometimesanotion/Qwenvergence-14B-v3-Reason"
[23] "sometimesanotion/Qwenvergence-14B-v6-Prose"
Also this one: sometimesanotion/IF-reasoning-experiment-40, so about two dozen in total, all of them 14B models indicated as 7B. unsloth/phi-4-unsloth-bnb-4bit looks affected too; I don't think phi-4 has 8B params.
When the leaderboard is filtered by size (-1 to 10: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?params=-1%2C10), these models swamp the top places, which is rather unfortunate as it makes the rankings unusable. (The actual best model with <10B params is tiiuae/Falcon3-7B-Instruct, and it's not even in the top 20.) For the integrity of the list, would it be possible to manually fix these, at least to approximately correct values?
Hi @brankor-mcom ,
Thanks for the list of models! I corrected them manually, and we will check the parameter correctness for the inaccessible models.
For unsloth/phi-4-unsloth-bnb-4bit, everything is correct.