14B model detected as 7B

#1049
by djuna - opened

I've been working on merging a 14-billion-parameter model recently, but when it comes time to evaluate it, the system indicates that the model has only 7 billion parameters instead of the expected 14 billion. It's funny that the top 7B model is actually a 14B one.

Open LLM Leaderboard org

Hi @djuna ,

Could you please provide the request file for the model you submitted, so we can check the number of parameters?
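
In case it helps, here is a minimal sketch of how one could locate a model's request file in the requests dataset (the dataset layout is assumed, and the repo id below is a placeholder, not a real submission):

```python
# Minimal sketch: locate a model's request file in the leaderboard's
# requests dataset. The layout is assumed; "djuna/my-merge" is a placeholder.
from huggingface_hub import HfApi, hf_hub_download

api = HfApi()
model_id = "djuna/my-merge"  # placeholder: replace with the submitted model id

files = api.list_repo_files("open-llm-leaderboard/requests", repo_type="dataset")
for path in (f for f in files if f.startswith(model_id) and f.endswith(".json")):
    local = hf_hub_download(
        repo_id="open-llm-leaderboard/requests",
        repo_type="dataset",
        filename=path,
    )
    print(local)  # the JSON includes the metadata recorded at submission time
```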

When you filter by 7–8B size on the Space, more than 10 of the models are actually 14B.

There are quite a few models in the leaderboard where the indicated size is half the actual size:

  • maldv/Qwentile2.5-32B-Instruct
  • CultriX/Qwen2.5-14B-Wernickev3

...and many others, most of them Qwen-derived.

Open LLM Leaderboard org

Hi! Thanks for the report!
We extract the number of parameters from the safetensors files automatically, in theory. @alozowski will be able to investigate why there is a mismatch when she comes back from vacation.
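
For context, a minimal sketch of this kind of extraction, using the public `get_safetensors_metadata` helper from huggingface_hub (the leaderboard's actual implementation may differ):

```python
# Minimal sketch: estimate a model's parameter count from its safetensors
# headers, without downloading the weights. Not the leaderboard's actual code.
from huggingface_hub import get_safetensors_metadata

def count_params(repo_id: str) -> int:
    meta = get_safetensors_metadata(repo_id)
    # parameter_count maps dtype name -> number of parameters stored in that
    # dtype, aggregated across all shards of the checkpoint
    return sum(meta.parameter_count.values())

print(count_params("Qwen/Qwen2.5-14B") / 1e9)  # expected: roughly 14.x billion
```

One caveat: for quantized checkpoints that pack several parameters into one stored element (e.g. 4-bit formats), counting stored elements this way under-reports the true parameter count.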

Interestingly, I'm seeing the same thing. Djuna's models and my own are merges of similar base models. Also, the scores for my models on the leaderboard differ greatly from what the comparator shows for them. I believe the comparator is accurate.

[Screenshot: Vimarckoso-Comparator.png]

Open LLM Leaderboard org

For the difference between the comparator and the leaderboard, make sure you compare either raw or normalised scores on both (we have two ways to compute scores; it should be explained in the FAQ).
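
For illustration, a sketch of the idea, assuming the rescaling scheme described in the docs (raw accuracy rescaled between the task's random baseline and 100):

```python
# Sketch of score normalisation as described for the leaderboard:
# raw accuracy is rescaled between the task's random baseline and 100,
# so raw and normalised numbers are not directly comparable.
def normalise(raw_score: float, random_baseline: float) -> float:
    """Both values in percent; scores below the baseline clamp to 0."""
    return max(0.0, (raw_score - random_baseline) / (100.0 - random_baseline)) * 100.0

# A 4-choice multiple-choice task has a 25% random baseline:
print(normalise(50.0, 25.0))  # 33.33... -- noticeably lower than the raw 50
```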

There are some models by sometimesanotion; all of them are deleted/unavailable.

some of the request files:
Qwen2.5-14B-Vimarckoso-v3-model_stock
Lamarck-14B-v0.6-model_stock
Qwen2.5-14B-Vimarckoso-v3-Prose01
Qwentinuum-14B-v5

and there are a dozen or so more similar ones.

Would it help to automatically flag a model for closer/manual inspection when its model name and auto-detected size differ significantly?
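
Something along these lines, for example (a rough sketch; the name convention and the threshold are illustrative assumptions):

```python
# Rough sketch of the proposed check: parse a size hint like "14B" out of
# the repo name and flag a large mismatch with the auto-detected size.
import re

def flag_size_mismatch(model_name: str, detected_b: float, tolerance: float = 0.4) -> bool:
    """Return True when the size advertised in the name and the detected size
    (both in billions) disagree by more than `tolerance` * advertised size."""
    m = re.search(r"(\d+(?:\.\d+)?)\s*[bB]\b", model_name)
    if not m:
        return False  # no size hint in the name, nothing to compare against
    advertised = float(m.group(1))
    return abs(detected_b - advertised) > tolerance * advertised

print(flag_size_mismatch("sometimesanotion/Qwentinuum-14B-v5", 7.6))  # True -> flag for review
print(flag_size_mismatch("tiiuae/Falcon3-7B-Instruct", 7.5))          # False
```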

Open LLM Leaderboard org

Hi everyone!

Thank you all for bringing this to our attention. The parameter numbers issue was fixed last week. If a model was accessible on the Hub, its parameter count was recalculated to the correct value, and I hope there are no more such errors.

I'm closing this issue, but please feel free to ping me here if you have any other questions or open a new one!

alozowski changed discussion status to closed

@alozowski, unfortunately the following models (none of them available on the Hub) still have the wrong size:

 [1] "sometimesanotion/Lamarck-14B-v0.6-002-model_stock"      "sometimesanotion/Lamarck-14B-v0.6-model_stock"         
 [3] "sometimesanotion/Qwen-14B-ProseStock-v4"                "sometimesanotion/Qwen2.5-14B-Vimarckoso-v2"            
 [5] "sometimesanotion/Qwen2.5-14B-Vimarckoso-v3-IF-Variant"  "sometimesanotion/Qwen2.5-14B-Vimarckoso-v3-Prose01"    
 [7] "sometimesanotion/Qwen2.5-14B-Vimarckoso-v3-model_stock" "sometimesanotion/Qwentinuum-14B-v013"                  
 [9] "sometimesanotion/Qwentinuum-14B-v1"                     "sometimesanotion/Qwentinuum-14B-v2"                    
[11] "sometimesanotion/Qwentinuum-14B-v3"                     "sometimesanotion/Qwentinuum-14B-v5"                    
[13] "sometimesanotion/Qwentinuum-14B-v6"                     "sometimesanotion/Qwentinuum-14B-v6-Prose"              
[15] "sometimesanotion/Qwentinuum-14B-v7"                     "sometimesanotion/Qwentinuum-14B-v8"                    
[17] "sometimesanotion/Qwentinuum-14B-v9"                     "sometimesanotion/Qwenvergence-14B-qv256"               
[19] "sometimesanotion/Qwenvergence-14B-v0.6-004-model_stock" "sometimesanotion/Qwenvergence-14B-v3"                  
[21] "sometimesanotion/Qwenvergence-14B-v3-Reason"            "sometimesanotion/Qwenvergence-14B-v3-Reason"           
[23] "sometimesanotion/Qwenvergence-14B-v6-Prose"  

Also this one: sometimesanotion/IF-reasoning-experiment-40, so at least 23 in total, all 14B models indicated as 7B. It looks like unsloth/phi-4-unsloth-bnb-4bit is affected too; I don't think phi-4 has 8B params.

When the leaderboard is filtered by size (-1 to 10: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?params=-1%2C10), these swamp the top places, which is rather unfortunate, as it makes the rankings unusable. (The actual best model with <10B params is tiiuae/Falcon3-7B-Instruct, and it's not in the top 20.) For the integrity of the list, would it be possible to manually fix these, at least to approximately correct values?

Open LLM Leaderboard org

Hi @brankor-mcom ,

Thanks for the list of models! I corrected them manually, and we will check the parameter correctness for the inaccessible models.

For unsloth/phi-4-unsloth-bnb-4bit – everything is correct.
