Failed orca_mini_v8_* Evaluation

#1051
by pankajmathur - opened

Opening a new discussion, as suggested in a previous comment on another discussion:

Hi @alozowski ,

Happy Monday! I'm reaching out to make sense of the following eval request commits for the model "pankajmathur/orca_mini_v8_0_70b". The commit below shows a file rename and a change from the wrong value "params": 35.277:
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/5660c4c4b9156fa0f15d99be7eee061d5de24764#d2h-741276
Did the model fail to evaluate, and do these changes reflect a resubmission for evaluation?

If so, can we resubmit "pankajmathur/orca_mini_v8_1_70b" as well, since it also shows as failed?
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/8b40ba212c48dc470be4f661b67cc085ed456477#d2h-702908

Is there any reason they are failing? Just for background, I have successfully evaluated both of them on my own servers, before submitting them to HF Open LLM LB, using:

https://huggingface.co/docs/leaderboards/open_llm_leaderboard/about#reproducibility

lm_eval --model hf --model_args pretrained=pankajmathur/orca_mini_v8_1_70b,dtype=bfloat16,parallelize=True --tasks leaderboard --output_path lm_eval_results/leaderboard --batch_size auto

and these results are now updated for both model cards:
https://huggingface.co/pankajmathur/orca_mini_v8_0_70b
https://huggingface.co/pankajmathur/orca_mini_v8_1_70b
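
For anyone who wants to repeat the local run for both models, here is a minimal sketch that only assembles the lm_eval command shown above for each model ID. The helper name `build_lm_eval_command` and the per-model output directory are illustrative assumptions, not part of the leaderboard tooling; the flags are copied from the command above.

```python
import shlex

# Model IDs taken from this discussion.
MODELS = [
    "pankajmathur/orca_mini_v8_0_70b",
    "pankajmathur/orca_mini_v8_1_70b",
]


def build_lm_eval_command(model_id: str, output_root: str = "lm_eval_results") -> list[str]:
    """Assemble the lm_eval argument list for one model (hypothetical helper)."""
    # Same model arguments as in the command quoted in this post.
    model_args = f"pretrained={model_id},dtype=bfloat16,parallelize=True"
    # Assumption: one output directory per model, so results do not overwrite each other.
    output_path = f"{output_root}/{model_id.split('/')[-1]}"
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", "leaderboard",
        "--output_path", output_path,
        "--batch_size", "auto",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; a 70B evaluation is expensive.
    for model in MODELS:
        print(shlex.join(build_lm_eval_command(model)))
```

Each printed line can be pasted into a shell, or the list can be passed to `subprocess.run` on a machine with enough GPUs.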

Thanks again for helping out with this; it is really appreciated.

Regards,
Pankaj

Hi @alozowski and Team,
Is there any update on this? Mainly, why did all of the above model evaluations fail, and is there any way to rerun them?

Open LLM Leaderboard org

Hi @pankajmathur ,

Thank you for your patience! I've resubmitted all your models, and they should be fine now.

I'm closing this discussion. Feel free to ping me here if you run into any other problems with these models, or open a new discussion!

alozowski changed discussion status to closed
