Updated precision to bfloat16 and use_chat_template to false for pankajmathur/orca_mini_v8_0_70b and pankajmathur/orca_mini_v8_1_70b

#1042
by pankajmathur - opened

Hi @alozowski and Team,

First of all, great work on the new UI of the Open LLM Leaderboard, it looks stunning.
I submitted two models from the new Orca_Mini_v8_* series, fine-tuned on Llama-3.3-70B-Instruct, for evaluation via the UI, but initially used the wrong precision and chat_template flag.
I have now opened two MRs for these models to fix those mistakes. Could you please have a look, and let me know if you need any additional details:

  1. https://huggingface.co/datasets/open-llm-leaderboard/requests/discussions/74/
  2. https://huggingface.co/datasets/open-llm-leaderboard/requests/discussions/75/
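
For illustration, the fix amounts to correcting two fields in the eval request JSON. The exact schema of the open-llm-leaderboard/requests files is an assumption here; only the two corrected values (bfloat16 precision, chat template disabled) come from this discussion:

```python
import json

# Hypothetical sketch of the corrected fields in an eval request file.
# The surrounding schema is assumed; the two corrected values are taken
# from the discussion title above.
corrected_fields = {
    "model": "pankajmathur/orca_mini_v8_0_70b",
    "precision": "bfloat16",       # was submitted with the wrong precision
    "use_chat_template": False,    # was mistakenly enabled on submission
}

print(json.dumps(corrected_fields, indent=2))
```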

Regards,
Pankaj

Open LLM Leaderboard org

Hi @pankajmathur ,

Thanks for opening the issue! I corrected both of your requests manually; they should be fine now.

I'm closing this discussion; feel free to open a new one if you have any questions.

alozowski changed discussion status to closed

Thank you for the swift turnaround, much appreciated.

Hi @alozowski ,

Happy Monday! I'm just reaching out to make sense of the following eval request commits for the model "pankajmathur/orca_mini_v8_0_70b". The commit below shows a file rename and a change from the wrong "params": 35.277:
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/5660c4c4b9156fa0f15d99be7eee061d5de24764#d2h-741276
Did the model fail to evaluate, and do these changes reflect a resubmission for evaluation?

If so, can we submit "pankajmathur/orca_mini_v8_1_70b" again too? It appears to have failed as well:
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/8b40ba212c48dc470be4f661b67cc085ed456477#d2h-702908

Is there any reason they are failing? For background, I successfully evaluated both of them on my own servers before submitting them to the HF Open LLM Leaderboard, following:
https://huggingface.co/docs/leaderboards/open_llm_leaderboard/about#reproducibility

lm_eval --model hf \
  --model_args pretrained=pankajmathur/orca_mini_v8_1_70b,dtype=bfloat16,parallelize=True \
  --tasks leaderboard \
  --output_path lm_eval_results/leaderboard \
  --batch_size auto

and the results are posted on both model cards:
https://huggingface.co/pankajmathur/orca_mini_v8_0_70b
https://huggingface.co/pankajmathur/orca_mini_v8_1_70b

Thanks again for helping out with this, really appreciated.

Regards,
Pankaj

alozowski changed discussion status to open
Open LLM Leaderboard org

Hi @pankajmathur ,

Apologies for the delayed response; it was the New Year holidays, but I'm glad to get back to the discussions!

Did the model fail to evaluate, and do these changes reflect a resubmission for evaluation?

Yes, unfortunately, the model failed due to incorrect detection of its number of parameters. On the next attempt, it failed again due to a network error. I've resubmitted it and hope it will work this time.

Closing this discussion again; feel free to ping me with any questions about this model, or please open a new discussion!

alozowski changed discussion status to closed

No worries at all, Happy New Year to you too :)

I saw "https://huggingface.co/pankajmathur/orca_mini_v8_1_70b" in the queue, so thank you! I think we need to reopen this thread, as "https://huggingface.co/pankajmathur/orca_mini_v8_0_70b" also failed. Could we please resubmit that one for the queue too? Please close this thread once it has been submitted as well.

Thanks again for all the cool work on the leaderboard filters; it's becoming my go-to tool for various kinds of LLM analysis.

Regards,
Pankaj
