Request for Mistral Large 2 Instruct 2407 3bit with Autoround GPTQ

#1 by MLDataScientist - opened

Hi @kaitchup,

Thank you for sharing this quant. Would it be possible for you to quantize Mistral Large 2 with AutoRound in GPTQ format? I looked into many 3-bit quantization methods, and AutoRound stood out because it can export the quantized model to GPTQ format. I use 2x MI60 AMD GPUs with vLLM, and GPTQ is the fastest-performing format on these GPUs. I have 64 GB of VRAM in total. The 4-bit GPTQ versions of Mistral Large 2 available on HF are over 65 GB, and I could not find a 3-bit GPTQ version. Since others will also be looking for it, could you please make a 3-bit AutoRound GPTQ version of Mistral Large 2? Here is the model link: https://huggingface.co/mistralai/Mistral-Large-Instruct-2407
Thank you very much!
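For reference, the workflow being requested would look roughly like this. This is a sketch assuming Intel's `auto-round` package as described in its README; the `bits`/`group_size`/`sym` choices and the output directory name are my own, and I have not run this on the 123B model (it needs a lot of GPU memory and calibration time):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "mistralai/Mistral-Large-Instruct-2407"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 3-bit, group size 128 (hypothetical settings for this request)
autoround = AutoRound(model, tokenizer, bits=3, group_size=128, sym=True)
autoround.quantize()

# Export in GPTQ format so vLLM can load it
autoround.save_quantized("Mistral-Large-Instruct-2407-3bit-gptq",
                         format="auto_gptq")
```

The resulting directory could then be pushed to the Hub and loaded by vLLM as a regular GPTQ checkpoint.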

The Kaitchup org

Hi!
I would love to, but I'm not allowed to because of the very restrictive license, especially this point:
"I understand that if I am a commercial entity, I am not permitted to use or distribute the model internally or externally, or expose it in my own offerings without a commercial license"

The Kaitchup is a commercial entity.

ah ok, thanks!

Can you quantize and share it from your personal HF account then? I am also going to use it locally, obviously, not for commercial purposes. There are many quantized versions of Mistral Large 2 offered by personal accounts on HF, but none of them is in 3-bit GPTQ format.
Thanks!

Oh, one more question: do you know why the 4-bit GPTQ version of Mistral Large 2 is bigger than the exllamav2 4bpw version?
e.g. the 4bpw model here loads and runs fine on my 64 GB VRAM setup since it is 62 GB: https://huggingface.co/turboderp/Mistral-Large-Instruct-2407-123B-exl2/tree/4.0bpw
But the 4-bit GPTQ version is 65 GB: https://huggingface.co/TechxGenus/Mistral-Large-Instruct-2407-GPTQ/tree/main
I wish the 4-bit GPTQ version were also 62 GB, since AutoRound GPTQ has better perplexity than the exllamav2 4bpw version. But I don't know why there is such a difference.
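A back-of-the-envelope estimate reproduces that gap. The usual explanation: exl2's "4.0 bpw" is an average rate that already includes the quantization metadata, while "GPTQ 4-bit" means 4 bits per weight plus per-group scales and zero points, and the embeddings/`lm_head` are typically left in fp16. The parameter count (~123B), group size (128), metadata sizes, and vocab/hidden dimensions below are all my assumptions, not measured from those repos:

```python
PARAMS = 123e9  # approximate parameter count of Mistral Large 2
GB = 1e9        # decimal gigabytes, as HF file sizes are shown

# exllamav2 "4.0 bpw" is an average: scale/zero overhead is
# already folded into the 4.0 bits per weight.
exl2_gb = PARAMS * 4.0 / 8 / GB

# GPTQ 4-bit with group_size=128 stores, per group of 128 weights,
# a 16-bit scale and a packed 4-bit zero point on top of the weights.
gptq_bpw = 4 + (16 + 4) / 128  # ~4.16 bits per weight

# Embeddings and lm_head are usually kept in fp16
# (assuming vocab 32768 x hidden 12288, two matrices, 2 bytes each).
fp16_head_gb = 2 * 32768 * 12288 * 2 / GB

gptq_gb = PARAMS * gptq_bpw / 8 / GB + fp16_head_gb

print(f"exl2 4.0bpw ~ {exl2_gb:.1f} GB")  # ~61.5 GB
print(f"GPTQ 4-bit  ~ {gptq_gb:.1f} GB")  # ~65.5 GB
```

Under these assumptions the ~62 GB vs ~65 GB difference is almost entirely group-metadata overhead plus the unquantized head, not a difference in the weights themselves.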

Thanks!

@bnjmnmarie please let me know. Thanks!
