ybelkada posted an update Feb 15, 2024
Try out Mixtral 2-bit on a free-tier Google Colab notebook right now!

https://colab.research.google.com/drive/1-xZmBRXT5Fm3Ghn4Mwa2KRypORXb855X?usp=sharing

The AQLM method has recently been introduced on the transformers main branch.

The 2bit model can be found here: BlackSamorez/Mixtral-8x7b-AQLM-2Bit-1x16-hf-test-dispatch

And you can read more about the method here: https://huggingface.co/docs/transformers/main/en/quantization#aqlm
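For reference, here is a minimal sketch of how one might load that 2-bit checkpoint with transformers. The exact setup (transformers installed from the main branch, the aqlm package, and a CUDA GPU with enough memory) is an assumption based on the post, and the helper function name is mine, not part of the library:

```python
# Sketch only: loading the 2-bit AQLM Mixtral checkpoint named in the post.
# Assumes transformers (main branch) plus the aqlm package are installed,
# e.g. `pip install aqlm[gpu]`, and that a CUDA GPU is available.
MODEL_ID = "BlackSamorez/Mixtral-8x7b-AQLM-2Bit-1x16-hf-test-dispatch"

def load_quantized_model(model_id: str = MODEL_ID):
    # Imported lazily so this sketch can be read without a GPU environment.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",  # dispatch layers across available devices
    )
    return model, tokenizer
```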

Great work @BlackSamorez and team!

Thanks. But it seems the model is producing repetitive responses.

inputs = tokenizer("Who is the CEO of Microsoft?", return_tensors="pt")["input_ids"].cuda()
output = quantized_model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0]))

The response was as below:

Who is the CEO of Microsoft?

Microsoft CEO Satya Nadella is the CEO of Microsoft. He is the third CEO of Microsoft. He is the CEO of Microsoft since 2014.

Who is the CEO of Microsoft?

Satya Nadella is the CEO of Microsoft. He is the third CEO of Microsoft. He is the CEO of Microsoft since 2014.

Who is the CEO of Microsoft?

Satya Nadella is the CEO of Microsoft. He is the third CEO of Microsoft. He is the CEO of Microsoft since 2014.

Who is the CEO of


Hmm, interesting. Can you try generating some text with sampling methods?
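For example, sampling-based decoding could look like the sketch below. The specific parameter values are illustrative assumptions, not recommendations from this thread, and `quantized_model` / `tokenizer` / `inputs` are assumed to be the objects from the earlier snippet:

```python
# Illustrative sampling settings that often reduce repetition loops;
# the exact values here are assumptions, not tuned recommendations.
sampling_kwargs = {
    "do_sample": True,          # sample from the distribution instead of greedy decoding
    "temperature": 0.7,         # soften the logits before sampling
    "top_p": 0.9,               # nucleus sampling: keep the top-p probability mass
    "repetition_penalty": 1.2,  # penalize tokens that were already generated
    "max_new_tokens": 128,
}

# With the model, tokenizer, and inputs from the snippet above:
# output = quantized_model.generate(inputs, **sampling_kwargs)
# print(tokenizer.decode(output[0], skip_special_tokens=True))
```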
