MLX quants
Thanks to @awni for converting it to MLX!
It runs at over 30 TPS on an M2 Ultra with 192 GB!
https://huggingface.co/mlx-community/Hunyuan-A52B-Instruct-3bit
@ehartford how'd you end up getting this into safetensors format? I've been having a hell of a time (or am just holiday-tired) with this finetune of it, and I assume whatever you did may work the same. I keep getting invalid header errors when trying to convert it with the safetensors convert script (it's also missing an index json). I assume it's something MoE related?
ERROR - Error converting pytorch_model-00003-of-00080.bin: Error while deserializing header: HeaderTooLarge
https://huggingface.co/tencent/HunyuanVideo-PromptRewrite
Thanks! :)
You are fine tuning this one?
https://huggingface.co/tencent-community/Hunyuan-A52B-Instruct
nope, no finetuning - just trying to get the other tencent HY large model into safetensors and then (ideally) mlx quant'd.
i've been trying to work backwards on the data/schema format for the hy video model, which (according to the docs and code in the hyvideo repo) uses that prompt rewriter model as a way to standardize input prompts. my assumption is that the llm/llava dataset captions for the video model were standardized using that rewriter - otherwise there's not much point in it.
Tencent/HunyuanVideo-PromptRewrite is a finetune of hunyuan large, to my understanding. So I'm assuming it's the same model / architecture / everything, just finetuned on hyvideo prompts on top.
ideally the next step then is MLX-ify it
Sorry if unclear.
fwiw i'd been using this to safetensor it: https://github.com/huggingface/safetensors/blob/main/bindings/python/convert.py
which gives the header error. which i assume is something stupid simple.
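One possible cause, offered as a guess and not confirmed anywhere in this thread: HeaderTooLarge is the kind of error you'd expect if something that isn't a safetensors file gets parsed as one. A safetensors file starts with an 8-byte little-endian header length followed by a JSON header, while a default torch.save checkpoint is a zip archive. A minimal stdlib-only sketch for checking what a shard actually is; the function name `sniff_checkpoint` and the `max_header_bytes` cutoff are made up for illustration and are not part of any library:

```python
import json
import struct


def sniff_checkpoint(path, max_header_bytes=100_000_000):
    """Guess a checkpoint's on-disk format from its first bytes.

    max_header_bytes is an assumed sanity cutoff for this sketch.
    """
    with open(path, "rb") as f:
        head = f.read(8)
        if len(head) < 8:
            return "too short"
        # torch.save writes a zip archive by default; zip files start with PK\x03\x04
        if head[:4] == b"PK\x03\x04":
            return "zip archive (likely torch.save output)"
        # safetensors: first 8 bytes are the JSON header length (little-endian u64)
        (header_len,) = struct.unpack("<Q", head)
        if header_len > max_header_bytes:
            return "not safetensors (header length implausibly large)"
        try:
            json.loads(f.read(header_len))
        except ValueError:
            return "not safetensors (header is not JSON)"
        return "safetensors"
```

Running this over a few of the pytorch_model-*.bin shards should at least tell you whether they're ordinary torch checkpoints or something else entirely.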
I did something like this:
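from transformers import AutoModelForCausalLM

# Load the original checkpoint (trust_remote_code because the repo ships its own modeling code)
model = AutoModelForCausalLM.from_pretrained(
    "/workspace/models/Tencent-Hunyuan-Large/Hunyuan-A52B-Instruct",
    trust_remote_code=True
)
# Re-save the weights in safetensors format
model.save_pretrained(
    "/workspace/models/Hunyuan-A52B-Instruct",
    safe_serialization=True
)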
thx eric, this helped (so far so good anyway). rebuilding the missing pytorch index json based on it now - thank you! and happy new year man!
in case it helps anyone:
import os
import json
import torch


def create_model_index(model_dir):
    """Create an index mapping parameters to their weight shards."""
    # Find all the model shard files
    shard_files = [
        f for f in os.listdir(model_dir)
        if f.startswith('pytorch_model-') and f.endswith('.bin')
    ]

    # Create weight map
    weight_map = {}
    metadata = {"total_size": 0}

    for shard_file in sorted(shard_files):
        print(f"Processing {shard_file}...")
        shard_path = os.path.join(model_dir, shard_file)
        state_dict = torch.load(shard_path, map_location='cpu')

        # Record which parameters are in this shard
        for param_name in state_dict.keys():
            weight_map[param_name] = shard_file
            metadata["total_size"] += state_dict[param_name].numel() * state_dict[param_name].element_size()

    # Create the index dictionary
    index = {
        "metadata": metadata,
        "weight_map": weight_map
    }

    # Save the index file
    with open(os.path.join(model_dir, "pytorch_model.bin.index.json"), "w") as f:
        json.dump(index, f, indent=2)

    print(f"Created index file mapping {len(weight_map)} parameters")
    return index


# Use the model directory containing the sharded .bin files
model_dir = "/workspace/models/hunyuan-large-pr"
index = create_model_index(model_dir)
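If you want to sanity-check the rebuilt index before handing it to a loader, something like the following might help. It's only a sketch: `check_index` is a hypothetical helper, not part of transformers, and it just confirms that every shard named in the weight map exists on disk and counts the entries.

```python
import json
import os
from collections import Counter


def check_index(model_dir, index_name="pytorch_model.bin.index.json"):
    """Summarize an index file and list shards it references that are missing on disk."""
    with open(os.path.join(model_dir, index_name)) as f:
        index = json.load(f)

    # Count how many parameters point at each shard file
    per_shard = Counter(index["weight_map"].values())

    # A shard named in the index but absent from the directory would break loading
    missing = sorted(
        shard for shard in per_shard
        if not os.path.isfile(os.path.join(model_dir, shard))
    )
    return {
        "num_params": len(index["weight_map"]),
        "num_shards": len(per_shard),
        "missing_shards": missing,
    }
```

Calling `check_index(model_dir)` on the directory above should report an empty `missing_shards` list if the index and the shard files line up.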