MLX quants

#19
by ehartford - opened

Thanks to @awni for converting it to MLX!

It runs over 30 TPS on M2 ultra 192gb!

https://huggingface.co/mlx-community/Hunyuan-A52B-Instruct-3bit

@ehartford how'd you end up getting this into safetensors format? I've been having a hell of a time (or am just holiday-tired) with this finetune of it, and I assume whatever you did would work the same way. I keep getting invalid header errors when trying to convert it with the safetensors convert script (it's also missing an index json). I assume it's something MoE related?

ERROR - Error converting pytorch_model-00003-of-00080.bin: Error while deserializing header: HeaderTooLarge

https://huggingface.co/tencent/HunyuanVideo-PromptRewrite

Thanks! :)

You are fine tuning this one?
https://huggingface.co/tencent-community/Hunyuan-A52B-Instruct

nope, no finetuning - just trying to get the other tencent HY large model into safetensors and then (ideally) mlx quant'd.

  • been trying to work backwards on the data/schema format for the hy video model, which (according to docs and code in the hyvideo repo) uses that prompt rewriter model as a way to standardize input prompts. my assumption is that the llm/llava dataset captions for the model are standardized using that model - otherwise not much point in it.

Tencent/HunyuanVideo-PromptRewrite is a finetune of hunyuan large to my understanding. So I'm assuming same model / architecture / everything just a finetune of hyvideo prompts on top.

ideally the next step then is MLX-ify it

Sorry if unclear.

fwiw i'd been using this to safetensor it: https://github.com/huggingface/safetensors/blob/main/bindings/python/convert.py

which gives the header error. which i assume is something stupid simple.
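For anyone hitting the same message: the thread doesn't establish what triggers it here, but in general a safetensors file starts with an 8-byte little-endian length followed by a JSON header, and HeaderTooLarge is what the loader reports when that length is implausibly big, which usually means the file being read isn't safetensors at all (for example a PyTorch .bin). A rough stdlib-only check, as a sketch: the function name and the 100 MB cap below are assumptions for illustration, not the library's API.

```python
import json
import struct

# Assumption: roughly mirrors the header size cap enforced by the safetensors loader.
MAX_HEADER_BYTES = 100_000_000


def inspect_header(path):
    """Return (header_len, looks_valid) for a file that is supposed to be safetensors."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return None, False
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > MAX_HEADER_BYTES:
            # This is the situation that surfaces as HeaderTooLarge.
            return header_len, False
        raw = f.read(header_len)
    if len(raw) < header_len:
        return header_len, False
    try:
        parsed = json.loads(raw)
    except ValueError:
        # Covers both invalid JSON and undecodable bytes.
        return header_len, False
    return header_len, isinstance(parsed, dict)
```

If `inspect_header("pytorch_model-00003-of-00080.bin")` comes back with a huge length and `False`, the file is simply being read with the wrong loader and needs to go through `torch.load` first.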

I did something like this:

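from transformers import AutoModelForCausalLM

# Load the original checkpoint; trust_remote_code=True lets transformers run the modeling code shipped in the repo
model = AutoModelForCausalLM.from_pretrained(
    "/workspace/models/Tencent-Hunyuan-Large/Hunyuan-A52B-Instruct",
    trust_remote_code=True
)
# Re-save with safe_serialization=True so the weights are written as .safetensors
model.save_pretrained(
    "/workspace/models/Hunyuan-A52B-Instruct",
    safe_serialization=True
)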

thx eric, this helped (so far so good anyway). rebuilding the missing pytorch index json based on it now - thank you! and happy new year man!

in case it helps anyone:

import os
import json
import torch

def create_model_index(model_dir):
    """Create an index mapping parameters to their weight shards."""
    
    # Find all the model shard files
    shard_files = [f for f in os.listdir(model_dir) if f.startswith('pytorch_model-') and f.endswith('.bin')]
    
    # Create weight map
    weight_map = {}
    metadata = {"total_size": 0}
    
    for shard_file in sorted(shard_files):
        print(f"Processing {shard_file}...")
        shard_path = os.path.join(model_dir, shard_file)
        state_dict = torch.load(shard_path, map_location='cpu')
        
        # Record which parameters are in this shard and add up their size in bytes
        for param_name, tensor in state_dict.items():
            weight_map[param_name] = shard_file
            metadata["total_size"] += tensor.numel() * tensor.element_size()
            
    # Create the index dictionary
    index = {
        "metadata": metadata,
        "weight_map": weight_map
    }
    
    # Save the index file
    with open(os.path.join(model_dir, "pytorch_model.bin.index.json"), "w") as f:
        json.dump(index, f, indent=2)
    
    print(f"Created index file mapping {len(weight_map)} parameters")
    return index

# Use the model directory containing the sharded .bin files
model_dir = "/workspace/models/hunyuan-large-pr"
index = create_model_index(model_dir)
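A quick sanity check on the generated index can catch a shard that was skipped or misnamed before a long load fails on it. This is only a sketch: `find_index_problems` is a hypothetical helper, and it assumes the same `pytorch_model-*.bin` naming and index filename used above.

```python
import json
import os


def find_index_problems(model_dir, index_name="pytorch_model.bin.index.json"):
    """Compare shards named in the index with the shard files actually on disk.

    Returns (missing_on_disk, not_in_index), both as sorted lists of filenames.
    """
    with open(os.path.join(model_dir, index_name)) as f:
        index = json.load(f)

    # Shards the index says hold at least one parameter
    referenced = set(index["weight_map"].values())

    # Shards present in the directory, using the same naming filter as above
    on_disk = {
        name
        for name in os.listdir(model_dir)
        if name.startswith("pytorch_model-") and name.endswith(".bin")
    }

    return sorted(referenced - on_disk), sorted(on_disk - referenced)
```

Both lists being empty means every shard on disk is referenced and nothing in the index points at a missing file. It doesn't verify the tensors themselves.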
