Some weights of the model checkpoint at /models/DeepSeek-V3_bf16 were not used when initializing DeepseekV3ForCausalLM

#62
by Bobcuicui - opened

I use AutoModelForCausalLM.from_pretrained to load DeepSeek-V3, and it raises the warning below:

[screenshot: warning that some checkpoint weights were not used when initializing DeepseekV3ForCausalLM]

Then I print the model's state_dict keys, and it only has 60 layers; however, the DeepSeek-V3 checkpoint actually has 61 layers, so the last layer is missing.
from transformers import AutoModelForCausalLM

# Load the bf16-converted checkpoint on CPU and inspect the parameter names.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/deepseek_v3_bf16",
    device_map="cpu",
    torch_dtype="auto",
    trust_remote_code=True,
)
print(model.state_dict().keys())

[screenshot: the printed state_dict keys]
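For reference, the layer count above comes from collecting the distinct model.layers.<i> indices in the printed keys; a minimal sketch, assuming the `model` object from the snippet above:

```python
import re

# Collect the decoder-layer indices that were actually instantiated,
# using the `model` object loaded above.
layer_ids = {
    int(m.group(1))
    for key in model.state_dict().keys()
    if (m := re.match(r"model\.layers\.(\d+)\.", key))
}
print(len(layer_ids), "layers, indices", min(layer_ids), "to", max(layer_ids))
```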

How can I fix it? Thank you~

The 61st layer is the MTP (multi-token prediction) layer; it is not actually part of the model.
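One way to confirm this is to compare the layer indices stored in the checkpoint with the ones DeepseekV3ForCausalLM actually loads. A minimal sketch, assuming a sharded safetensors checkpoint with a model.safetensors.index.json, the usual model.layers.<i>. key naming, and the `model` object loaded above (the local path is a placeholder):

```python
import json
import re

def layer_ids(keys):
    # Extract the numeric index from keys like "model.layers.12.mlp...".
    return {int(m.group(1)) for k in keys
            if (m := re.match(r"model\.layers\.(\d+)\.", k))}

ckpt_dir = "path/to/deepseek_v3_bf16"  # placeholder path to the local checkpoint
with open(f"{ckpt_dir}/model.safetensors.index.json") as f:
    ckpt_keys = json.load(f)["weight_map"].keys()

extra = layer_ids(ckpt_keys) - layer_ids(model.state_dict().keys())
# The leftover index should correspond to the trailing MTP block,
# which matches the "weights were not used" warning above.
print("checkpoint layers not loaded into the model:", sorted(extra))
```

If only the trailing index shows up in that set, none of the main decoder layers were dropped, which is consistent with the answer above.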

ok, I got it, Thank you~ @cassanof

Bobcuicui changed discussion status to closed
