Some weights of the model checkpoint at /models/DeepSeek-V3_bf16 were not used when initializing DeepseekV3ForCausalLM
#62 · opened by Bobcuicui
I use AutoModelForCausalLM.from_pretrained to load DeepSeek-V3, and it raises the warning shown in the title. When I print the model's state dict keys, it only has 60 layers, but the DeepSeek-V3 checkpoint actually has 61 layers; the last layer is missing.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/deepseek_v3_bf16",
    device_map="cpu",
    torch_dtype="auto",
    trust_remote_code=True,
)
print(model.state_dict().keys())
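For anyone checking the same thing, here is a minimal sketch that summarizes the loaded layer indices instead of eyeballing the full key list (it assumes the usual "model.layers.&lt;i&gt;." key naming of the HF checkpoint):

import re

# Collect the distinct layer indices present in the loaded model.
layer_ids = sorted({
    int(m.group(1))
    for k in model.state_dict()
    if (m := re.match(r"model\.layers\.(\d+)\.", k))
})
print(f"{len(layer_ids)} layers loaded: {layer_ids[0]}..{layer_ids[-1]}")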
How to fix it? Thank you~
The 61st layer is the MTP (multi-token prediction) layer; it is not actually part of the main model, so the warning is expected.
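You can confirm this yourself by scanning the checkpoint's weight index and checking that the unused keys all belong to the highest layer index. A minimal sketch, assuming the checkpoint is a sharded safetensors export with the standard model.safetensors.index.json (the path is the same placeholder as above):

import json
import re
from pathlib import Path

ckpt_dir = Path("path/to/deepseek_v3_bf16")

# Sharded HF checkpoints ship an index mapping every tensor name to its shard file.
with open(ckpt_dir / "model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

# Collect the distinct layer indices present in the checkpoint.
layer_ids = {
    int(m.group(1))
    for name in weight_map
    if (m := re.match(r"model\.layers\.(\d+)\.", name))
}
top = max(layer_ids)
print(f"layer indices in checkpoint: {min(layer_ids)}..{top}")

# The tensors on the highest index are the MTP layer that
# DeepseekV3ForCausalLM skips, which is what the warning lists.
print([k for k in weight_map if k.startswith(f"model.layers.{top}.")][:5])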
Bobcuicui changed discussion status to closed