Some weights of the model checkpoint at /models/DeepSeek-V3_bf16 were not used when initializing DeepseekV3ForCausalLM
#62 · opened by Bobcuicui
I use AutoModelForCausalLM.from_pretrained to load DeepSeek-V3, and it raises the warning shown in the title. When I print the model's state dict keys, it only has 60 layers, but the DeepSeek-V3 checkpoint actually has 61 layers; the last layer is missing.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/deepseek_v3_bf16",
    device_map="cpu",
    torch_dtype="auto",
    trust_remote_code=True,
)
print(model.state_dict().keys())
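For anyone checking the same thing, here is a minimal sketch that summarizes the loaded layer indices instead of eyeballing the full key list (it assumes the usual "model.layers.&lt;i&gt;." key naming of the HF checkpoint):

import re

# Collect the distinct layer indices present in the loaded model.
layer_ids = sorted({
    int(m.group(1))
    for k in model.state_dict()
    if (m := re.match(r"model\.layers\.(\d+)\.", k))
})
print(f"{len(layer_ids)} layers loaded: {layer_ids[0]}..{layer_ids[-1]}")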
How to fix it? Thank you~
The 61st layer is the MTP (multi-token prediction) layer; it is not actually part of the main model, so the warning is expected.
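You can confirm this yourself by scanning the checkpoint's weight index and checking that the unused keys all belong to the highest layer index. A minimal sketch, assuming the checkpoint is a sharded safetensors export with the standard model.safetensors.index.json (the path is the same placeholder as above):

import json
import re
from pathlib import Path

ckpt_dir = Path("path/to/deepseek_v3_bf16")

# Sharded HF checkpoints ship an index mapping every tensor name to its shard file.
with open(ckpt_dir / "model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

# Collect the distinct layer indices present in the checkpoint.
layer_ids = {
    int(m.group(1))
    for name in weight_map
    if (m := re.match(r"model\.layers\.(\d+)\.", name))
}
top = max(layer_ids)
print(f"layer indices in checkpoint: {min(layer_ids)}..{top}")

# The tensors on the highest index are the MTP layer that
# DeepseekV3ForCausalLM skips, which is what the warning lists.
print([k for k in weight_map if k.startswith(f"model.layers.{top}.")][:5])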
Bobcuicui changed discussion status to closed