`aux_loss_alpha` should be 1e-4 instead of 1e-3?
According to Section 4.2 of the DeepSeek-V3 technical report:
> For the balance loss, we set 𝛼 to 0.0001
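
To make the role of 𝛼 concrete, here is a minimal sketch of how such a coefficient scales an auxiliary balance loss. It assumes a sequence-wise form, 𝛼 · Σᵢ fᵢ · Pᵢ, where fᵢ is the scaled fraction of tokens routed to expert i and Pᵢ is that expert's mean normalized affinity. The function name `balance_loss` and its arguments are hypothetical and are not taken from `modeling_deepseek.py`.

```python
def balance_loss(scores, top_k, alpha=1e-4):
    """Illustrative balance loss: alpha * sum_i f_i * P_i.

    scores: list of T rows (one per token), each holding N positive
            routing affinities (one per expert). Hypothetical input layout.
    top_k:  number of experts selected per token.
    alpha:  balance-loss coefficient (the value under discussion).
    """
    num_tokens = len(scores)
    num_experts = len(scores[0])
    f = [0.0] * num_experts  # scaled routing frequency per expert
    p = [0.0] * num_experts  # mean normalized affinity per expert

    for row in scores:
        total = sum(row)
        # Experts with the top_k largest affinities for this token.
        chosen = sorted(range(num_experts), key=lambda i: row[i], reverse=True)[:top_k]
        for i in chosen:
            f[i] += num_experts / (top_k * num_tokens)
        for i in range(num_experts):
            p[i] += (row[i] / total) / num_tokens

    return alpha * sum(fi * pi for fi, pi in zip(f, p))


# Two tokens, two experts, perfectly balanced routing.
scores = [[0.75, 0.25], [0.25, 0.75]]
loss_small = balance_loss(scores, top_k=1, alpha=1e-4)
loss_large = balance_loss(scores, top_k=1, alpha=1e-3)
```

Because 𝛼 is a plain multiplier, moving from 1e-3 to 1e-4 makes the balance term ten times smaller relative to the main training loss, under the assumptions above.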
config.json CHANGED (+1 -1)
@@ -9,7 +9,7 @@
 "AutoModel": "modeling_deepseek.DeepseekV3Model",
 "AutoModelForCausalLM": "modeling_deepseek.DeepseekV3ForCausalLM"
 },
-"aux_loss_alpha": 0.
+"aux_loss_alpha": 0.0001,
 "bos_token_id": 0,
 "eos_token_id": 1,
 "ep_size": 1,
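
For anyone who wants to apply the same change to a local copy of the config before this is merged, the following is a minimal sketch using only the standard library. The helper name `set_aux_loss_alpha` is hypothetical; it operates on the JSON text so it makes no assumptions about where the file lives.

```python
import json


def set_aux_loss_alpha(config_text, alpha=0.0001):
    """Return config JSON text with "aux_loss_alpha" set to `alpha`.

    All other keys are left unchanged. Hypothetical helper for illustration.
    """
    config = json.loads(config_text)
    config["aux_loss_alpha"] = alpha
    return json.dumps(config, indent=2)


# Example with a trimmed-down config containing keys from the hunk above.
original = '{"bos_token_id": 0, "eos_token_id": 1, "ep_size": 1}'
patched = json.loads(set_aux_loss_alpha(original))
```

Read the text from a local `config.json`, pass it through the helper, and write the result back. Note that `json.dumps` may reformat whitespace relative to the original file.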