`aux_loss_alpha` should be 1e-4 instead of 1e-3?

#61
by cuichenx - opened

According to Section 4.2 of the DeepSeek-V3 technical report:

> For the balance loss, we set 𝛼 to 0.0001

Related: https://huggingface.co/deepseek-ai/DeepSeek-V3-Base/discussions/60
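
For anyone who wants to apply the value locally before this is merged, here is a minimal sketch. It assumes the setting lives under the key `aux_loss_alpha` in the model's `config.json` and that the current value is 0.001, as the title suggests. The helper name `fix_aux_loss_alpha` and the one-key config excerpt are hypothetical, for illustration only.

```python
import json

# Value quoted from Section 4.2 of the technical report.
REPORT_ALPHA = 1e-4


def fix_aux_loss_alpha(config: dict, alpha: float = REPORT_ALPHA) -> dict:
    """Return a shallow copy of `config` with `aux_loss_alpha` set to `alpha`."""
    patched = dict(config)  # leave the caller's dict untouched
    patched["aux_loss_alpha"] = alpha
    return patched


# Hypothetical one-key excerpt standing in for the real config.json.
config = json.loads('{"aux_loss_alpha": 0.001}')
patched = fix_aux_loss_alpha(config)
print(json.dumps(patched, indent=2))
```

In practice you would load the full `config.json` from disk, pass it through the same helper, and write it back. The sketch only changes that one key and leaves everything else as it was.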

