LayerNorm missing in state dict of DatViT L

by Matagi - opened

So I wanted to test just the vision encoder on downstream tasks and downloaded the model file. Then I noticed I can't load the model in strict mode because "norms.weight" and "norms.bias" are missing from the checkpoint. I am wondering if this is intentional. With 4048 parameters in total missing (being reset to their initial values), I would guess this degrades performance when using frozen weights for distillation, while they could be relearned when finetuning.
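For anyone who wants to check this on their own copy, here is a minimal sketch of how the missing keys and their parameter count could be tallied. It is plain Python and works on name-to-shape mappings; the helper `report_missing` and the toy shapes are hypothetical, made up for illustration, and are not the real DaViT dimensions. With PyTorch, the expected shapes could be built as `{k: tuple(v.shape) for k, v in model.state_dict().items()}`, and `model.load_state_dict(state_dict, strict=False)` also returns the `missing_keys` and `unexpected_keys` directly.

```python
from math import prod


def report_missing(expected_shapes, checkpoint_keys):
    """Return (missing_names, missing_param_count).

    expected_shapes: dict mapping parameter name -> shape tuple, as the model defines it.
    checkpoint_keys: iterable of parameter names actually present in the checkpoint.
    """
    present = set(checkpoint_keys)
    # Keep the model's own ordering so the report is easy to read.
    missing = [name for name in expected_shapes if name not in present]
    # Each missing tensor contributes the product of its dimensions.
    n_params = sum(prod(expected_shapes[name]) for name in missing)
    return missing, n_params


# Toy shapes for illustration only (assumption, not the real model).
expected = {
    "norms.weight": (8,),
    "norms.bias": (8,),
    "head.weight": (4, 8),
}
checkpoint = {"head.weight"}

missing, n_params = report_missing(expected, checkpoint)
print(missing, n_params)
```

Any key reported as missing keeps whatever value the model constructor assigned, so with a frozen encoder those layers would never be updated.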

Screenshot 2024-07-09 164803.png

