Trick or ResNet Treat
A small treat: I just uploaded a few small ResNets trained like they've never been trained before. Following a user request, I threw a recent hparam set (MobileNet-v4 Conv Small x ResNet Strikes Back / timm, `ra4` in the tables) at the 'Basic Block' ResNet-18 & 34, including V2 (pre-activation) variants.
The results were good! ResNet-18s at 73-74% and 34s at 77-78%, oh my!
See the table below for some context. I included some past 'best' ResNet results:
- ResNet Strikes Back (`a1`, `a1h`)
- torchvision 'Batteries Included' ResNets (`tv2`) that followed RSB
- O.G. torchvision (`tv`) ResNets in the 18-50 range
I did actually train a `D` variant ResNet50 w/ similar `ra4` hparams, but it didn't improve upon past results quite as much; it likely needs further hparam tweaks and more augreg.
model | img_size | top1 (%) | top5 (%) | param_count (M) |
---|---|---|---|---|
resnet50d.ra4_e3600_r224_in1k | 224 | 80.958 | 95.372 | 25.58 |
resnet50.tv2_in1k | 224 | 80.856 | 95.43 | 25.56 |
resnet50d.a1_in1k | 224 | 80.686 | 94.712 | 25.58 |
resnet50.a1h_in1k | 224 | 80.662 | 95.306 | 25.56 |
resnet50.a1_in1k | 224 | 80.368 | 94.59 | 25.56 |
resnetv2_34d.ra4_e3600_r224_in1k | 224 | 78.268 | 93.956 | 21.82 |
resnetv2_34.ra4_e3600_r224_in1k | 224 | 77.636 | 93.528 | 21.8 |
resnet34.ra4_e3600_r224_in1k | 224 | 77.448 | 93.502 | 21.8 |
resnet34.a1_in1k | 224 | 76.428 | 92.88 | 21.8 |
resnet50.tv_in1k | 224 | 76.128 | 92.858 | 25.56 |
resnetv2_18d.ra4_e3600_r224_in1k | 224 | 74.412 | 91.928 | 11.71 |
resnet18d.ra4_e3600_r224_in1k | 224 | 74.324 | 91.832 | 11.71 |
resnetv2_18.ra4_e3600_r224_in1k | 224 | 73.578 | 91.352 | 11.69 |
resnet34.tv_in1k | 224 | 73.316 | 91.422 | 21.8 |
resnet18.a1_in1k | 224 | 71.49 | 90.076 | 11.69 |
resnet18.tv_in1k | 224 | 69.758 | 89.074 | 11.69 |
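To put numbers on the gains, here is a minimal sketch that subtracts top-1 scores for a few same-architecture pairs from the 224x224 table. The accuracies are copied from the table; the `gain` helper and the choice of pairs are only illustrative, not part of timm.

```python
# Top-1 accuracy (%) at 224x224, copied from the table above.
top1_224 = {
    "resnet50d.ra4_e3600_r224_in1k": 80.958,
    "resnet50d.a1_in1k": 80.686,
    "resnet34.ra4_e3600_r224_in1k": 77.448,
    "resnet34.a1_in1k": 76.428,
    "resnet34.tv_in1k": 73.316,
}


def gain(new, old):
    """Top-1 gain of `new` over `old`, in percentage points (hypothetical helper)."""
    return round(top1_224[new] - top1_224[old], 3)


# Same-architecture comparisons only, so the recipe is the main variable.
comparisons = [
    ("resnet34.ra4_e3600_r224_in1k", "resnet34.a1_in1k"),
    ("resnet34.ra4_e3600_r224_in1k", "resnet34.tv_in1k"),
    ("resnet50d.ra4_e3600_r224_in1k", "resnet50d.a1_in1k"),
]

for new, old in comparisons:
    print(f"{new} vs {old}: {gain(new, old):+.3f}")
```

On these pairs the ResNet-34 picks up roughly a point over `a1` and about four over `tv`, while the ResNet50-D gain over `a1` is under a third of a point, which matches the remark that the 50 did not improve as much.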
The new weights all scale quite nicely to higher resolutions at inference time. Some points of interest here: the `tv2` and `a1h` ResNet50 were trained at 176x176 resolution, and by evaluating at 224x224 they were attempting to hit the 'peak' in the train-test resolution discrepancy (https://arxiv.org/abs/1906.06423). When I was working on the RSB recipe I did not want to sacrifice higher res scaling by trying to bag the peak for 224x224 eval; only the low-cost `a3` was trained at lower res. You can see in this 288x288 table that the `a1` RSB has more to give, the `tv2` is already on the downslope, and `a1h` is just past peak. These new `ra4` weights are res scaling champs and have a bit more to give.
model | img_size | top1 (%) | top5 (%) | param_count (M) |
---|---|---|---|---|
resnet50d.ra4_e3600_r224_in1k | 288 | 81.812 | 95.91 | 25.58 |
resnet50d.a1_in1k | 288 | 81.45 | 95.216 | 25.58 |
resnet50.a1_in1k | 288 | 81.232 | 95.108 | 25.56 |
resnet50.a1h_in1k | 288 | 80.914 | 95.516 | 25.56 |
resnet50.tv2_in1k | 288 | 80.87 | 95.646 | 25.56 |
resnetv2_34d.ra4_e3600_r224_in1k | 288 | 79.59 | 94.77 | 21.82 |
resnetv2_34.ra4_e3600_r224_in1k | 288 | 79.072 | 94.566 | 21.8 |
resnet34.ra4_e3600_r224_in1k | 288 | 78.952 | 94.45 | 21.8 |
resnet34.a1_in1k | 288 | 77.91 | 93.768 | 21.8 |
resnet50.tv_in1k | 288 | 77.252 | 93.606 | 25.56 |
resnetv2_18d.ra4_e3600_r224_in1k | 288 | 76.044 | 93.02 | 11.71 |
resnet18d.ra4_e3600_r224_in1k | 288 | 76.024 | 92.78 | 11.71 |
resnetv2_18.ra4_e3600_r224_in1k | 288 | 75.34 | 92.678 | 11.69 |
resnet34.tv_in1k | 288 | 74.8 | 92.356 | 21.8 |
resnet18.a1_in1k | 288 | 73.152 | 91.036 | 11.69 |
resnet18.tv_in1k | 288 | 71.274 | 90.244 | 11.69 |
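The resolution scaling is easier to see as a per-model difference between the two tables. The sketch below pairs the 224 and 288 top-1 scores for a subset of models and ranks them by how much they gain; the numbers are copied from the tables, while `res_gain` and the selection of models are assumptions made for illustration.

```python
# model: (top-1 % at 224x224, top-1 % at 288x288), copied from the two tables.
top1 = {
    "resnet50d.ra4_e3600_r224_in1k": (80.958, 81.812),
    "resnet50d.a1_in1k": (80.686, 81.45),
    "resnet50.a1_in1k": (80.368, 81.232),
    "resnet50.a1h_in1k": (80.662, 80.914),
    "resnet50.tv2_in1k": (80.856, 80.87),
    "resnet34.ra4_e3600_r224_in1k": (77.448, 78.952),
    "resnet18d.ra4_e3600_r224_in1k": (74.324, 76.024),
}


def res_gain(model):
    """Top-1 change from 224 to 288 eval, in percentage points (hypothetical helper)."""
    at_224, at_288 = top1[model]
    return round(at_288 - at_224, 3)


# Largest gain from the higher eval resolution first.
ranked = sorted(top1, key=res_gain, reverse=True)

for model in ranked:
    print(f"{model:34s} {res_gain(model):+.3f}")
```

Among the ResNet50s, the weights trained at 176x176 (`tv2`, `a1h`) gain the least from 288x288 eval, while the `a1` and `ra4` weights each pick up around 0.8 points; the smaller `ra4` models gain 1.5 points or more.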