The original Swin-Large and the current code both seem to use the same window_size across all layers. If window_size is set to 12, then every relative_position_index should have shape [144, 144], since 12 * 12 = 144.
However, the provided checkpoint has shape [36, 36] for encoder.layers.3.blocks.0.attn.relative_position_index.
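To make the shape arithmetic concrete, here is a minimal pure-Python sketch of how I understand the index to be built. It assumes a square window and the usual Swin-style offset scheme. The helper names `relative_position_index` and `shape` are my own for illustration and are not taken from this repository. Under that assumption, a [36, 36] buffer would correspond to a window size of 6 rather than 12.

```python
def relative_position_index(window_size):
    """Build the (Wh*Ww, Wh*Ww) relative position index for a square window.

    Hypothetical helper: mirrors the usual Swin-style construction,
    written with plain lists instead of tensors.
    """
    # All (row, col) token coordinates inside one window, flattened row-major.
    coords = [(h, w) for h in range(window_size) for w in range(window_size)]
    span = 2 * window_size - 1  # number of distinct offsets along one axis
    index = []
    for h1, w1 in coords:
        row = []
        for h2, w2 in coords:
            # Shift offsets so they start at 0, then flatten the 2D offset.
            dh = h1 - h2 + window_size - 1
            dw = w1 - w2 + window_size - 1
            row.append(dh * span + dw)
        index.append(row)
    return index


def shape(table):
    """Return [rows, cols] of a nested-list table."""
    return [len(table), len(table[0])]


print(shape(relative_position_index(12)))  # [144, 144]
print(shape(relative_position_index(6)))   # [36, 36]
```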
Am I missing something?