Hi,
The paper posted to arXiv gives very few details of the Swin Transformer V2 configuration that was used.
It is clear that 48 layers were used with an input of 70, 90, 190, but details such as the window size, number of heads, and per-stage depths are absent.
These parameters would help in understanding the model's behaviour and would put the impressive results obtained in further context.
Would it be possible to share the full Swin V2 configuration?
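To make the request concrete, a summary along these lines would be ideal. Note that every value below is a hypothetical placeholder to illustrate the fields being asked about, not a claim about the actual model:

```python
# Hypothetical Swin V2 configuration sketch -- all values are placeholders,
# NOT the actual settings used in the paper.
swin_v2_config = {
    "img_size": 192,               # placeholder input resolution
    "embed_dim": 192,              # placeholder base channel width
    "depths": (2, 2, 40, 4),       # placeholder per-stage block counts (sums to 48 layers)
    "num_heads": (6, 12, 24, 48),  # placeholder attention heads per stage
    "window_size": 12,             # placeholder attention window size
}

# Sanity check: per-stage depths should sum to the 48 layers mentioned above.
assert sum(swin_v2_config["depths"]) == 48
```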