PVT encoder
Use of Transformer
For example, some works model the vision task as a dictionary lookup problem with learnable queries, and use the Transformer decoder as a task-specific head on top of the CNN backbone.
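To make the "dictionary lookup with learnable queries" idea concrete, here is a minimal sketch of the cross-attention step at the core of a Transformer decoder head. It is an illustration under stated assumptions, not the implementation of any particular paper: it uses a single head with no learned projections, and random vectors stand in for both the learnable queries and the flattened CNN feature map. The names `cross_attention`, `features`, and `queries` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Soft dictionary lookup: each query returns a weighted mix of values.

    queries: list of N vectors (stand-ins for learnable queries)
    keys, values: lists of M vectors (e.g. a flattened backbone feature map)
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Scaled dot-product similarity between this query and every key.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of values: the "looked-up" entry for this query.
        out = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


random.seed(0)
dim, num_positions, num_queries = 8, 16, 4

# Assumption: random numbers stand in for a flattened CNN feature map
# (16 spatial positions, 8 channels each).
features = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_positions)]

# Assumption: random numbers stand in for learnable queries; during training
# these would be parameters updated by gradient descent.
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_queries)]

# Keys and values both come from the backbone features (no projections here).
decoded = cross_attention(queries, features, features)
print(len(decoded), len(decoded[0]))
```

Each query yields one output vector regardless of how many spatial positions the backbone produces, which is why a fixed set of queries can serve as a task-specific head on top of feature maps of varying size.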