PVT encoder
use of the Transformer in vision
For example, some works model the vision task as a dictionary lookup problem with learnable queries, and use the Transformer decoder as a task-specific head on top of the CNN backbone.
ViT
ViT has a columnar structure and takes coarse image patches as input
shortcomings:
- its output feature map is single-scale and low-resolution
- its computational and memory costs are relatively high, even for common input image sizes
PVT
a pure Transformer backbone for dense prediction tasks
advantages:
- takes fine-grained image patches (e.g. 4×4 pixels per patch) as input to learn high-resolution representations
- introduces a progressive shrinking pyramid that reduces the Transformer's sequence length as the network deepens, cutting the computational cost (see the sketch after this list)
- adopts a spatial-reduction attention (SRA) layer to further lower the cost of learning high-resolution feature maps
- produces a global receptive field in every stage
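
A quick worked example of the shrinking sequence length, assuming a 224×224 input and per-stage overall strides of 4, 8, 16, 32 for the feature pyramid (the numbers are easy to recompute for other input sizes):

```python
# Number of tokens the Transformer processes at each PVT stage
# for a 224 x 224 input, given per-stage overall strides of 4, 8, 16, 32.
H = W = 224
for stage, stride in enumerate([4, 8, 16, 32], start=1):
    h, w = H // stride, W // stride
    print(f"Stage {stage}: {h}x{w} feature map -> {h * w} tokens")
# Stage 1: 56x56 feature map -> 3136 tokens
# Stage 2: 28x28 feature map -> 784 tokens
# Stage 3: 14x14 feature map -> 196 tokens
# Stage 4: 7x7 feature map -> 49 tokens
```

In contrast, a ViT with 16×16 patches keeps a single 14×14 (196-token) resolution throughout, so it provides neither the high-resolution maps of Stages 1-2 nor the cheap low-resolution maps of Stage 4.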

Feature Pyramid for Transformer
At the beginning of stage i, the input feature map of size H×W×C is evenly divided into HW/P_i² patches (P_i is the patch size of that stage). Each patch is flattened and projected to a C_i-dimensional embedding; after this linear projection, the embedded patches can be viewed as a feature map of size (H/P_i)×(W/P_i)×C_i, i.e. its height and width are P_i times smaller than the input.
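
A minimal sketch of this per-stage patch embedding, assuming a PyTorch implementation; the class name `PatchEmbed` and its arguments are illustrative rather than the official API, but a strided convolution is the standard way to realize "split into non-overlapping P×P patches, flatten, and linearly project".

```python
import torch
import torch.nn as nn


class PatchEmbed(nn.Module):
    """Split a feature map into P x P patches and project each to embed_dim."""

    def __init__(self, in_chans, embed_dim, patch_size):
        super().__init__()
        # kernel_size = stride = patch_size  <=>  non-overlapping P x P patches,
        # each flattened and linearly projected to an embed_dim-d embedding.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):                    # x: (B, C, H, W)
        x = self.proj(x)                     # (B, embed_dim, H/P, W/P)
        B, C, Hp, Wp = x.shape
        x = x.flatten(2).transpose(1, 2)     # (B, Hp*Wp, embed_dim) token sequence
        return self.norm(x), (Hp, Wp)
```

The returned (Hp, Wp) is kept so the token sequence can be reshaped back into a 2D feature map at the end of the stage.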
SRA (spatial-reduction attention)

SRA is similar to MHA (multi-head attention): it receives a query Q, a key K, and a value V as input. The difference is that SRA reduces the spatial scale of K and V before the attention operation, which significantly lowers the computation and memory overhead.

Apart from that it is identical to the original Transformer attention; the only change is that K and V first pass through a spatial-reduction operation (reshape the sequence by the reduction ratio R_i, apply a linear projection, then layer normalization).
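
A minimal sketch of SRA in PyTorch, assuming the spatial reduction is implemented as a strided convolution (equivalent to the reshape-then-project formulation); the class name `SRAttention` and its arguments are illustrative rather than the official API.

```python
import torch
import torch.nn as nn


class SRAttention(nn.Module):
    """Multi-head attention whose K/V are spatially reduced by sr_ratio."""

    def __init__(self, dim, num_heads=1, sr_ratio=1):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)
        self.sr_ratio = sr_ratio
        if sr_ratio > 1:
            # Spatial reduction: shrink the H x W token grid by sr_ratio, so
            # K/V contain (H*W)/sr_ratio^2 tokens instead of H*W.
            self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
            self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):              # x: (B, N, C) with N = H * W
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)

        if self.sr_ratio > 1:
            x_ = x.transpose(1, 2).reshape(B, C, H, W)
            x_ = self.sr(x_).reshape(B, C, -1).transpose(1, 2)
            x_ = self.norm(x_)               # reduced sequence used for K and V
        else:
            x_ = x
        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, self.head_dim)
        k, v = kv.permute(2, 0, 3, 1, 4)     # each: (B, heads, N', head_dim)

        # Standard scaled dot-product attention, but only over the reduced K/V,
        # so the attention map is N x N' instead of N x N.
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

For example, with H = W = 56 and sr_ratio = 8 (Stage 1), the attention map shrinks from 3136×3136 to 3136×49.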


hyperparameters:
PVT follows the design rules of ResNet (an illustrative config sketch is given after the list below):
- use small output channel numbers in shallow stages
- concentrate the major computation resource in intermediate stages.
- with the growth of network depth, the hidden dimension gradually increases, and the output resolution progressively shrinks
- the major computation resource is concentrated in Stage 3
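
An illustrative per-stage configuration that follows these rules; the numbers below are my reading of the PVT-Small setting in the paper and should be double-checked against the official code rather than taken as definitive.

```python
# Illustrative per-stage hyperparameters, roughly the PVT-Small setting:
# small channel numbers in shallow stages, most encoder layers in Stage 3,
# and an SRA reduction ratio that shrinks together with the feature map.
pvt_small_cfg = {
    "patch_sizes": [4, 2, 2, 2],          # P_i: patch size at the start of stage i
    "embed_dims":  [64, 128, 320, 512],   # C_i: channel number of stage i
    "num_heads":   [1, 2, 5, 8],          # attention heads per stage
    "sr_ratios":   [8, 4, 2, 1],          # R_i: SRA spatial-reduction ratio
    "depths":      [3, 4, 6, 3],          # encoder layers per stage (Stage 3 is deepest)
    "mlp_ratios":  [8, 8, 4, 4],          # feed-forward expansion ratio
}
```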
the advantages over ViT:
- more flexible: it can generate feature maps of different scales/channels in different stages
- more versatile: it can be easily plugged into most downstream task models (e.g. detection and segmentation heads)
- more friendly to computation/memory: it can handle higher-resolution feature maps or longer sequences