U-Net

The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization.

Because biomedical tasks require localization (a class label for every pixel), Ciresan et al. trained a network in a sliding-window setup to predict the class label of each pixel, providing a local region (patch) around that pixel as input.

This network can localize.

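Two drawbacks: (1) it is slow, because the network must be run separately for every patch, and there is a lot of redundancy due to overlapping patches; (2) there is a trade-off between localization accuracy and the use of context: larger patches need more max-pooling layers, which reduce localization accuracy, while small patches let the network see only little context.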
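The following toy sketch of the sliding-window setup makes the redundancy concrete. Every pixel gets its own patch, so overlapping patches read the same pixels many times. The helper name `extract_patches`, the odd patch size, and the zero padding at the border are assumptions for illustration. The per-patch classifier is left out.

```python
def extract_patches(image, p):
    """Return one p x p patch centred on every pixel (p odd, zero padding outside the image)."""
    h, w = len(image), len(image[0])
    r = p // 2
    patches = []
    for y in range(h):
        for x in range(w):
            patch = [
                [
                    image[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
                    for dx in range(-r, r + 1)
                ]
                for dy in range(-r, r + 1)
            ]
            patches.append(patch)
    return patches


image = [[8 * row + col for col in range(8)] for row in range(8)]
patches = extract_patches(image, p=5)
pixel_reads = sum(len(pt) * len(pt[0]) for pt in patches)
print(len(patches), pixel_reads)  # 64 1600
```

An 8x8 image yields 64 patches and 1600 pixel reads, 25 times the image size. Enlarging p gives each prediction more context but multiplies this redundant work.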

The main idea in fully convolutional network is to supplement a usual contracting network by successive layers, where pooling operators are replaced by upsampling operators.

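FCN's main change is to replace the fully connected layers at the end of a conventional CNN with convolutional layers, so that the network outputs a heat map instead of a single class label. To recover the spatial size lost to convolution and pooling, FCN upsamples its output. It also introduces skip connections: class-score maps computed from the pool4 and pool3 feature maps are added to the upsampled conv7 output, which gives a finer and more accurate segmentation.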
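The skip fusion can be sketched on a single class-score map. This is a simplification, not FCN's actual layers. FCN upsamples with a learned deconvolution initialized to bilinear interpolation, whereas the hypothetical `upsample2x_nearest` below just repeats values. The score maps are plain nested lists.

```python
def upsample2x_nearest(score_map):
    """Double height and width by repeating each value (nearest-neighbour upsampling)."""
    out = []
    for row in score_map:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse_skip(coarse, fine):
    """FCN-style skip: upsample the coarse score map 2x and add the finer one element-wise."""
    up = upsample2x_nearest(coarse)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("upsampled coarse map must match the fine map's size")
    return [[a + b for a, b in zip(row_up, row_fine)] for row_up, row_fine in zip(up, fine)]


conv7_scores = [[1.0, 2.0], [3.0, 4.0]]       # coarse scores for one class
pool4_scores = [[0.5] * 4 for _ in range(4)]  # finer scores from a skip layer
fused = fuse_skip(conv7_scores, pool4_scores)
print(fused[0])  # [1.5, 1.5, 2.5, 2.5]
```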

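These layers increase the resolution of the output. To localize, high-resolution features from the contracting path are combined with the upsampled output. A successive convolution layer can then learn to assemble a more precise output from this information.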

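The upsampling part also keeps a large number of feature channels, which lets the network propagate context information to the higher-resolution layers. The network has no fully connected layers, only convolutional layers.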

Because the network contains only convolutions, it allows seamless segmentation of arbitrarily large images by an overlap-tile strategy.

To predict the pixels in the border region of the image, the missing context is extrapolated by mirroring the input image. This tiling strategy is important to apply the network to large images, since otherwise the resolution would be limited by the GPU memory.
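Mirror extrapolation at the border can be sketched with index reflection. The notes do not say whether the mirror repeats the edge pixel. This sketch assumes it does not, which is what NumPy calls `reflect` mode. `mirror_pad` is a hypothetical helper name.

```python
def reflect_index(i, n):
    """Map any integer index into [0, n) by mirroring at the borders (edge not repeated)."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def mirror_pad(image, pad):
    """Extend a 2-D image by `pad` pixels on every side using mirrored content."""
    h, w = len(image), len(image[0])
    return [
        [image[reflect_index(y, h)][reflect_index(x, w)] for x in range(-pad, w + pad)]
        for y in range(-pad, h + pad)
    ]


padded = mirror_pad([[1, 2, 3], [4, 5, 6], [7, 8, 9]], pad=1)
for row in padded:
    print(row)
# [5, 4, 5, 6, 5]
# [2, 1, 2, 3, 2]
# [5, 4, 5, 6, 5]
# [8, 7, 8, 9, 8]
# [5, 4, 5, 6, 5]
```

Consider the paper's 572x572 input tile and 388x388 output tile. Each side of the output tile needs (572 − 388) / 2 = 92 pixels of extra context, which the mirroring supplies at the image border.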

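For data augmentation, the paper uses elastic deformations. These let the network learn invariance to such deformations without seeing these transformations in the annotated image corpus.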

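Separating touching objects of the same class (e.g., touching cells) is a major challenge. The paper uses a weighted loss that assigns different weights to different regions. The separating background borders between touching cells get a large weight, so the network learns these borders and can separate the cells.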

Network architecture:

Left: contracting path. Right: expansive path.

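Contracting path: repeated application of two unpadded 3x3 convolutions, each followed by a ReLU, and a 2x2 max pooling with stride 2 for downsampling. At each downsampling step the number of feature channels is doubled.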

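Expansive path: each step upsamples the feature map and applies a 2x2 convolution ("up-convolution") that halves the number of feature channels. It then concatenates the result with the correspondingly cropped feature map from the contracting path and applies two 3x3 convolutions, each followed by a ReLU. The cropping is necessary because every unpadded convolution loses border pixels.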
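The crop-and-concatenate step can be sketched with feature maps stored as lists of channels, each channel a 2-D list. The helper names are hypothetical. A centred crop is assumed, which is what the symmetric border loss of unpadded convolutions suggests.

```python
def center_crop(channels, size):
    """Crop every channel (a 2-D list) to size x size around its centre."""
    h, w = len(channels[0]), len(channels[0][0])
    top, left = (h - size) // 2, (w - size) // 2
    return [[row[left:left + size] for row in ch[top:top + size]] for ch in channels]


def crop_and_concat(skip, upsampled):
    """Crop the contracting-path map to the upsampled map's size, then stack the channels."""
    size = len(upsampled[0])
    return center_crop(skip, size) + upsampled


# Toy sizes: 2 skip channels of 6x6, 2 upsampled channels of 4x4.
skip = [[[10 * y + x for x in range(6)] for y in range(6)] for _ in range(2)]
up = [[[0] * 4 for _ in range(4)] for _ in range(2)]
merged = crop_and_concat(skip, up)
print(len(merged), len(merged[0]), merged[0][0])  # 4 4 [11, 12, 13, 14]
```

In the paper's Fig. 1, for example, the top-level skip crops a 568x568 map to 392x392 before concatenating it with the up-convolution output.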

At the final layer, a 1x1 convolution maps each 64-component feature vector to the desired number of classes.

In total the network has 23 convolutional layers.
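The count of 23 can be checked by listing the layers. The sketch assumes the configuration of the paper's Fig. 1: a single-channel input, 64 base channels, four pooling steps, and two output classes. The function name and tuple format are illustrative.

```python
def unet_layers(depth=4, base=64, in_channels=1, n_classes=2):
    """List (kind, in_channels, out_channels) for every convolutional layer of a U-Net."""
    layers = []
    c_in, c_out = in_channels, base
    # Contracting path plus bottleneck: two 3x3 convs per level, channels double per level.
    for _ in range(depth + 1):
        layers.append(("conv3x3", c_in, c_out))
        layers.append(("conv3x3", c_out, c_out))
        c_in, c_out = c_out, 2 * c_out
    c = c_in  # channels at the bottleneck (1024 for the defaults)
    # Expansive path: up-conv halves channels, concatenation restores c, two 3x3 convs halve again.
    for _ in range(depth):
        layers.append(("upconv2x2", c, c // 2))
        layers.append(("conv3x3", c, c // 2))  # input = c//2 cropped skip + c//2 upsampled
        layers.append(("conv3x3", c // 2, c // 2))
        c //= 2
    layers.append(("conv1x1", c, n_classes))  # per-pixel classifier
    return layers


layers = unet_layers()
print(len(layers))  # 23
```

The 23 layers break down as 18 3x3 convolutions, 4 up-convolutions, and the final 1x1 convolution.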

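To allow a seamless tiling of the output segmentation map (see Fig. 2), the input tile size must be chosen so that every 2x2 max-pooling operation is applied to a layer with even x- and y-sizes. (In other words, every feature map in the contracting path must have even x and y sizes after its convolutions and ReLUs, right before it is pooled.)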
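The size bookkeeping can be checked with a small helper (a hypothetical name). It follows the layer sizes described above: each pair of unpadded 3x3 convolutions removes 4 pixels, each pooling halves the size, and each up-convolution doubles it. The default depth of 4 pooling steps is taken from the paper's Fig. 1.

```python
def unet_output_size(input_size, depth=4):
    """Output size of an unpadded U-Net, or None if a pooling input is odd or a size vanishes."""
    s = input_size
    for _ in range(depth):
        s -= 4                      # two unpadded 3x3 convs each remove 2 pixels
        if s <= 0 or s % 2:
            return None             # 2x2 max pooling needs an even size
        s //= 2
    s -= 4                          # bottleneck convs
    if s <= 0:
        return None
    for _ in range(depth):
        s = 2 * s - 4               # 2x2 up-conv doubles, two 3x3 convs remove 4
        if s <= 0:
            return None
    return s


print(unet_output_size(572))  # 388
print([n for n in range(500, 600) if unet_output_size(n)])  # [508, 524, 540, 556, 572, 588]
```

Valid input sizes are spaced 16 = 2^4 pixels apart for four poolings. An input of 572 gives the 388x388 output shown in the paper's Fig. 1.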

Training:

To minimize the overhead and make maximum use of the GPU memory, we favor large input tiles over a large batch size and hence reduce the batch to a single image.

(In other words: large tiles, small batch size.)

Accordingly we use a high momentum (0.99) such that a large number of the previously seen training samples determine the update in the current optimization step.
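A scalar sketch of the classical momentum update shows why 0.99 averages over many samples. A gradient from k steps ago is still weighted by 0.99^k, so the effective memory is roughly 1 / (1 − 0.99) = 100 steps. The learning rate and the function name are illustrative assumptions, not the paper's solver settings.

```python
def momentum_sgd(grads, lr=0.01, mu=0.99, w=0.0):
    """Run classical momentum SGD on a scalar weight; return (weight, velocity)."""
    v = 0.0
    for g in grads:
        v = mu * v - lr * g  # velocity = exponentially decaying sum of past gradients
        w = w + v
    return w, v


# Feed a constant gradient: the velocity approaches -lr / (1 - mu) = -1.0,
# i.e. about 100 gradients' worth of step instead of one.
_, v = momentum_sgd([1.0] * 2000)
print(round(v, 6))  # -1.0
```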

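The term "energy function" originally comes from thermodynamics, where it describes the energy of a system. The system reaches a stable state when its energy is at a minimum. (In other words, it is the objective function that training minimizes.)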

Energy function: a pixel-wise soft-max over the final feature map, combined with the cross-entropy loss.
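Written out, the soft-max is p_k(x) = exp(a_k(x)) / Σ_{k'} exp(a_{k'}(x)), where a_k(x) is the activation in feature channel k at pixel x. The energy is the weighted cross entropy E = −Σ_x w(x) log p_{ℓ(x)}(x), where ℓ(x) is the true label and w(x) is the weight map. The minus sign makes E a quantity to minimize. Below is a minimal sketch, assuming the pixels are flattened into plain lists for brevity.

```python
import math


def pixel_softmax(logits):
    """logits[k][i] is the activation a_k of class k at pixel i; returns p_k per pixel."""
    n_pixels = len(logits[0])
    probs = [[0.0] * n_pixels for _ in logits]
    for i in range(n_pixels):
        m = max(ch[i] for ch in logits)  # subtract the max for numerical stability
        exps = [math.exp(ch[i] - m) for ch in logits]
        total = sum(exps)
        for k, e in enumerate(exps):
            probs[k][i] = e / total
    return probs


def weighted_cross_entropy(logits, labels, weights):
    """E = -sum_i w(i) * log p_{label(i)}(i) over all pixels i."""
    probs = pixel_softmax(logits)
    return -sum(w * math.log(probs[lab][i]) for i, (lab, w) in enumerate(zip(labels, weights)))


# Two classes, three pixels; the third pixel gets a large weight.
logits = [[2.0, 0.0, -1.0],   # class 0
          [0.0, 0.0, 1.0]]    # class 1
loss = weighted_cross_entropy(logits, labels=[0, 1, 1], weights=[1.0, 1.0, 10.0])
```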

Weight map:
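In the paper, the weight map is w(x) = w_c(x) + w0 · exp(−(d1(x) + d2(x))² / (2σ²)). Here w_c balances the class frequencies, d1 and d2 are the distances to the border of the nearest and second-nearest cell, and w0 = 10, σ ≈ 5 pixels. The brute-force sketch below is only for toy masks and relies on simplifying assumptions, not the paper's implementation:

- it approximates the distance to a cell's border by the distance to the cell's nearest pixel;
- it adds the separation term only to background pixels;
- it takes w_c as a fixed per-class value.

```python
import math


def border_weight(d1, d2, w0=10.0, sigma=5.0):
    """Separation term w0 * exp(-(d1 + d2)^2 / (2 sigma^2)) of the weight map."""
    return w0 * math.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))


def weight_map(mask, class_weight, w0=10.0, sigma=5.0):
    """mask: 2-D list of instance ids (0 = background); class_weight: {0: bg, 1: fg}."""
    instances = {}
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v:
                instances.setdefault(v, []).append((y, x))
    out = []
    for y, row in enumerate(mask):
        out_row = []
        for x, v in enumerate(row):
            w = class_weight[1 if v else 0]
            if v == 0 and len(instances) >= 2:
                # Distance to the nearest pixel of each cell, sorted: d1 <= d2 <= ...
                ds = sorted(min(math.hypot(y - py, x - px) for py, px in pix)
                            for pix in instances.values())
                w += border_weight(ds[0], ds[1], w0, sigma)
            out_row.append(w)
        out.append(out_row)
    return out


# Two touching cells separated by a one-pixel background gap.
mask = [[1, 1, 0, 2, 2],
        [1, 1, 0, 2, 2]]
wm = weight_map(mask, {0: 1.0, 1: 1.0})
```

The gap pixels get a weight of about 1 + 10 · exp(−4/50) ≈ 10.2. Pixels far from any second cell receive almost no extra weight.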

A good initialization of the weights is extremely important.

Ideally the initial weights should be adapted such that each feature map in the network has approximately unit variance.

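This can be achieved by drawing the initial weights from a Gaussian distribution with a standard deviation of √(2/N), where N is the number of incoming nodes of one neuron. For example, for a 3x3 convolution with 64 feature channels in the previous layer, N = 9 · 64 = 576.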
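The standard deviation follows directly from the fan-in. Below is a sketch using only the standard library. `he_std` and `init_conv_weights` are hypothetical names, and the weights are nested lists indexed [out][in][row][col].

```python
import math
import random


def he_std(kernel_size, in_channels):
    """sqrt(2 / N), with N = incoming nodes of one neuron = k * k * in_channels."""
    n = kernel_size * kernel_size * in_channels
    return math.sqrt(2.0 / n)


def init_conv_weights(kernel_size, in_channels, out_channels, seed=0):
    """Draw conv weights from N(0, he_std^2); shape [out][in][k][k]."""
    rng = random.Random(seed)
    std = he_std(kernel_size, in_channels)
    return [[[[rng.gauss(0.0, std) for _ in range(kernel_size)]
              for _ in range(kernel_size)]
             for _ in range(in_channels)]
            for _ in range(out_channels)]


print(he_std(3, 64))  # sqrt(2 / 576) ≈ 0.0589
```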

Data augmentation

Random elastic deformations of the training samples are the key concept for training a segmentation network with very few annotated images.

We generate smooth deformations using random displacement vectors on a coarse 3 by 3 grid. The displacements are sampled from a Gaussian distribution with 10 pixels standard deviation. Per-pixel displacements are then computed using bicubic interpolation. Drop-out layers at the end of the contracting path perform further implicit data augmentation.
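Below is a small sketch of this augmentation. It draws 3x3 coarse displacements with σ = 10 pixels, interpolates them to every pixel, and then resamples the image. Two simplifications are assumptions of this sketch, not the paper's method. First, the dense field uses bilinear rather than bicubic interpolation. Second, the image is resampled with nearest-neighbour lookup clamped at the border. Images must be at least 2x2, and the helper names are hypothetical.

```python
import random


def coarse_displacements(grid=3, sigma=10.0, seed=0):
    """(dy, dx) displacement per coarse grid node, each drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    return [[(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(grid)]
            for _ in range(grid)]


def dense_displacements(coarse, h, w):
    """Interpolate the coarse grid to every pixel (bilinear here; the paper uses bicubic)."""
    g = len(coarse)
    field = []
    for y in range(h):
        gy = y * (g - 1) / (h - 1)
        y0 = min(int(gy), g - 2)
        ty = gy - y0
        row = []
        for x in range(w):
            gx = x * (g - 1) / (w - 1)
            x0 = min(int(gx), g - 2)
            tx = gx - x0
            d = []
            for c in range(2):  # c = 0: dy, c = 1: dx
                top = coarse[y0][x0][c] * (1 - tx) + coarse[y0][x0 + 1][c] * tx
                bot = coarse[y0 + 1][x0][c] * (1 - tx) + coarse[y0 + 1][x0 + 1][c] * tx
                d.append(top * (1 - ty) + bot * ty)
            row.append(tuple(d))
        field.append(row)
    return field


def warp_nearest(image, field):
    """Sample image at (y + dy, x + dx), rounded to the nearest pixel and clamped."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dy, dx = field[y][x]
            sy = min(max(round(y + dy), 0), h - 1)
            sx = min(max(round(x + dx), 0), w - 1)
            row.append(image[sy][sx])
        out.append(row)
    return out


image = [[10 * y + x for x in range(16)] for y in range(16)]
field = dense_displacements(coarse_displacements(seed=42), 16, 16)
augmented = warp_nearest(image, field)
```

The same field should be applied to the label map so that image and labels stay aligned. Nearest-neighbour lookup keeps the labels discrete.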

我们在一个粗略的3乘3的网格上使用随机位移矢量生成平滑的变形。位移是从标准偏差为10像素的高斯分布中采样的。然后使用双三次插值计算每像素的位移。在收缩路径末端的剔除层进行进一步的隐性数据增强