# Normalization layer

What is Batch Normalization?

Instance Normalization?

Conditional Batch Normalization?

Conditional Instance Normalization?

**Batch Normalization** was first introduced by Sergey Ioffe and Christian Szegedy. **It significantly improved image classification performance.**

Being interested in "Conditional Batch Normalization (CBN)", here's a wrap-up of `normalization` layers.

For a good comparison, refer to "Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization", where these layers are well compared.

## Batch Normalization (BN)

$$\text{BN}(x)=\gamma (\frac{x-\mu(x)}{\sigma(x)})+\beta$$

$\gamma, \beta \in \mathbb{R}^C$ are affine parameters learned from data; $\mu(x)$ and $\sigma(x)$ are the mean and standard deviation, computed across **the batch and spatial dimensions independently for each feature channel**:

$\mu_c(x)=\frac{1}{NHW}\sum\limits_{n=1}^N \sum\limits_{h=1}^H \sum\limits_{w=1}^W x_{nchw}$

$\sigma_c(x)=\sqrt{\frac{1}{NHW}\sum\limits_{n=1}^N \sum\limits_{h=1}^H \sum\limits_{w=1}^W (x_{nchw}-\mu_c(x))^2+\epsilon}$
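As a concrete sketch of the formulas above (a NumPy illustration with my own function names, not any framework's implementation):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch Normalization for an (N, C, H, W) tensor.

    Statistics are computed over the batch (N) and spatial (H, W) axes,
    independently for each channel C; gamma and beta have shape (C,).
    """
    mu = x.mean(axis=(0, 2, 3), keepdims=True)   # (1, C, 1, 1)
    var = x.var(axis=(0, 2, 3), keepdims=True)   # (1, C, 1, 1)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]

x = np.random.randn(8, 3, 4, 4)
y = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
```

With $\gamma=1$, $\beta=0$, each channel of the output has zero mean and unit standard deviation across the batch.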

## Instance Normalization (IN)

$$\text{IN}(x)=\gamma (\frac{x-\mu(x)}{\sigma(x)})+\beta$$

$\mu_{nc}(x)=\frac{1}{HW} \sum\limits_{h=1}^H \sum\limits_{w=1}^W x_{nchw}$

$\sigma_{nc}(x)=\sqrt{\frac{1}{HW} \sum\limits_{h=1}^H \sum\limits_{w=1}^W (x_{nchw}-\mu_{nc}(x))^2+\epsilon}$
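A NumPy sketch of these per-sample, per-channel statistics (function and variable names are my own; only the reduction axes differ from BN):

```python
import numpy as np

def instance_norm(x, gamma, beta, eps=1e-5):
    """Instance Normalization for an (N, C, H, W) tensor.

    Statistics are computed over the spatial (H, W) axes only,
    separately for every sample n and channel c.
    """
    mu = x.mean(axis=(2, 3), keepdims=True)      # (N, C, 1, 1)
    var = x.var(axis=(2, 3), keepdims=True)      # (N, C, 1, 1)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]

x = np.random.randn(8, 3, 4, 4)
y = instance_norm(x, gamma=np.ones(3), beta=np.zeros(3))
```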

While `BN` averages over the batch and spatial dimensions for each channel, `IN` averages within each sample and channel, so one sample's statistics do not affect another's. Image generation depends on per-channel statistics much more than image classification does, so `IN` plays a very important role in image generation. State-of-the-art models such as CycleGAN and StarGAN use `IN` instead of `BN`.

In my opinion, `BN` is good for discriminative jobs and `IN` for generative jobs.

Here's the difference between `BN` and `IN`.

## Conditional Batch Normalization (CBN)

First introduced in "Modulating early visual processing by language" at NIPS 2017, from Aaron Courville's lab.

`CBN` predicts delta values for the affine parameters $\gamma$ and $\beta$ of `BN`. Thus, $\gamma$ and $\beta$ of `BN` are conditioned on the output of some other neural net (questions, queries, ...).
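A minimal sketch of this idea, assuming a single linear map as the predictor (the paper uses a small MLP; the names and the one-layer predictor here are my own simplifications): a conditioning embedding `e` (e.g. a question encoding) predicts $\Delta\gamma$ and $\Delta\beta$, which are added to BN's affine parameters.

```python
import numpy as np

def conditional_batch_norm(x, e, gamma, beta, W_gamma, W_beta, eps=1e-5):
    """Conditional BN sketch: conditioning vector e (shape (D,)) predicts
    per-channel delta_gamma and delta_beta via linear maps of shape (D, C)."""
    delta_gamma = e @ W_gamma                    # (C,)
    delta_beta = e @ W_beta                      # (C,)
    mu = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    g = (gamma + delta_gamma)[None, :, None, None]
    b = (beta + delta_beta)[None, :, None, None]
    return g * x_hat + b

rng = np.random.default_rng(0)
C, D = 3, 16                                     # channels, embedding dim
x = rng.standard_normal((8, C, 4, 4))
e = rng.standard_normal(D)                       # e.g. a question embedding
W_gamma = 0.01 * rng.standard_normal((D, C))     # small init, as in the paper's spirit
W_beta = 0.01 * rng.standard_normal((D, C))
y = conditional_batch_norm(x, e, np.ones(C), np.zeros(C), W_gamma, W_beta)
```

Changing the conditioning vector `e` changes the normalization's affine transform, and hence the output, without touching the convolutional weights.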

## Conditional Instance Normalization (CIN)

$$\text{CIN}(x;s)=\gamma^s \left(\frac{x-\mu(x)}{\sigma(x)}\right)+\beta^s,\quad s \in \{1,2,3,\dots,S\}$$

Surprisingly, the network can generate images in completely different styles by using the same convolutional parameters but different affine parameters in IN layers.

`CIN` would be good for conditional image generation (style transfer for a given style: pick the style's affine parameters $\gamma^s$ and $\beta^s$ and use them for image generation).
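A NumPy sketch of this, assuming the per-style parameters are stored as `(S, C)` tables indexed by the style id (names are my own):

```python
import numpy as np

def conditional_instance_norm(x, s, gammas, betas, eps=1e-5):
    """Conditional IN sketch: one (gamma, beta) row per style.

    x: (N, C, H, W) features; s: style index; gammas, betas: (S, C).
    """
    mu = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gammas[s][None, :, None, None] * x_hat + betas[s][None, :, None, None]

rng = np.random.default_rng(0)
S, C = 4, 3                                      # number of styles, channels
x = rng.standard_normal((2, C, 5, 5))
gammas = rng.standard_normal((S, C))
betas = rng.standard_normal((S, C))
y0 = conditional_instance_norm(x, 0, gammas, betas)  # style 0
y1 = conditional_instance_norm(x, 1, gammas, betas)  # style 1
```

The convolutional features `x` are identical in both calls; only the chosen affine row differs, which is exactly the "same convolutional parameters, different affine parameters" observation above.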

Ulyanov et al. [52] attribute the success of `IN`

to its **invariance to the contrast of the content image**. However, `IN`

takes place in the feature space, therefore it should have **more profound impacts than a simple contrast normalization in the pixel space**. Perhaps even more surprising is the fact that the **affine parameters in IN can completely change the style of the output image**.

## Adaptive Instance Normalization (AdaIN)

`AdaIN` has no learnable affine parameters. Instead, it adaptively computes the affine parameters from the style input:

$$\text{AdaIN}(x,y)=\sigma(y) \left(\frac{x-\mu(x)}{\sigma(x)}\right)+\mu(y)$$

in which we simply scale the normalized content input with $\sigma(y)$, and shift it with $\mu(y)$.
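A NumPy sketch of that formula (content features `x`, style features `y`, both `(N, C, H, W)`; instance statistics as in `IN` above):

```python
import numpy as np

def adain(x, y, eps=1e-5):
    """AdaIN: normalize content x with its own instance statistics,
    then rescale with the style y's instance std and shift with its mean."""
    mu_x = x.mean(axis=(2, 3), keepdims=True)
    sd_x = np.sqrt(x.var(axis=(2, 3), keepdims=True) + eps)
    mu_y = y.mean(axis=(2, 3), keepdims=True)
    sd_y = np.sqrt(y.var(axis=(2, 3), keepdims=True) + eps)
    return sd_y * (x - mu_x) / sd_x + mu_y

rng = np.random.default_rng(0)
content = rng.standard_normal((1, 3, 8, 8))
style = 2.0 * rng.standard_normal((1, 3, 8, 8)) + 5.0
out = adain(content, style)
```

After the transform, each channel of the output carries the style input's mean and standard deviation, no learned parameters involved.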

In my opinion, we already use the term `Conditional Normalization` instead of `Adaptive Normalization`.

# Applications

## FiLM: Visual Reasoning with a General Conditioning Layer

by Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville

AAAI 2018. Code available. Extends arXiv:1707.03017.

This work outperforms DeepMind's "Relation Network" on the visual reasoning (VQA-style) task.
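The FiLM layer itself is just a feature-wise affine transform, $\text{FiLM}(x)=\gamma(z)\odot x+\beta(z)$, with $\gamma$ and $\beta$ predicted per channel by a conditioning network. A minimal sketch (random stand-ins replace the real conditioning network, which the paper implements as a GRU plus linear layers):

```python
import numpy as np

def film(x, gamma, beta):
    """FiLM layer: feature-wise affine modulation of (N, C, H, W) features
    by per-sample, per-channel gamma/beta of shape (N, C). Unlike CBN,
    no normalization step is required."""
    return gamma[:, :, None, None] * x + beta[:, :, None, None]

rng = np.random.default_rng(0)
N, C = 2, 3
x = rng.standard_normal((N, C, 4, 4))
# In the real model, gamma and beta come from a conditioning network
# applied to e.g. a question embedding; here they are random stand-ins.
gamma = rng.standard_normal((N, C))
beta = rng.standard_normal((N, C))
y = film(x, gamma, beta)
```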

## Conditional Instance Normalization used in "A LEARNED REPRESENTATION FOR ARTISTIC STYLE"

Vincent Dumoulin & Jonathon Shlens & Manjunath Kudlur, **Google Brain**

Produces multiple style-transferred outputs with a single network.

## Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data

by **Amjad Almahairi**, Sai Rajeswar, Alessandro Sordoni, Philip Bachman, **Aaron Courville**

Submitted to ICML2018, arXiv:1802.10151v1

Uses `CIN` for many-to-many mappings.