pix2pix: Image Translation with GAN (2)

Junho Cho

Apr 2, 2018 • 2 min read

Image-to-Image Translation with Conditional Adversarial Networks (pix2pix)

Published at CVPR 2017 by Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros.

pix2pix learns a mapping between paired images from a source domain $S$ and a target domain $T$, for example:

BW & Color image
Street Scene & Label
Facade & Label
Aerial & Map
Day & Night
Edges & Photo

Each source image $x \in S$ is paired with a target image (label) $y \in T$.

Because every input comes with its ground-truth target, this is supervised learning.
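
To make the idea of a paired dataset concrete, here is a minimal sketch for the "BW & Color" case, where the source $x$ can be derived from the target $y$ by grayscale conversion. The helper name `make_bw_color_pair`, the random stand-in images, and the BT.601 luminance weights are assumptions of this example, not something the pix2pix authors prescribe.

```python
import numpy as np

def make_bw_color_pair(color):
    """Return (x, y): a grayscale source and its color target.

    `color` is an H x W x 3 float array in [0, 1]. The luminance weights
    are the common ITU-R BT.601 coefficients, an assumption of this sketch.
    """
    weights = np.array([0.299, 0.587, 0.114])
    gray = color @ weights            # H x W
    x = gray[..., np.newaxis]         # source image, H x W x 1
    y = color                         # target image, H x W x 3
    return x, y

rng = np.random.default_rng(0)
color_images = rng.random((4, 8, 8, 3))  # random stand-ins for real photos
pairs = [make_bw_color_pair(img) for img in color_images]
x0, y0 = pairs[0]
print(x0.shape, y0.shape)
```

Every element of `pairs` is one supervised training example: the network sees `x` and is trained to produce `y`.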

Generator of pix2pix

$G(x,z)$, where $x$ is the source image and $z$ is noise

Uses a U-Net-shaped network:

known to be powerful at segmentation tasks
skip connections pass spatial information from the early encoder layers directly to the decoder
noise $z$ is supplied through dropout in the decoder
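
The generator points above can be illustrated with a toy, shape-level sketch. It has no learned convolutions: average pooling and nearest-neighbour upsampling stand in for the real encoder and decoder layers, and all function names are hypothetical. It only shows how encoder features are concatenated into the decoder and how dropout makes the output stochastic.

```python
import numpy as np

def down(h):
    """Halve spatial size with 2x2 average pooling (stand-in for a strided conv)."""
    c, height, width = h.shape
    return h.reshape(c, height // 2, 2, width // 2, 2).mean(axis=(2, 4))

def up(h):
    """Double spatial size by nearest-neighbour upsampling (stand-in for a transposed conv)."""
    return h.repeat(2, axis=1).repeat(2, axis=2)

def dropout(h, rng, p=0.5):
    """Inverted dropout; the random mask plays the role of the noise z."""
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

def toy_unet(x, rng):
    # Encoder: keep each feature map so the decoder can reuse it.
    e1 = down(x)      # C x H/2 x W/2
    e2 = down(e1)     # C x H/4 x W/4 (bottleneck)
    # Decoder: upsample, apply dropout, then concatenate the matching
    # encoder features along the channel axis (the skip connection).
    d1 = dropout(up(e2), rng)               # C x H/2 x W/2
    d1 = np.concatenate([d1, e1], axis=0)   # 2C x H/2 x W/2
    d0 = up(d1)                             # 2C x H x W
    return np.concatenate([d0, x], axis=0)  # 3C x H x W

rng = np.random.default_rng(0)
x = rng.random((3, 8, 8))
out_a = toy_unet(x, rng)
out_b = toy_unet(x, rng)  # same x, different dropout mask
print(out_a.shape)
```

Calling `toy_unet` twice on the same `x` gives different results in the channels that passed through dropout, while the channels copied over by the skip connections stay identical.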

Discriminator of pix2pix

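The loss function writes the discriminator as $D(x, y)$, so it judges a candidate target together with the source image it is supposed to match. Here is a minimal sketch of that conditioning, assuming the two images are stacked along the channel axis. The scoring function and the weight vector `w` are hypothetical placeholders, not the architecture used in the paper.

```python
import numpy as np

def toy_discriminator(x, y, w):
    """Score the pair (x, y) with a probability in (0, 1).

    x: source image, Cx x H x W; y: candidate target, Cy x H x W.
    w: hypothetical weight vector of length Cx + Cy (not a trained model).
    """
    pair = np.concatenate([x, y], axis=0)   # condition on x by stacking channels
    features = pair.mean(axis=(1, 2))       # one number per channel
    logit = features @ w
    return 1.0 / (1.0 + np.exp(-logit))     # sigmoid

rng = np.random.default_rng(0)
x = rng.random((1, 8, 8))   # e.g. a grayscale source
y = rng.random((3, 8, 8))   # e.g. a color target
w = rng.normal(size=4)
score = toy_discriminator(x, y, w)
print(score)
```

Because `x` is part of the input, the same `y` can receive a different score when it is paired with a different source image.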

Loss function

$x$: source image, $y$: target image, $z$: noise

The objective combines an adversarial (conditional GAN) loss with an L1 reconstruction loss:

\begin{equation}
\mathcal{L}_{cGAN}(G,D) = \mathbb{E}_{x,y \sim p_{data}(x,y)}[\log D(x,y)] + \mathbb{E}_{x \sim p_{data}(x), z \sim p_z(z)}[\log (1-D(x,G(x,z)))]
\end{equation}

\begin{equation}
\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y \sim p_{data}(x,y),z \sim p_z(z)}[||y-G(x,z)||_1]
\end{equation}
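
The two losses can be estimated on a mini-batch as in the sketch below. The discriminator outputs and images are made-up numbers. The text only says that both losses are used, so the weighted sum `adv + lam * rec` and the value of `lam` are assumptions of this example.

```python
import numpy as np

def cgan_loss(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of L_cGAN(G, D).

    d_real: D(x, y) on real pairs; d_fake: D(x, G(x, z)) on generated pairs.
    Both are arrays of probabilities in (0, 1).
    """
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

def l1_loss(y, g_out):
    """Estimate of L_L1(G): L1 norm of y - G(x, z), averaged over the batch."""
    diff = np.abs(y - g_out)
    return np.mean(diff.reshape(len(y), -1).sum(axis=1))

# Made-up discriminator outputs and images for a batch of two pairs.
d_real = np.array([0.9, 0.8])
d_fake = np.array([0.1, 0.2])
y = np.zeros((2, 3, 4, 4))
g_out = np.full((2, 3, 4, 4), 0.5)

lam = 100.0  # hypothetical weight on the L1 term
adv = cgan_loss(d_real, d_fake)
rec = l1_loss(y, g_out)
print(adv, rec, adv + lam * rec)
```

The discriminator tries to make `adv` large, while the generator tries to make it small and also to keep `rec` small. Summing the absolute differences per image follows the $||\cdot||_1$ in the formula; averaging over pixels instead would only rescale the term.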

Result

Try the interactive demo:
https://affinelayer.com/pixsrv/