Computer Vision

BicycleGAN : Image Translation with GAN (5)

Junho Cho

Apr 3, 2018 • 3 min read

Limitations of pix2pix, DTN, DiscoGAN & CycleGAN?

They produce a single answer.
They are deterministic models.
- Each input image is translated one-to-one


Paired set, One-to-One : pix2pix (CVPR2017)
Unpaired set, One-to-One : DTN (ICLR2017), CycleGAN (ICCV2017)
Paired set, One-to-Many : ???

BicycleGAN:

Toward Multimodal Image-to-Image Translation (NIPS2017)
BicycleGAN github


Easy approach:

Feed stochastically sampled noise $z \sim N(0, I)$ into the deterministic generator as an extra input
Hope that the noise acts as a latent code and produces diverse results
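
One simple way to inject the noise is to tile $z$ over the spatial grid and concatenate it to the input image as extra channels. A minimal sketch in plain Python, assuming a toy channels-first image stored as nested lists; `sample_noise` and `concat_noise` are hypothetical helper names for illustration, not functions from the BicycleGAN code:

```python
import random


def sample_noise(dim, rng):
    """Draw a latent code z with independent standard-normal entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def concat_noise(image, z):
    """Tile each entry of z over the spatial grid and append it as a channel.

    image: list of C channels, each an H x W list of lists.
    Returns a list of C + len(z) channels with the same H x W size.
    """
    height, width = len(image[0]), len(image[0][0])
    noise_channels = [[[value] * width for _ in range(height)] for value in z]
    return image + noise_channels


rng = random.Random(0)
image_a = [[[0.5] * 4 for _ in range(4)] for _ in range(3)]  # toy 3 x 4 x 4 input A
z = sample_noise(8, rng)
generator_input = concat_noise(image_a, z)

# Channels, height, width of what the generator would receive.
print(len(generator_input), len(generator_input[0]), len(generator_input[0][0]))  # 11 4 4
```

The generator then sees the same image with a different constant "noise image" stacked on top for every sampled $z$.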


However, this causes mode collapse


Different noise samples are mapped to nearly identical outputs
The generator does not care about the noise
The generator learns to ignore random noise when conditioned on relevant context (the input image)
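
A quick way to check whether a generator is ignoring its noise is to fix the input, sample several codes, and measure how far apart the outputs are. The sketch below is only illustrative: both generators are toy stand-ins written for this example, and pixel-space L1 is an assumed stand-in for whatever perceptual distance one would use in practice.

```python
import itertools
import random


def mean_pairwise_l1(outputs):
    """Average per-pixel L1 distance over all pairs of outputs."""
    pairs = list(itertools.combinations(outputs, 2))
    total = 0.0
    for x, y in pairs:
        total += sum(abs(p - q) for p, q in zip(x, y)) / len(x)
    return total / len(pairs)


def collapsed_generator(a, z):
    """Toy generator that ignores z entirely (the failure mode above)."""
    return [2.0 * v for v in a]


def diverse_generator(a, z):
    """Toy generator whose output shifts with the first entry of z."""
    return [2.0 * v + z[0] for v in a]


rng = random.Random(0)
a = [0.1, 0.4, 0.7]                                 # one fixed toy input A
codes = [[rng.gauss(0.0, 1.0)] for _ in range(5)]   # five sampled latent codes

collapsed_score = mean_pairwise_l1([collapsed_generator(a, z) for z in codes])
diverse_score = mean_pairwise_l1([diverse_generator(a, z) for z in codes])
print(collapsed_score, diverse_score)
```

A score of zero for the first generator means every code produced the same image, which is exactly the collapse described above.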

Encoder

Encourage a bijection between the output and the latent space
Discourage two different latent codes from generating the same output
- Avoids mode collapse


Conditional Variational Autoencoder-GAN


The encoder predicts a Gaussian distribution over the latent code
The encoder is trained on real data ($B$)
The generator takes a latent code carrying rich information about $B$, together with the input $A$
At test time, images generated from a random latent code may be unrealistic
- The generator never sees random noise during training
- The discriminator never sees samples generated from random noise
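
The two VAE ingredients here are sampling $z$ from the predicted Gaussian and penalizing that Gaussian for drifting away from the prior. A minimal sketch, assuming the encoder outputs a mean and a log-variance per latent dimension (a common parameterization, not confirmed by this post); the numbers stand in for real encoder outputs:

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(logvar / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


rng = random.Random(0)
mu, logvar = [0.3, -1.2], [0.0, -0.5]   # hypothetical encoder outputs for one image B
z = reparameterize(mu, logvar, rng)     # this z, together with A, goes to the generator
kl = kl_to_standard_normal(mu, logvar)  # pulls E(B) toward the N(0, I) prior
print(len(z), kl)
```

The KL term is what makes sampling from $N(0, I)$ at test time plausible at all, even though the generator only ever saw encoded codes during training.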

Conditional Latent Regressor GAN


The encoder acts as a latent code regressor
The generated sample is encoded and mapped back to the random noise it was generated from
The latent code is simply sampled at random, just as at test time
The generator never sees the ground truth $B$
More vulnerable to mode collapse: probably the small dimension of $z$ and the $L_1$ loss between $z$ and $\hat{z}$ are not enough to keep the generator from easily fooling the discriminator?
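
The latent regression loop can be sketched as: sample $z$, generate $\hat{B}$, encode it back to $\hat{z}$, and take the $L_1$ distance between $z$ and $\hat{z}$. The generators and encoder below are deliberately trivial toys made up for this example (the "image" is a flat list, and one generator simply writes $z$ into its first entries); they only show how the loss reacts when $z$ is used versus ignored.

```python
import random

Z_DIM = 2


def l1_distance(u, v):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(u, v)) / len(u)


def latent_regression_loss(generator, encoder, a, z):
    """z -> B_hat = G(A, z) -> z_hat = E(B_hat), then L1 between z and z_hat."""
    b_hat = generator(a, z)
    z_hat = encoder(b_hat)
    return l1_distance(z, z_hat)


def embedding_generator(a, z):
    """Toy generator that keeps z recoverable: z is written before a copy of A."""
    return list(z) + list(a)


def ignoring_generator(a, z):
    """Toy generator that discards z, as in mode collapse."""
    return [0.0] * Z_DIM + list(a)


def toy_encoder(b_hat):
    """Toy encoder that reads the first Z_DIM entries of the output."""
    return b_hat[:Z_DIM]


rng = random.Random(0)
a = [0.2, 0.5, 0.9]
z = [rng.gauss(0.0, 1.0) for _ in range(Z_DIM)]

loss_embed = latent_regression_loss(embedding_generator, toy_encoder, a, z)
loss_ignore = latent_regression_loss(ignoring_generator, toy_encoder, a, z)
print(loss_embed, loss_ignore)
```

The loss is zero only when the output still carries $z$, so minimizing it pushes the generator to use the code.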

BicycleGAN


Train both models together, with the benefit of a cycle loss in both $z$ and $B$.
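
Written out, the combined objective is the sum of the two branches. The form below follows the paper's notation as best I can reproduce it; $\lambda$, $\lambda_{latent}$ and $\lambda_{KL}$ are weighting hyperparameters, and the exact term definitions should be checked against the paper:

$$
G^*, E^* = \arg\min_{G,E} \max_{D} \; \mathcal{L}_{GAN}^{VAE}(G, D, E) + \lambda \mathcal{L}_{1}^{VAE}(G, E) + \mathcal{L}_{GAN}(G, D) + \lambda_{latent} \mathcal{L}_{1}^{latent}(G, E) + \lambda_{KL} \mathcal{L}_{KL}(E)
$$

The $\mathcal{L}_{GAN}^{VAE}$, $\mathcal{L}_{1}^{VAE}$ and $\mathcal{L}_{KL}$ terms come from the cVAE-GAN cycle ($B \to z \to \hat{B}$); the $\mathcal{L}_{GAN}$ and $\mathcal{L}_{1}^{latent}$ terms come from the cLR-GAN cycle ($z \to \hat{B} \to \hat{z}$).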

Result

Pix2pix + noise : realistic but nearly identical outputs
cVAE-GAN : adds variation, but with artifacts caused by random sampling at test time
cLR-GAN : less variation in the output, and sometimes mode collapse
BicycleGAN : hybrid; results are both diverse and realistic


Quantitative experiment


Conclusion

Proposes a solution to mode collapse in the conditional generative setting
Combines multiple objectives to encourage a bijective mapping between the latent and output spaces
Produces results that are both realistic and diverse
The latent code could be replaced with a user-controllable parameter in the future

Trends

Paired set, One-to-One : pix2pix (CVPR2017)
Unpaired set, One-to-One : DTN (ICLR2017), DiscoGAN (ICML2017), CycleGAN (ICCV2017)
Paired set, One-to-Many : BicycleGAN (NIPS2017)
In the future:
- Unpaired set, One-to-Many : Augmented CycleGAN (probably submitted to ICML2018), XGAN (rejected from ICLR2018)
- Multiple domains, not only one source domain to one target domain : StarGAN (accepted to CVPR2018)
- User-controllable noise vector in BicycleGAN