BicycleGAN : Image Translation with GAN (5)

Limitations of pix2pix, DTN, DiscoGAN & CycleGAN?

  • They produce a single output for each input.
  • They are deterministic models.
    • Each input image is translated one-to-one.

  • Paired set, One-to-One : pix2pix (CVPR2017)
  • Unpaired set, One-to-One : DTN (ICLR2017), CycleGAN (ICCV2017)
  • Paired set, One-to-Many : ???

BicycleGAN:

Toward Multimodal Image-to-Image Translation (NIPS2017)
BicycleGAN github

Easy approach:

  • Feed stochastically sampled noise $z \sim N(0, I)$ into the deterministic generator
  • Hope that the noise acts as a latent code and produces diverse results
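
As a rough illustration of this easy approach, the sketch below tiles a sampled noise vector over the spatial grid and appends it to the input image as extra channels. This is only one possible way to inject noise; the function names, the toy image size, and the noise dimension of 8 are assumptions made for this example.

```python
import random

def sample_noise(dim, rng):
    """Draw a latent code z with independent standard normal entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def concat_noise(image, z):
    """Append one constant-valued channel per entry of z to a [C][H][W] image."""
    height = len(image[0])
    width = len(image[0][0])
    # Each entry of z becomes an H x W plane filled with that single value.
    noise_channels = [[[value] * width for _ in range(height)] for value in z]
    return image + noise_channels

rng = random.Random(0)
image_a = [[[0.5] * 4 for _ in range(4)] for _ in range(3)]  # toy 3-channel 4x4 input A
z = sample_noise(8, rng)                                     # assumed noise dimension
generator_input = concat_noise(image_a, z)                   # 3 + 8 = 11 channels
```

A generator would then consume `generator_input` in place of the plain image. Nothing in this setup forces the generator to actually use the extra channels.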

However, it causes mode collapse

  • Outputs are generated from multiple noise samples but are mapped to similar results
  • The generator does not care about the noise
  • The generator learns to ignore the random noise when conditioned on relevant context (the input image)

Encoder

  • Encourage a bijection between the output <-> latent space
  • Discourage two different latent codes from generating the same output.
    • This avoids mode collapse

Conditional Variational Autoencoder-GAN

  • The encoder predicts a Gaussian distribution over the latent code.
  • The encoder is trained on real data ($B$).
  • The generator takes a latent code carrying rich information about $B$, together with the input $A$.
  • At test time, an image generated from a random latent code may look unrealistic.
  • The generator never sees random noise during training.
  • The discriminator never sees samples generated from random noise.
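
To make the "encoder predicts Gaussian" step concrete, here is a minimal sketch of the usual VAE machinery: sampling a latent code with the reparameterization trick, and the KL term that pulls the predicted Gaussian toward $N(0, I)$. The function names and the `mu` / `logvar` parameterization are assumptions for this example, not taken from the slides.

```python
import math
import random

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), independently per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

rng = random.Random(0)
mu = [1.0, 0.0]       # hypothetical encoder output for a real image B
logvar = [0.0, 0.0]   # log-variance of 0 means unit variance
z_encoded = reparameterize(mu, logvar, rng)
kl_penalty = kl_to_standard_normal(mu, logvar)
```

The KL penalty is zero only when the predicted Gaussian is exactly $N(0, I)$. In practice it stays above zero, so codes sampled from the prior at test time can differ from the encoded codes the generator saw in training.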

Conditional Latent Regressor GAN

  • The encoder is a latent code regressor.
  • The generated sample is encoded and mapped back to the random noise.
  • The latent code is sampled randomly during training, just as at test time.
  • The generator never sees the ground truth $B$.
  • More vulnerable to mode collapse, probably because the small dimension of $z$ and the $L_1$ loss between $z$ and $\hat{z}$ are not enough to prevent the generator from easily fooling the discriminator?
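
The latent regression term can be sketched as a plain $L_1$ distance between the sampled code $z$ and the code $\hat{z}$ that the encoder recovers from the generated image. Averaging over dimensions, as done here, is an assumption of this example; a sum would serve equally well.

```python
def l1_latent_loss(z, z_hat):
    """Mean absolute difference between the sampled code z and the recovered code z_hat."""
    if len(z) != len(z_hat):
        raise ValueError("z and z_hat must have the same length")
    return sum(abs(a - b) for a, b in zip(z, z_hat)) / len(z)

z = [0.0, 1.0]        # code sampled from the prior
z_hat = [0.5, 0.0]    # hypothetical encoder output for the generated image
loss = l1_latent_loss(z, z_hat)  # (0.5 + 1.0) / 2 = 0.75
```

If the generator ignored $z$ entirely, the encoder would have nothing to recover it from, and this loss would stay high. That is the pressure toward using the latent code.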

BicycleGAN

Train both models together, gaining the benefit of the cycle losses in both $z$ and $B$.
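
Written out, the joint training can be summarized as one objective that adds the cVAE-GAN terms and the cLR-GAN terms. The notation below follows, as far as I recall, the form given in the paper; the weights $\lambda$, $\lambda_{latent}$ and $\lambda_{KL}$ are hyperparameters, and the exact symbols should be treated as an assumption of this note.

$$
G^*, E^* = \arg\min_{G,E}\max_{D} \;
\mathcal{L}_{GAN}^{VAE}(G, D, E)
+ \lambda \, \mathcal{L}_{1}^{VAE}(G, E)
+ \mathcal{L}_{GAN}(G, D)
+ \lambda_{latent} \, \mathcal{L}_{1}^{latent}(G, E)
+ \lambda_{KL} \, \mathcal{L}_{KL}(E)
$$

The first two terms and the KL term come from the cVAE-GAN path ($B \rightarrow z \rightarrow \hat{B}$). The remaining two come from the cLR-GAN path ($z \rightarrow \hat{B} \rightarrow \hat{z}$).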

Result

  • Pix2pix + noise : realistic outputs, but all similar to one another
  • cVAE-GAN : adds variation, but shows artifacts caused by random sampling at test time
  • cLR-GAN : less variation in the output, and sometimes mode collapse
  • BicycleGAN : hybrid, with results that are both diverse and realistic

Quantitative experiment

Conclusion

  • Proposes a solution to mode collapse in the conditional generative setting
  • Combines multiple objectives to encourage a bijective mapping between the latent and output spaces
  • Produces results that are both realistic and diverse
  • The latent code could be replaced with a user-controllable parameter in the future
  • Paired set, One-to-One : pix2pix (CVPR2017)
  • Unpaired set, One-to-One : DTN (ICLR2017), DiscoGAN (ICML2017), CycleGAN (ICCV2017)
  • Paired set, One-to-Many : BicycleGAN (NIPS2017)
  • In the future:
    • Unpaired set, One-to-Many : Augmented CycleGAN (probably submitted to ICML2018), XGAN (ICLR2018 rejected)
    • Multiple domains, not only one source domain to one target domain : StarGAN (CVPR2018 accepted)
    • User controllable noise vector in BicycleGAN
Junho Cho


Integrated Ph.D. course student interested in Computer Vision and Deep Learning. For more information, tmmse.xyz/junhocho/
