GAN Compression: Efficient Architectures for Interactive Conditional GANs

Background

Conditional Generative Adversarial Networks (cGANs) enable controllable image synthesis for many computer vision and graphics applications. However, the computational cost of cGANs is one to two orders of magnitude higher than that of CNNs used for recognition.

Contributions

  • The authors transfer knowledge from the teacher generator to the student generator, and find it beneficial to create pseudo pairs from the teacher model’s output for unpaired training. This transforms unpaired learning into paired learning.
  • The authors use neural architecture search (NAS) to automatically find an efficient network with significantly lower computation cost and fewer parameters.
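The pseudo-pair idea above can be sketched roughly as follows. This is a toy numpy sketch, not the authors' code: the "teacher" is a stand-in fixed linear map rather than a real generator, and the training loop is plain least-squares regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pre-trained teacher generator (assumption:
# a fixed linear map instead of a real network).
W_teacher = rng.standard_normal((4, 4))
teacher = lambda x: x @ W_teacher

def make_pseudo_pairs(unpaired_inputs):
    """Pair each unpaired source input with the teacher's output for it."""
    return [(x, teacher(x)) for x in unpaired_inputs]

# The student is then trained with an ordinary paired regression
# loss against the pseudo targets.
W_student = np.zeros((4, 4))
lr = 0.01
inputs = [rng.standard_normal(4) for _ in range(200)]
for _ in range(10):  # a few epochs suffice for this toy problem
    for x, y in make_pseudo_pairs(inputs):
        pred = x @ W_student
        W_student -= lr * np.outer(x, pred - y)  # grad of 0.5*||pred - y||^2
```

Because every source input now comes with a target, the student can be supervised exactly as in the paired setting.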

Method


As shown in Equation 1, the authors unify unpaired and paired learning in the model compression setting. Since cGANs usually output a deterministic image rather than a probabilistic distribution, it is difficult to distill the “dark knowledge” from the teacher’s output pixels. The authors instead use the L2 loss in Equation 3 as the distillation loss, treating it as a regression between teacher and student features at different stages.

To reduce the number of channels, the authors first select the channel widths in the generators automatically, using channel pruning [23, 22, 44, 81, 47] to remove redundancy. For each convolution layer, the number of channels is a multiple of 8. A straightforward approach is to traverse every possible channel configuration, train each one, and pick the best. However, if there are 8 layers and each layer has 4 channel candidates {8, 16, 32, 64}, this requires training 4^8 = 65,536 models, which is prohibitively time-consuming. Hence, the authors train a “once-for-all” network [7] that supports different channel numbers. Each sub-network with a different channel configuration is trained equally and can operate independently; the sub-networks share weights with the “once-for-all” network. In this way, the network only needs to be trained once, yet all possible channel configurations can be evaluated without further training.
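The weight-sharing trick can be sketched with a toy layer: a sub-network with fewer output channels simply uses the leading slice of the full weight tensor, so any channel configuration can be evaluated without retraining. `SharedLinear` is an illustrative name, not the authors' implementation, and a linear map stands in for a convolution.

```python
import numpy as np

class SharedLinear:
    """Toy 'once-for-all' layer: all sub-networks share one weight tensor."""

    def __init__(self, max_out, in_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Weights sized for the largest sub-network.
        self.weight = rng.standard_normal((max_out, in_dim))

    def forward(self, x, out_channels):
        # A sub-network picks the first `out_channels` output rows.
        assert out_channels <= self.weight.shape[0]
        return self.weight[:out_channels] @ x

layer = SharedLinear(max_out=64, in_dim=8)
x = np.ones(8)
full = layer.forward(x, 64)    # largest sub-network
small = layer.forward(x, 16)   # a pruned sub-network, same shared weights
```

Because the small sub-network's output is exactly a slice of the full one, evaluating any candidate width costs only a forward pass, not a new training run.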

Experiments

With the fast GAN compression proposed by the authors, the training time can be largely reduced.

As shown in Table 1, by using the best-performing sub-network from the “once-for-all” network, GAN Compression not only reduces the model size but also preserves the performance.

To test how GAN Compression performs on practical devices, the authors measure the inference speed of the compressed models on several devices with different computational power. As shown in Table 2, the proposed method performs well on CPU devices.

Strength

Overall, this paper proposes a compression framework that reduces the computational cost and model size of generators in cGANs, using knowledge distillation and neural architecture search to alleviate training instability and increase model efficiency. Extensive experiments show that the method can compress several cGAN models while preserving visual quality.

Weakness

To be honest, the proposed method is interesting and applicable to many image-to-image tasks, such as image super-resolution, and the extensive experiments demonstrate its effectiveness. I think the only weakness is that the description of the NAS and the “Decouple Training and Search” parts is so vague that it took me a long time to grasp the authors’ ideas.

Original post: https://www.cnblogs.com/echoboy/p/14918705.html