What makes fake images detectable? Understanding properties that generalize

Background

Generative models now produce fake images that are increasingly difficult for a human to distinguish from real ones. This paper seeks to understand which properties of fake images make them detectable, and which of those properties generalize across different model architectures, datasets, and variations in training.

Assumption

  1. Across different facial image generators, the authors hypothesize that global errors can differ but local errors may transfer: the overall facial structure varies across generators and datasets, whereas local patches of a generated face are more stereotyped and may share redundant artifacts.
  2. When a classifier is trained on real and fake images that went through subtly different preprocessing pipelines, it can simply learn to detect the preprocessing differences rather than generation artifacts.

Method

As shown in Fig. 1, the authors use truncated models to obtain predictions based on local regions of the image: truncating earlier in the layer sequence yields a smaller receptive field, while truncating after more layers yields a larger one. They experiment with ResNet and Xception as model backbones and apply a cross-entropy loss to each patch.
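As a concrete illustration, here is a minimal PyTorch sketch of such a patch-wise classifier: a ResNet truncated after its early layers so each output location has a small receptive field, a 1×1 convolutional head producing real/fake logits per patch, and the image-level label broadcast to every patch for the cross-entropy loss. The backbone choice, truncation depth, and head are assumptions consistent with the description above, not the authors' exact code.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class PatchResNet(nn.Module):
    """Truncated ResNet: keep only the early layers so each output
    location sees a small receptive field, then classify every patch."""
    def __init__(self, num_blocks=1):
        super().__init__()
        resnet = models.resnet18(weights=None)
        stem = [resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool]
        blocks = [resnet.layer1, resnet.layer2, resnet.layer3, resnet.layer4]
        self.features = nn.Sequential(*stem, *blocks[:num_blocks])   # truncate early
        out_channels = [64, 64, 128, 256, 512][num_blocks]
        self.head = nn.Conv2d(out_channels, 2, kernel_size=1)        # real/fake logits per patch

    def forward(self, x):
        return self.head(self.features(x))                           # (B, 2, H', W')


def patch_loss(patch_logits, labels):
    """Broadcast the image-level label (0 = real, 1 = fake, assumed
    convention) to every spatial location and average the cross entropy."""
    b, _, h, w = patch_logits.shape
    targets = labels.view(b, 1, 1).expand(b, h, w)
    return nn.functional.cross_entropy(patch_logits, targets)


# Example: patch predictions for a batch of 128x128 crops
# model = PatchResNet(num_blocks=1)
# logits = model(torch.randn(4, 3, 128, 128))   # -> (4, 2, 32, 32)
```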

 

As shown in Fig. 2, the authors preprocess the images to make real and fake images as similar as possible, in order to isolate fake-image artifacts and minimize the possibility of learning preprocessing differences.
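A minimal sketch of such a shared pipeline is shown below, assuming both real and fake images are resized with the same resampling filter and saved in the same lossless format; the specific resolution and format here are assumptions, not the paper's recorded settings.

```python
from PIL import Image

def standardize(in_path, out_path, size=128):
    """Apply one shared pipeline to both real and fake images so the
    classifier cannot key on resizing or compression differences."""
    img = Image.open(in_path).convert("RGB")
    img = img.resize((size, size), Image.LANCZOS)   # same resampling for every image
    img.save(out_path, format="PNG")                # same lossless format for every image
```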

 

Experiments

Classification via patches.

Table 1 (left) demonstrates the effectiveness of the truncated models proposed by the authors.

Table 1 (right) demonstrates that the truncated models are more robust than the full models when the random seed changes.

 

 

Table 2 (left) demonstrates that the truncated models are more robust than the full models across different generators.

Table 2 (right) demonstrates that the truncated models are more robust than the full models across different datasets.

The above results show that small receptive fields allow the models to ignore global differences between images from different generators and datasets and to focus on shared generator artifacts.
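For image-level accuracy, the per-patch predictions have to be aggregated somehow; the sketch below assumes a simple average of the patch-wise softmax scores, which is one straightforward rule rather than the paper's exact aggregation.

```python
import torch

def image_fake_probability(patch_logits):
    """Aggregate per-patch logits (B, 2, H', W') into one image-level
    fake probability by averaging the patch-wise softmax."""
    probs = torch.softmax(patch_logits, dim=1)   # per-patch real/fake probabilities
    return probs[:, 1].mean(dim=(1, 2))          # mean fake probability per image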

 

What properties of fake images generalize?

As shown in Fig. 3, the authors plot fake values in blue and real values in red. The average heatmaps predominantly highlight hair and background regions, indicating that these are the areas the patch-wise models rely on when classifying images from unseen test sources.
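A heatmap of this kind can be produced by upsampling the per-patch fake probabilities back to image resolution; the sketch below assumes bilinear upsampling purely for visualization, which is a display-time choice rather than part of the method.

```python
import torch
import torch.nn.functional as F

def fake_heatmap(patch_logits, image_size=128):
    """Upsample per-patch fake probabilities to image resolution so the
    regions driving the decision (e.g. hair, background) can be overlaid
    on the input image."""
    fake_prob = torch.softmax(patch_logits, dim=1)[:, 1:2]      # (B, 1, H', W')
    return F.interpolate(fake_prob, size=(image_size, image_size),
                         mode="bilinear", align_corners=False)  # (B, 1, H, W)
```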

As shown in Fig. 4, the authors use a pretrained facial segmentation network to partition each image into semantic classes. We can see that the fake-image classifier relies on patches such as hair, background, clothing, and mouths to make its decisions.
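One way to quantify this, sketched below, is to average the fake-probability heatmap inside each segmentation region; the class names and label indexing are placeholders for whatever segmentation network is used, not the paper's exact analysis code.

```python
import numpy as np

def score_by_region(heatmap, seg_mask, class_names):
    """Average the fake-probability heatmap inside each semantic region
    (hair, background, mouth, ...) given a segmentation mask of the same
    spatial size as the heatmap."""
    scores = {}
    for idx, name in enumerate(class_names):
        region = seg_mask == idx
        if region.any():                       # skip classes absent from this image
            scores[name] = float(heatmap[region].mean())
    return scores
```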

Finetuning the generator.

As shown in Fig. 4 and Equation 4, the authors use Equation 4 to finetune the generator and successfully decrease the classifiers' accuracy from 100% to about 60%. They then train a second classifier on images from the finetuned generator, which recovers the lost accuracy. The earlier experiments are repeated with this second classifier, and the results are the same.
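The finetuning objective can be pictured as a classifier-fooling term plus a term keeping the generator close to its original outputs; the sketch below is an assumption standing in for Equation 4, not the paper's exact loss, and the weight `lam` and the "label 0 = real" convention are placeholders.

```python
import torch
import torch.nn.functional as F

def finetune_step(generator, frozen_generator, classifier, z, optimizer, lam=10.0):
    """One generator-finetuning step: push the patch classifier toward
    predicting 'real' on generated images while an L1 term keeps outputs
    close to the original (frozen) generator."""
    with torch.no_grad():
        reference = frozen_generator(z)          # outputs of the unmodified generator
    fake = generator(z)
    logits = classifier(fake)                    # (B, 2, H', W') patch logits
    real_label = torch.zeros(logits.shape[0], logits.shape[2], logits.shape[3],
                             dtype=torch.long, device=logits.device)  # 0 = "real" (assumed)
    loss = F.cross_entropy(logits, real_label) + lam * F.l1_loss(fake, reference)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```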

Facial manipulation.

 

Following the experiments above, the authors train classifiers on the FaceForensics++ dataset to show how the classifiers perform across its different facial manipulation methods.
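A cross-manipulation evaluation of this kind can be organized as in the sketch below, where `train_fn` and `eval_fn` are hypothetical stand-ins for the training and accuracy routines; the four listed methods are the FaceForensics++ manipulation types.

```python
def cross_manipulation_table(loaders, train_fn, eval_fn):
    """Train a classifier on each FaceForensics++ manipulation method and
    evaluate it on all of them, returning a source -> target accuracy grid."""
    methods = ["Deepfakes", "Face2Face", "FaceSwap", "NeuralTextures"]
    return {src: {tgt: eval_fn(train_fn(loaders[src]["train"]), loaders[tgt]["test"])
                  for tgt in methods}
            for src in methods}
```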

 

 

Strengths

This paper proposes truncated models for classifying fake images. Extensive experiments demonstrate that the truncated models outperform full models. In addition, the paper uses a pretrained segmentation network to segment the images and visualize which attributes of fake images are easiest to classify. To verify whether a finetuned generator can defeat the classifier, the authors also conduct experiments and show that a retrained classifier remains effective.

 

Weakness

Overall, this is solid work, but since the classifiers may simply learn artifacts that differ between fake and real images, they could be disabled by adding noise to the input. Regarding contribution 1, the authors claim, "To avoid learning image formatting artifacts, we preprocess our images to reduce formatting differences between real and fake images," but they do not conduct any ablation study demonstrating that formatting differences are actually reduced. Moreover, the finetuned-generator experiment shows that the classifier is not robust to another generator; making classifiers robust to multiple generators may be a crucial problem.

Original post: https://www.cnblogs.com/echoboy/p/14918718.html