Paper Reading Notes: A Latent Transformer for Disentangled Face Editing in Images and Videos

Paper title: A Latent Transformer for Disentangled Face Editing in Images and Videos

I. Introduction and related work (some key sentences noted)

(1) Studies have shown that moving a latent code along specific directions in the latent space of a generative model leads to corresponding variations of visual attributes in the generated image.

(2) Firstly, successful manipulations can only be achieved in well-disentangled and linearized latent spaces.

(3) Manipulating facial attributes with purely linear transformations is quite limited.

(4) StyleGAN is used as the state-of-the-art image generator; real images are projected into its latent space for editing.

(5) The transformation network generates disentangled, identity-preserving and controllable attribute editing results on real images.

(6) Work related to disentangled representations:

  • One optimization-based method, Image2StyleGAN++, carried out local editing along with global semantic edits on images by applying masked interpolation on the activation features of StyleGAN (what does this mean? — unclear to me).
  • Collins et al. performed k-means clustering on the activations of StyleGAN and detected a disentanglement of semantic objects, which enables further local semantic editing on the generated image.
  • For high-level semantic edits, GANalyze [13] learned a manifold in the latent space of BigGAN [5] to generate images of different memorability.
  • InterFaceGAN [35] proposed to learn a hyperplane for a binary classification in the latent space, which one can use to manipulate the target facial attribute by simple interpolation. Following their work, StyleSpace [42] carried out a quantitative study on the latent spaces of StyleGAN [21] and realized a highly localized and disentangled control of the visual attributes.
  • StyleFlow [3] achieved conditional exploration of the latent space by training conditional normalizing flows.
  • There are many more; see the related work section of the paper for details.

II. Contributions

We propose a latent transformation network for facial attribute editing, achieving disentangled and controllable manipulations on real images with good identity preservation. 

Our method can carry out efficient sequential attribute editing on real images. 

We introduce a pipeline to generalize the face editing to videos and generate realistic and stable manipulations on high resolution videos.

III. Method

1. We propose a framework to edit faces in real images and videos via the latent space of StyleGAN.

2. Suppose there are n attributes in total; a separate latent transformer is trained for each attribute (see the sketch after item 4 below).

3. To predict attributes from a latent code, a pre-trained latent classifier C is used.

Latent Classifier: To predict attributes on the manipulated latent codes, we train an attribute classifier C on the "latent code - label" pairs.

The classifier consists of three fully connected layers with ReLU activations in between. C is fixed during the training of the latent transformer.

The facial attribute classifier is cited from: Harnessing Synthesized Abstraction Images to Improve Facial Attribute Recognition.
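The notes only record the architecture of C. A minimal PyTorch sketch, assuming the W+ code of StyleGAN2 (18 × 512) is flattened as input; the hidden width and the number of attributes (40, as in CelebA) are placeholders, not values taken from the paper:

```python
import torch
import torch.nn as nn

class LatentClassifier(nn.Module):
    """Predict one logit per attribute from a flattened W+ latent code.

    Architecture per the notes: three fully connected layers with ReLU
    activations in between; the layer widths here are assumptions.
    """

    def __init__(self, latent_dim: int = 18 * 512, hidden_dim: int = 512, n_attrs: int = 40):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_attrs),  # one logit per attribute
        )

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        # w: (batch, 18, 512) W+ code; flatten before the MLP
        return self.net(w.flatten(start_dim=1))
```

It is trained once on "latent code - label" pairs and then kept frozen while the latent transformer is trained.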

4. Given a latent code w ∈ W+, the latent transformer T generates the direction for a single attribute modification, where the amount of change is controlled by a scaling factor α. The network is expressed as a single linear transformation layer.
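A minimal sketch of one such per-attribute transformer, assuming the edited code is obtained as w + α·T(w) (the predicted direction scaled by α); the dimensions are placeholders:

```python
import torch
import torch.nn as nn

class LatentTransformer(nn.Module):
    """Single linear layer predicting an editing direction for one attribute."""

    def __init__(self, latent_dim: int = 18 * 512):
        super().__init__()
        self.linear = nn.Linear(latent_dim, latent_dim)

    def forward(self, w: torch.Tensor, alpha: float) -> torch.Tensor:
        # w: (batch, 18, 512) W+ code; the predicted direction is scaled by alpha
        flat = w.flatten(start_dim=1)
        return (flat + alpha * self.linear(flat)).view_as(w)
```

One such transformer is trained per attribute (item 2 above), so sequential editing simply chains several of them.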

5. Loss function
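The notes leave this item empty. As a rough placeholder, here is a hedged sketch of a typical objective for this kind of setup — a target-attribute classification term (via the frozen classifier C), a term that keeps the other attribute predictions unchanged, and a latent regularization term; the exact terms and weights in the paper may differ:

```python
import torch
import torch.nn.functional as F

def transformer_loss(w, w_edit, target_idx, target_label, classifier,
                     lambda_cls=1.0, lambda_attr=1.0, lambda_reg=1.0):
    """Hypothetical objective for training one latent transformer
    (a reconstruction of a typical setup, not the paper's exact terms)."""
    logits_orig = classifier(w)        # frozen latent classifier C
    logits_edit = classifier(w_edit)

    # push the target attribute of the edited code toward the desired label
    loss_cls = F.binary_cross_entropy_with_logits(
        logits_edit[:, target_idx], target_label)

    # keep the predictions of all non-target attributes unchanged
    keep = torch.ones(logits_orig.shape[1], dtype=torch.bool)
    keep[target_idx] = False
    loss_attr = F.mse_loss(logits_edit[:, keep], logits_orig[:, keep].detach())

    # keep the edited latent code close to the original (helps identity preservation)
    loss_reg = F.mse_loss(w_edit.flatten(1), w.flatten(1))

    return lambda_cls * loss_cls + lambda_attr * loss_attr + lambda_reg * loss_reg
```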

IV. Evaluation metrics

1. Quantitative

We compare our method quantitatively with GANSpace and InterFaceGAN using three metrics (a computation sketch follows the list):

(1) target attribute change rate

(2) attribute preservation rate

(3) identity preservation score
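The notes only name the three metrics. A rough sketch of how they could be computed, assuming binary attribute predictions from an attribute classifier and identity embeddings from a face recognition network (e.g. ArcFace); these implementation details are assumptions, not taken from the paper:

```python
import numpy as np

def target_change_rate(orig_attrs, edit_attrs, target_idx):
    """Fraction of samples whose target attribute flipped after editing.
    orig_attrs / edit_attrs: (N, n_attrs) binary attribute predictions."""
    return float(np.mean(edit_attrs[:, target_idx] != orig_attrs[:, target_idx]))

def attribute_preservation_rate(orig_attrs, edit_attrs, target_idx):
    """Fraction of non-target attribute predictions left unchanged by the edit."""
    keep = np.ones(orig_attrs.shape[1], dtype=bool)
    keep[target_idx] = False
    return float(np.mean(edit_attrs[:, keep] == orig_attrs[:, keep]))

def identity_preservation_score(orig_emb, edit_emb):
    """Mean cosine similarity between face-recognition embeddings of the
    original and edited images. orig_emb / edit_emb: (N, D)."""
    orig = orig_emb / np.linalg.norm(orig_emb, axis=1, keepdims=True)
    edit = edit_emb / np.linalg.norm(edit_emb, axis=1, keepdims=True)
    return float(np.mean(np.sum(orig * edit, axis=1)))
```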

2. Qualitative

Original post: https://www.cnblogs.com/h694879357/p/15528988.html