Deepfake video from a single frame

Example output of the First Order Motion Model

Is it possible to make a whole film from a single photograph? And, having recorded one person's movements, to replace that person with someone else in the video? The answers to these questions matter a great deal for fields such as cinema, photography, and computer game development. One solution is digital processing of photos with specialized software. Specialists in the field call this problem automatic video synthesis, or image animation.

To obtain the expected result, existing approaches combine objects extracted from the source image with motion that can be supplied as a separate "donor" (driving) video.

Today, in most fields, image animation is done with computer graphics tools. This approach requires additional knowledge about the object to be animated: usually its 3D model is needed (how this currently works in the film industry is described here). Most recent solutions to this problem rely on deep learning models based on generative adversarial networks (GANs) and variational autoencoders (VAEs). These models usually use pre-trained modules to find object keypoints in the image. The main problem with this approach is that these modules can only recognize the objects they were trained on.

, ? «First Order Motion Model for Image Animation». — First Order Motion Model, . , (, , ), , .

…

, .

, , (occlusion map). . , , .

: .
$D \in \mathbb{R}^{3 \times H \times W}$ $S \in \mathbb{R}^{3 \times H \times W}$ . $S$ $D$ .

$S$ $D$ . , ( ) $R$ . $\hat{\mathcal{T}}_{\mathrm{S \leftarrow D}}$ $D$ $S$ $\hat{\mathcal{O}}_{\mathrm{S \leftarrow D}}$ . .

$\mathcal{T}_{\mathrm{S \leftarrow D}}$ $D$ $S$ . $\mathcal{T}_{\mathrm{S \leftarrow D}}$ . , $R$ ( ), $\mathcal{T}_{\mathrm{S \leftarrow D}}$ $\mathcal{T}_{\mathrm{S \leftarrow R}}$ $\mathcal{T}_{\mathrm{R \leftarrow D}}$ . , $X$ , $\mathcal{T}_{\mathrm{X \leftarrow R}}$ . $K$ $p_1,..., p_K$ , $p_1,..., p_K$ $R$ .

$\mathcal{T}_{\mathrm{R \leftarrow X}} = \mathcal{T}_{\mathrm{X \leftarrow R}}^{-1}$ , , $\mathcal{T}_{\mathrm{X \leftarrow R}}$ .

$\mathcal{T}_{\mathrm{S \leftarrow D}} = \mathcal{T}_{\mathrm{S \leftarrow R}} \circ \mathcal{T}_{\mathrm{R \leftarrow D}} = \mathcal{T}_{\mathrm{S \leftarrow R}} \circ \mathcal{T}_{\mathrm{D \leftarrow R}}^{-1}$
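The composition above can be checked numerically. The following is a minimal sketch that assumes both transformations are plain 2D affine maps; the helper names and the toy matrices are hypothetical and chosen only for illustration, not taken from the model itself.

```python
import numpy as np

def apply_affine(A, b, p):
    """Apply the affine map p -> A @ p + b to a 2D point."""
    return A @ p + b

def invert_affine(A, b):
    """Invert p -> A @ p + b, giving q -> A^-1 @ q - A^-1 @ b."""
    A_inv = np.linalg.inv(A)
    return A_inv, -A_inv @ b

def compose_affine(A2, b2, A1, b1):
    """Return the single affine map 'apply (A1, b1) first, then (A2, b2)'."""
    return A2 @ A1, A2 @ b1 + b2

# Hypothetical toy transforms from the reference frame R to S and to D.
A_sr, b_sr = np.array([[1.0, 0.2], [0.0, 1.0]]), np.array([0.5, -1.0])
A_dr, b_dr = np.array([[2.0, 0.0], [0.0, 0.5]]), np.array([1.0, 3.0])

# T_{S<-D} = T_{S<-R} o T_{D<-R}^{-1}
A_rd, b_rd = invert_affine(A_dr, b_dr)
A_sd, b_sd = compose_affine(A_sr, b_sr, A_rd, b_rd)

p = np.array([0.3, 0.7])                  # a point in the reference frame R
in_d = apply_affine(A_dr, b_dr, p)        # the same point expressed in D
via_d = apply_affine(A_sd, b_sd, in_d)    # mapped from D to S through R
direct = apply_affine(A_sr, b_sr, p)      # mapped from R to S directly
```

Both routes land on the same location in $S$, which is exactly what the identity states: going from $D$ back to $R$ and then on to $S$ equals the direct map from $D$ to $S$.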

$\mathcal{T}_{\mathrm{S \leftarrow R}}(p_k)$ $\mathcal{T}_{\mathrm{D \leftarrow R}}(p_k)$ . U-Net, $K$ , .
softmax , .

$P$ $\hat{\mathcal{T}}_{\mathrm{S \leftarrow D}}$ $\mathcal{T}_{\mathrm{S \leftarrow D}}(z)$ ( $z$ ), $S$ . , $\hat{\mathcal{T}}_{\mathrm{S \leftarrow D}}$ , , $D$ , $S$ . $\hat{\mathcal{T}}_{\mathrm{S \leftarrow D}}$ , $K$ $S^0,...,S^k$ ( $S^0 = S$ ), $\hat{\mathcal{T}}_{\mathrm{S \leftarrow D}}$ . $S^1,...,S^k$ U-Net.
$\hat{\mathcal{T}}_{\mathrm{S \leftarrow D}}(z)$ :

$M_k$ — ( $M_0$ — ) $J_k$ :

, $S$ $\hat{D}$ . , . down-sampling $\xi \in \mathbb{R}^{H' \times W'}$ . $\xi$ c $\hat{\mathcal{T}}_{\mathrm{S \leftarrow D}}$ . $S$ , $\hat{D}$ . — $\hat{\mathcal{O}}_{\mathrm{S \leftarrow D}} \in [0, 1]^{H' \times W'}$ , , , $S$ . :

$\xi ' = \hat{\mathcal{O}}_{\mathrm{S \leftarrow D}} \odot f_w(\xi, \hat{\mathcal{T}}_{\mathrm{S \leftarrow D}})$

where $f_w(\cdot, \cdot)$ denotes the backward warping operation and $\odot$ is the Hadamard (element-wise) product.
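The formula for $\xi'$ can be illustrated with a small NumPy sketch. It assumes the motion field is stored as absolute (row, column) sampling positions and uses nearest-neighbour sampling for simplicity; a trainable model would need a differentiable sampler such as bilinear interpolation. The function names are hypothetical.

```python
import numpy as np

def backward_warp_nearest(xi, grid):
    """Sample xi (C, H, W) at the rounded (row, col) positions in grid (H, W, 2).

    Positions outside the map are clipped to the nearest border pixel.
    """
    _, h, w = xi.shape
    rows = np.clip(np.rint(grid[..., 0]).astype(int), 0, h - 1)
    cols = np.clip(np.rint(grid[..., 1]).astype(int), 0, w - 1)
    return xi[:, rows, cols]

def occlusion_aware_warp(xi, grid, occlusion):
    """Compute xi' = O * f_w(xi, T): warp, then mask element-wise."""
    return occlusion[None, :, :] * backward_warp_nearest(xi, grid)

# Toy feature map with 2 channels on a 3x3 grid.
xi = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)

# Assumed motion field: every output pixel reads from the column to its right.
rows, cols = np.meshgrid(np.arange(3), np.arange(3), indexing="ij")
grid = np.stack([rows, cols + 1], axis=-1).astype(float)

# Occlusion map in [0, 1]: the top-left position is marked as fully occluded.
occlusion = np.ones((3, 3))
occlusion[0, 0] = 0.0

xi_prime = occlusion_aware_warp(xi, grid, occlusion)
```

Positions where the occlusion map is 0 carry no warped features, so the generator has to fill them in from context rather than copy them from $S$.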

, . $\xi '$ , .

, . reconstruction loss, . - VGG-19. reconstruction loss :

$L_{rec} (\hat{D}, D)= \sum_{i = 1}^I |N_i(\hat{D}) - N_i(D)|$

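where $\hat{D}$ is the reconstructed frame, $D$ is the driving frame, $N_i(\cdot)$ is the $i$-th channel feature extracted from a specific VGG-19 layer, and $I$ is the number of feature channels in that layer.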
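As a rough illustration of $L_{rec}$, the sketch below computes the sum over channels on precomputed feature maps. Two assumptions are made here: $|\cdot|$ is read as the mean absolute difference over spatial positions of each channel, and random arrays stand in for the features a real VGG-19 layer would produce. The function name is hypothetical.

```python
import numpy as np

def reconstruction_loss(feat_hat, feat_d):
    """Sum over channels i of the mean absolute difference |N_i(D_hat) - N_i(D)|.

    Both inputs have shape (I, H, W): I feature channels from one layer.
    """
    per_channel = np.abs(feat_hat - feat_d).mean(axis=(1, 2))
    return float(per_channel.sum())

# Stand-in features (assumption: a real pipeline would take these from VGG-19).
rng = np.random.default_rng(0)
feat_d = rng.normal(size=(4, 5, 5))   # features of the driving frame D
feat_hat = feat_d + 0.5               # features of a reconstruction that is off by 0.5

loss_same = reconstruction_loss(feat_d, feat_d)
loss_shifted = reconstruction_loss(feat_hat, feat_d)
```

With four channels each off by 0.5 everywhere, the loss is 4 × 0.5 = 2.0, while a perfect reconstruction gives 0.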

- . . , . , . , , , .

, $X$ $\mathcal{T}_{\mathrm{X \leftarrow Y}}$ , , thin plate spline. $Y$ . , $\mathcal{T}_{\mathrm{X \leftarrow R}}$
$\mathcal{T}_{\mathrm{Y \leftarrow R}}$ . C :