openAI-GPT-2-3

In 2019 we saw a dazzling application of machine learning. The OpenAI GPT-2 model showed an impressive ability to write coherent and passionate text that exceeded what we expected modern language models to be able to generate. GPT-2 is not a particularly novel architecture: it closely resembles the decoder-only Transformer. What sets GPT-2 apart is that it is a very large Transformer-based language model trained on a massive dataset. In this article we will look at the architecture that lets the model achieve these results: we will examine the self-attention layer in detail, and then look at uses of the decoder-only Transformer for tasks beyond language modeling.

Contents

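Part 1: GPT-2 and Language Modeling
- One Difference From BERT
- Crash Course: Looking Inside GPT-2
- End of Part 1: The GPT-2, Ladies and Gentlemen
Part 2: The Illustrated Self-Attention
- Self-Attention (without masking)
- 1 – Create Query, Key, and Value Vectors
- 2 – Score
- 3 – Sum
- GPT-2 Masked Self-Attention
- You've Made It!
Part 3: Beyond Language Modeling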

Part 1: GPT-2 and Language Modeling

Word2vec , – , , . – , .

swiftkey-keyboard

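GPT-2 was trained on a 40 GB dataset called WebText, which OpenAI researchers collected from the internet. For a sense of scale in storage terms: the SwiftKey keyboard app takes up 78 MB, the smallest trained GPT-2 variant takes about 500 MB to store all of its parameters, and the largest GPT-2 variant is 13 times that size (more than 6.5 GB).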

gpt2-sizes

GPT-2 AllenAI GPT-2 Explorer. GPT-2 ( ), .

, – .. . – , - .

transformer-encoder-decoder

, , , , ( AlphaStar).

gpt-2-transformer-xl-bert-3

? , GPT-2 :

gpt2-sizes-hyperparameters-3

One Difference From BERT

:
, .

GPT-2 . BERT , , . . , GPT-2, , . , GPT-2 :

gpt-2-output

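The way these models actually work is that after each token is produced, that token is added to the sequence of inputs, and the new sequence becomes the input to the model at its next step. This idea is called "auto-regression", and it is one of the ideas that made RNNs so effective.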

gpt-2-autoregression-2
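The auto-regressive loop can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `generate`, `toy_next_token`, and the lookup table are hypothetical names standing in for a real model, which would score the whole vocabulary at every step.

```python
def generate(next_token, prompt, max_new_tokens, end_token="<|endoftext|>"):
    """Repeatedly feed the growing sequence back into `next_token`."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(tokens)  # the model sees everything produced so far
        tokens.append(token)        # its output becomes part of the next input
        if token == end_token:
            break
    return tokens


# Hypothetical stand-in for a trained model: it only looks at the last token.
NEXT = {
    "a": "robot",
    "robot": "must",
    "must": "obey",
    "obey": "orders",
    "orders": "<|endoftext|>",
}


def toy_next_token(tokens):
    return NEXT.get(tokens[-1], "<|endoftext|>")


out = generate(toy_next_token, ["a"], max_new_tokens=10)
```

The loop stops either when the end token appears or when the token budget runs out, which mirrors how generation is bounded by the context size.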

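GPT-2, and some later models such as TransformerXL and XLNet, are auto-regressive in nature. BERT is not. That is a trade-off. In giving up auto-regression, BERT gained the ability to use the context on both sides of a word to get better results. XLNet brings back auto-regression while finding an alternative way to incorporate the context on both sides.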

– :

transformer-encoder-block-2

(, 512 ). , .

– , . :

transformer-decoder-block-2

, [mask] , BERT', , , .

, , #4, , :

transformer-decoder-block-self-attention-2

, BERT, GPT-2. . :

self-attention-and-masked-self-attention

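After the original paper, "Generating Wikipedia by Summarizing Long Sequences" proposed another arrangement of the Transformer block that is capable of language modeling: the model threw away the Transformer encoder. For that reason, let's call it the "Transformer-Decoder". This early Transformer-based language model was made up of a stack of 6 decoder blocks: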

transformer-decoder-intro

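The decoder blocks are identical. The first one is expanded so you can see that its self-attention layer is the masked variant. Notice that the model can now address up to 4,000 tokens in a segment, a large upgrade from the 512 in the original Transformer.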

, , . « », / .

GPT-2 OpenAI .

Crash Course: Looking Inside GPT-2

, , . , , , , . (Budgie)

GPT-2 , .

gpt-2-capas-2

GPT-2 can process 1024 tokens. Each token flows through all the decoder blocks along its own path.

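The simplest way to run a trained GPT-2 is to let it ramble on its own (technically, generating unconditional samples). Alternatively, we can give it a prompt so that it writes about a particular topic (i.e., generating interactive conditional samples). In the rambling case, we simply hand it the start token and let it begin generating words (the trained model uses <|endoftext|> as its start token; let's call it <|s|> instead).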

gpt2-simple-output-2

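The model has only one input token, so that path is the only active one. The token is processed successively through all the layers, and a vector is produced along that path. That vector can be scored against the model's vocabulary (all the words the model knows, 50,000 in the case of GPT-2). In this case we selected the token with the highest probability, "the". But we can mix things up: if you keep tapping the first suggested word in a keyboard app, it sometimes gets stuck in a repetitive loop, and the only way out is to pick the second or third suggestion. The same can happen here. GPT-2 has a parameter called top-k that lets the model consider sampling words other than the top-scoring one (which is what happens when top-k = 1).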

gpt-2-simple-output-3
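To make the top-k idea concrete, here is a minimal sketch in plain Python. The function name `top_k_sample`, the five-word vocabulary, and the scores are assumptions for illustration and are not taken from the GPT-2 code. The function keeps the k highest-scoring tokens, turns their scores into weights with a softmax, and samples one of them; with k = 1 it always returns the top-scoring token.

```python
import math
import random


def top_k_sample(scores, k, rng=random):
    """Sample a token id from the k highest-scoring entries of `scores`."""
    # Indices of the k largest scores, best first.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax numerators over the surviving scores (shifted by the max for stability).
    best = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - best) for i in top]
    # random.choices treats the weights as relative, so no explicit normalization is needed.
    return rng.choices(top, weights=weights, k=1)[0]


# Hypothetical vocabulary and scores.
vocab = ["the", "a", "robot", "must", "obey"]
scores = [3.2, 2.9, 0.5, -1.0, 0.1]

greedy = vocab[top_k_sample(scores, k=1)]   # always the top-scoring token
sampled = vocab[top_k_sample(scores, k=3)]  # one of the three best tokens
```

A larger k gives more varied text at the cost of occasionally picking a weaker candidate; k = 1 is fully deterministic.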

, . GPT-2 ( ). GPT-2 .

. . NLP-, , – , .

gpt2-token-embeddings-wte-2

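Each row is a word embedding: a list of numbers that represents a word and captures some of its meaning. The size of that list differs between GPT-2 model sizes. The smallest model uses an embedding size of 768 per word/token.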

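So, at the start, we look up the embedding of the start token <|s|> in the embedding matrix. Before handing it to the first block of the model, we need to add positional encoding, a signal that tells the Transformer blocks the order of the words in the sequence. Part of the trained model is a matrix that holds a positional encoding vector for each of the 1024 positions in the input.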

gpt2-positional-encoding

. , GPT-2.

gpt2-input-embedded-positional-encoding-3

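Sending a word to the first Transformer block means looking up its embedding and adding the positional encoding vector for position #1.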
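The input to the first block can be sketched as one lookup in the token embedding matrix plus one lookup in the positional encoding matrix. The sizes below are deliberately tiny and the random values are placeholders; the names `wte` and `wpe` follow the image labels used here, and everything else is an assumption for illustration.

```python
import random

D_MODEL = 4  # the smallest GPT-2 uses 768
N_VOCAB = 6  # GPT-2's vocabulary has about 50,000 entries
N_CTX = 8    # GPT-2 has 1024 positions

rng = random.Random(0)
# Placeholder "trained" matrices: one row per token, one row per position.
wte = [[rng.gauss(0.0, 0.02) for _ in range(D_MODEL)] for _ in range(N_VOCAB)]
wpe = [[rng.gauss(0.0, 0.01) for _ in range(D_MODEL)] for _ in range(N_CTX)]


def embed(token_ids):
    """Return one input vector per token: token embedding + positional encoding."""
    return [
        [t + p for t, p in zip(wte[tok], wpe[pos])]
        for pos, tok in enumerate(token_ids)
    ]


x = embed([3, 1, 3])  # the same token id appears at positions 0 and 2
```

Because the positional vector is added, the two occurrences of token 3 enter the first block as different vectors, which is how the blocks can tell word order apart.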

, , . , . , , .

gpt2-transformer-block-vectors-2

. , :

, , , .

, . , . , , :

;
(« »);
.

: , , ( ). , , .

, «a robot» «it». , , , .

gpt2-self-attention-example-2

. :

– , ( ). , ;
– . ;
– ; , , .

self-attention-example-folders-3

. – , . . , – . , .

(: ).

self-attention-example-folders-scores-3

, .

gpt2-value-vector-sum

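If we multiply each value by its score and sum them up, the resulting vector has paid 50% of its attention to the word "robot", 30% to the word "a", and 19% to the word "it". We will go deeper into self-attention later in the article; first, let's continue up the stack toward the output of the model.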

( ), .

gpt2-output-projection-2

, . .

gpt2-output-scores-2

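We can simply select the token with the highest score (top_k = 1). But better results are achieved if the model considers other words as well. So a better strategy is to sample a word from the whole list, using the score as the probability of selecting it (so words with a higher score are more likely to be chosen). A middle ground is setting top_k to 40, so that the model considers the 40 words with the highest scores.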

gpt2-output

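With that, the model has completed an iteration and output a single word. The model keeps iterating until the whole context has been generated (1024 tokens) or until an end-of-sequence token is produced.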

End of Part 1: The GPT-2, Ladies and Gentlemen

, , GPT-2. , , . , ( TransformerXL XLNet).

, :

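"Words" and "tokens" are used interchangeably here; in reality, GPT-2 uses Byte Pair Encoding to create the tokens in its vocabulary. This means that tokens are usually parts of words.
The example shown runs GPT-2 in inference/evaluation mode. That is why it processes only one word at a time. At training time, the model is trained on longer sequences of text and processes many tokens at once. Training also uses a larger batch size (512), versus the batch size of 1 used in evaluation.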
/ . .
, . Transformer , .
. «zoom in», :

zoom-in

Part 2: The Illustrated Self-Attention

, «it»:

gpt2-self-attention-1-2

, . , , . , , .

Self-Attention (without masking)

, . , 4 .

, ;
;
.

self-attention-summary

1 – Create Query, Key, and Value Vectors

. . . ( «» ):

self-attention-1

, WQ, WK, WV

2 – Score

, , №2: .

self-attention-2

( ) ,

3 – Sum

. , .

self-attention-3-2

, – , .

, , . ( ).
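The three steps (create query, key, and value vectors; score; sum) can be sketched for a single attention head in plain Python. The weight matrices `WQ`, `WK`, `WV` are named as in the figure caption; the tiny identity matrices, the input vectors, and the division by the square root of the key size (standard in the original Transformer) are assumptions for illustration.

```python
import math


def matvec(x, W):
    """Row vector x (length n) times matrix W (n x m) -> list of length m."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, WQ, WK, WV):
    # Step 1: create a query, key, and value vector for every token.
    Q = [matvec(x, WQ) for x in X]
    K = [matvec(x, WK) for x in X]
    V = [matvec(x, WV) for x in X]
    scale = math.sqrt(len(K[0]))
    out = []
    for q in Q:
        # Step 2: score this token's query against every key, then normalize.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in K]
        weights = softmax(scores)
        # Step 3: sum the value vectors, weighted by their scores.
        out.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return out


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]  # placeholder weights
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
Z = self_attention(X, IDENTITY, IDENTITY, IDENTITY)
```

No mask is applied here, so every token can attend to every other token, including later ones.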

, , , . №2. , . . , :

masked-self-attention-2

, (attention mask). , , («robot must obey orders»). 4 : ( , – ). .. , 4 , ( 4 ) .

transformer-decoder-attention-mask-dataset

, . , , ( ), :

queries-keys-attention-mask

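After the multiplication, we apply the triangular attention mask. It sets the cells we want to mask to negative infinity (-inf) or to a very large negative number (for example, -1 billion in GPT-2):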

transformer-attention-mask

, , , :

transformer-attention-masked-scores-softmax

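When the model processes the first example in the dataset (row #1), which contains only one word ("robot"), 100% of its attention goes to that word.
When the model processes the second example (row #2), which contains the words "robot must", then while processing "must" it pays 48% of its attention to "robot" and 52% to "must".
And so on.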
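The masking step can be sketched as follows: cells to the right of the diagonal are set to -inf before the softmax, so each row spreads its attention only over the current and earlier tokens. The score values below are made up for illustration and are not meant to reproduce the percentages quoted above; `causal_mask` is a hypothetical helper name.

```python
import math

NEG_INF = float("-inf")


def causal_mask(scores):
    """Keep scores[i][j] when j <= i; mask future positions (j > i) with -inf."""
    n = len(scores)
    return [[scores[i][j] if j <= i else NEG_INF for j in range(n)] for i in range(n)]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]  # math.exp(-inf) is exactly 0.0
    total = sum(exps)
    return [e / total for e in exps]


tokens = ["robot", "must", "obey", "orders"]
# Hypothetical query-key scores, one row per token being processed.
scores = [
    [0.11, 0.00, 0.81, 0.79],
    [0.19, 0.50, 0.30, 0.48],
    [0.53, 0.98, 0.95, 0.14],
    [0.81, 0.86, 0.38, 0.90],
]

weights = [softmax(row) for row in causal_mask(scores)]
```

The first row ends up with all of its weight on "robot", and every row still sums to 1 because the masked cells contribute exactly zero.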

GPT-2 Masked Self-Attention

GPT-2.

:

, GPT-2 , . , , , .

( <|s|>).

gpt2-self-attention-qkv-1-2

GPT-2 «a». :

gpt2-self-attention-qkv-2-2

, «robot», , «a» – , :

gpt2-self-attention-qkv-3-2

GPT-2 Self-Attention: 1 – Creating Queries, Keys, and Values

, «it». , «it» + #9:

gpt2-self-attention-1

( ), , .

gpt2-self-attention-2

(bias vector),

, , «it».

gpt2-self-attention-3

( ) ,

GPT-2 Self-Attention: 1.5 – Splitting Into Attention Heads

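In the previous examples, we dove straight into self-attention and ignored the "multi-head" part. It is useful to shed some light on that concept now. Self-attention is carried out several times, on different parts of the query (Q), key (K), and value (V) vectors. "Splitting" attention heads simply means reshaping the long vector into a matrix. The small GPT-2 has 12 attention heads, so that is the first dimension of the reshaped matrix: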

gpt2-self-attention-split-attention-heads-1
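Splitting into heads is only a reshape, which the following sketch shows on plain Python lists. `split_heads` and `merge_heads` are hypothetical helper names; the sizes (a 768-long vector and 12 heads) are those of the small GPT-2, which gives 64 numbers per head.

```python
def split_heads(vec, n_heads):
    """Reshape one long vector into n_heads equal, contiguous slices."""
    assert len(vec) % n_heads == 0, "vector length must be divisible by n_heads"
    head_dim = len(vec) // n_heads
    return [vec[h * head_dim:(h + 1) * head_dim] for h in range(n_heads)]


def merge_heads(heads):
    """Inverse of split_heads: concatenate the per-head slices back together."""
    return [value for head in heads for value in head]


q = [float(i) for i in range(768)]  # stand-in for one token's query vector
heads = split_heads(q, 12)          # 12 heads of 64 numbers each
```

Merging the heads after attention is the same operation in reverse, so no information is lost in either direction.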

, «» . «» , ( 12 «» ):

gpt2-self-attention-split-attention-heads-2

GPT-2 Self-Attention: 2 – Scoring

( , «» ):

gpt2-self-attention-scoring

( «» #1 ):

gpt2-self-attention-scoring-2

GPT-2 Self-Attention: 3 – Sum

, , , «» #1:

gpt2-self-attention-multihead-sum-1

GPT-2 Self-Attention: 3.5 – Merging Attention Heads

«» , , :

gpt2-self-attention-merge-heads-1

. .

GPT-2 Self-Attention: 4 – Projecting

, , . , «» :

gpt2-self-attention-project-1

, , :

gpt2-self-attention-project-2

GPT-2 Fully-Connected Neural Network: Layer #1

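The fully-connected neural network is where the block processes its input token after self-attention has added the appropriate context to its representation. It is made up of two layers. The first layer is 4 times the size of the model (since the small GPT-2 is 768, this network has 768*4 = 3072 units). Why four times? That is simply the size the original Transformer used (its model dimension was 512 and layer #1 in that model was 2048). This seems to give Transformer models enough representational capacity for the tasks they have been given so far.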

gpt2-mlp1

( )

GPT-2 Fully-Connected Neural Network: Layer #2 – Projecting to the Model Dimension

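The second layer projects the result of the first layer back to the model dimension (768 for the small GPT-2). The result of this multiplication is the output of the Transformer block for this token.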

gpt2-mlp-2

( )
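The two layers can be sketched as an expansion to four times the model size followed by a projection back. The dimensions below are tiny so the example runs instantly (the small GPT-2 would use 768 and 3072), and the random weights are placeholders. The GELU activation between the layers is what the released GPT-2 code uses; it is not described in the text above, so treat it as an added detail.

```python
import math
import random


def gelu(x):
    """Tanh approximation of GELU."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def linear(x, W, b):
    """x (length n) times W (n x m) plus bias b (length m)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def mlp(x, W1, b1, W2, b2):
    hidden = [gelu(h) for h in linear(x, W1, b1)]  # layer #1: expand to 4x
    return linear(hidden, W2, b2)                  # layer #2: project back


D_MODEL = 4             # 768 in the small GPT-2
D_HIDDEN = 4 * D_MODEL  # 768 * 4 = 3072 in the small GPT-2

rng = random.Random(0)
W1 = [[rng.gauss(0.0, 0.02) for _ in range(D_HIDDEN)] for _ in range(D_MODEL)]
b1 = [0.0] * D_HIDDEN
W2 = [[rng.gauss(0.0, 0.02) for _ in range(D_MODEL)] for _ in range(D_HIDDEN)]
b2 = [0.0] * D_MODEL

y = mlp([1.0, -0.5, 0.25, 2.0], W1, b1, W2, b2)
```

The output has the same length as the input, which is what allows identical blocks to be stacked on top of each other.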

You've Made It!

, - . , . , , :

gpt2-transformer-block-weights-2

. , :

gpt2-weights-2

, :

gpt2-117-parameters

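For some reason they add up to 124M parameters rather than 117M. It is not clear why, but that is how many appear to be in the published code (corrections are welcome).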
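The 124M figure can be checked with a short tally. This is a sketch under stated assumptions: 12 blocks, a vocabulary of 50,257 tokens, 1024 positions, a model size of 768, biases on every projection, two layer norms per block plus a final one, and an output layer that reuses the token embedding matrix (so it adds no parameters). The function name `gpt2_param_count` is hypothetical.

```python
def gpt2_param_count(n_vocab=50257, n_ctx=1024, d_model=768, n_layer=12):
    """Count weights and biases of a GPT-2-style model under the stated assumptions."""
    embeddings = n_vocab * d_model + n_ctx * d_model  # token + positional matrices
    layer_norm = 2 * d_model                          # gain and bias
    attn = d_model * 3 * d_model + 3 * d_model        # combined Q, K, V projection
    attn += d_model * d_model + d_model               # attention output projection
    mlp = d_model * 4 * d_model + 4 * d_model         # fully-connected layer #1
    mlp += 4 * d_model * d_model + d_model            # fully-connected layer #2
    block = 2 * layer_norm + attn + mlp
    return embeddings + n_layer * block + layer_norm  # plus the final layer norm


total = gpt2_param_count()
```

Under these assumptions `total` comes to 124,439,808, of which roughly 39M sit in the two embedding matrices and the rest in the 12 blocks.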

Part 3: Beyond Language Modeling

, . , . .

. :

decoder-only-transformer-translation

, . , ( , ) . :

wikipedia-summarization

decoder-only-summarization

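In "Sample Efficient Text Summarization Using a Single Pre-Trained Transformer", a decoder-only Transformer is first pre-trained on language modeling and then fine-tuned to do summarization. It turns out to achieve better results than a pre-trained encoder-decoder Transformer in limited-data settings.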

GPT-2 .

. « » – (, « »).

, . , (), ( ). (, , ) «» – , .

music-transformer-performance-encoding-3

– one-hot . midi . :

music-representation-example

one-hot :

music-transformer-input-representation-2

music-transformer-self-attention-2