openAI-GPT-2-3

In 2019 we saw a dazzling application of machine learning. OpenAI's GPT-2 showed an impressive ability to write coherent and passionate text, exceeding what we expected current language models to be able to produce. GPT-2 is not a particularly novel architecture: it is very similar to the decoder-only Transformer. What sets GPT-2 apart is that it is a very large Transformer-based language model trained on a massive dataset. In this article we look at the architecture that makes these results possible, examine the self-attention layer in detail, and then look at uses of the decoder-only Transformer beyond language modeling.

Contents

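Part 1: GPT-2 and language modeling
- One difference from BERT
- Crash course in brain surgery: looking inside GPT-2
- End of part 1: GPT-2, ladies and gentlemen
Part 2: Self-attention illustrated
- Self-attention (without masking)
- 1 – Create query, key, and value vectors
- 2 – Score
- 3 – Sum
- GPT-2 masked self-attention
- You've made it!
Part 3: Beyond language modeling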

Part 1: GPT-2 and language modeling

Word2vec , – , , . – , .

swiftkey-keyboard

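In this sense, GPT-2 is basically the next-word prediction feature of a keyboard app, only much larger and more sophisticated than the one on your phone. GPT-2 was trained on a 40 GB dataset (WebText) that OpenAI researchers collected from the internet. To compare storage sizes: the keyboard app I use, SwiftKey, takes up 78 MB, the smallest trained GPT-2 variant takes up 500 MB, and the largest GPT-2 variant is 13 times that size (more than 6.5 GB).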

gpt2-sizes

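A good way to experiment with GPT-2 is the AllenAI GPT-2 Explorer. It uses GPT-2 to display candidate next words (together with their probability scores), and you can pick one of them to continue the text.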

, – .. . – , - .

transformer-encoder-decoder

, , , , ( AlphaStar).

gpt-2-transformer-xl-bert-3

? , GPT-2 :

gpt2-size-hyperparameters-3

One difference from BERT

:
, .

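GPT-2 is built from Transformer decoder blocks. BERT, on the other hand, uses Transformer encoder blocks. We will look at the difference in a later section. One key difference between the two is that GPT-2, like traditional language models, outputs one token at a time. For example, here is a trained GPT-2 producing its output: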

gpt-2-output

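The way these models actually work: after each token is produced, that token is appended to the input sequence, and the new sequence becomes the input to the model at the next step. This idea is called "auto-regression", and it is one of the ideas that made RNNs so effective.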
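The auto-regressive loop can be sketched in a few lines. This is a minimal illustration, not GPT-2 itself: `generate`, `toy_next_token`, and `TOY_TABLE` are hypothetical names, and the "model" is just a lookup table on the last token.

```python
def generate(next_token, prompt, max_new_tokens, end_token=None):
    """Auto-regressive loop: every produced token is appended to the input."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(tokens)   # the model sees everything produced so far
        tokens.append(token)         # the output becomes part of the next input
        if token == end_token:
            break
    return tokens


# Hypothetical stand-in for a trained model: a fixed lookup on the last token.
TOY_TABLE = {"<|s|>": "a", "a": "robot", "robot": "must", "must": "obey"}


def toy_next_token(tokens):
    return TOY_TABLE.get(tokens[-1], "<|endoftext|>")


print(generate(toy_next_token, ["<|s|>"], max_new_tokens=10, end_token="<|endoftext|>"))
# ['<|s|>', 'a', 'robot', 'must', 'obey', '<|endoftext|>']
```

A real model would condition on the whole sequence rather than only the last token; the loop around it stays the same.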

gpt-2-autoregression-2

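GPT-2 and some later models such as TransformerXL and XLNet are auto-regressive in nature. BERT is not. That is a trade-off: by giving up auto-regression, BERT gained the ability to use the context on both sides of a word. XLNet brings back auto-regression while finding an alternative way to incorporate the context on both sides.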

– :

transformer-encoder-block-2

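An encoder block can take inputs up to a certain maximum sequence length (for example, 512 tokens). If an input sequence is shorter than this limit, the rest of the sequence is simply padded.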

– , . :

transformer-decoder-block-2

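One key difference in the self-attention layer here is that it masks future tokens: not by replacing the word with [mask] as BERT does, but by interfering in the self-attention calculation and blocking information from tokens to the right of the position being calculated.

If, for example, we highlight the path of position #4, we can see that it is only allowed to attend to the present and previous tokens: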

transformer-decoder-block-self-attention-2

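It is important to be clear about the distinction between self-attention, which BERT uses, and masked self-attention, which GPT-2 uses. A normal self-attention block allows a position to peek at tokens to its right. Masked self-attention prevents that from happening: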

self-attention-and-masked-self-attention

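After the original paper, «Generating Wikipedia by Summarizing Long Sequences» proposed another arrangement of the Transformer block that is capable of language modeling: this model threw away the encoder. For that reason, let's call it the «Transformer-Decoder». This early Transformer-based language model was made up of a stack of 6 decoder blocks: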

transformer-decoder-intro

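The decoder blocks are identical. The first one is expanded so that you can see that its self-attention layer is the masked variant. Note that the model can now address up to 4000 tokens in a segment, a big step up from the 512 of the original Transformer.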

, , . « », / .

The OpenAI GPT-2 model uses these decoder-only blocks.

Crash course in brain surgery: looking inside GPT-2

, , . , , , , . (Budgie)

GPT-2 , .

gpt-2-layers-2

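GPT-2 can process 1024 tokens. Each token flows through all the decoder blocks along its own path.

The simplest way to run a trained GPT-2 is to let it ramble on its own (technically, generating unconditional samples); alternatively, we can give it a prompt so that it speaks about a certain topic (i.e. generating interactive conditional samples). In the rambling case, we simply hand it the start token and let it start generating words (the trained model uses <|endoftext|> as its start token; let's call it <|s|> instead).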

gpt2-simple-output-2

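The model has only one input token, so that path is the only active one. The token is processed successively through all the layers, and a vector is produced along that path. That vector can be turned into a score against the model's vocabulary: all the words the model knows (50,000 words in the case of GPT-2). In this case we selected the token with the highest probability, «the». But we can certainly mix things up: if you keep tapping the suggested word in a keyboard app, it sometimes gets stuck in a repetitive loop, and the only way out is to tap the second or third suggestion. The same can happen here. GPT-2 has a parameter called top-k that lets the model consider sampling words other than the top word (which is what happens when top-k = 1).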

gpt-2-simple-output-3

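In the next step, we add the output of the first step to the input sequence and have the model make its next prediction. Each layer of GPT-2 has kept its own interpretation of the first token and uses it when processing the second token (more on this in the self-attention section). GPT-2 does not re-interpret the first token in light of the second token.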

. . NLP-, , – , .

gpt2-token-embeddings-wte-2

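Each row is a word embedding: a list of numbers that represents a word and captures some of its meaning. The size of that list differs between GPT-2 model sizes. The smallest model uses an embedding size of 768 per word/token.

So at the beginning, we look up the embedding of the start token <|s|> in the embedding matrix. Before handing it to the first block of the model, we need to add positional encoding: a signal that tells the Transformer blocks the order of the words in the sequence. Part of the trained model is a matrix that contains a positional encoding vector for each of the 1024 input positions.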

gpt2-positional-encoding

. , GPT-2.

gpt2-input-embedding-positional-encoding-3

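Sending a word to the first Transformer block means looking up its embedding and adding the positional encoding vector for position #1.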
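As a rough sketch of this input step, the snippet below looks up a token embedding and adds the positional encoding for the token's slot. The matrices are random stand-ins for trained weights, and the names `wte`, `wpe`, and `embed` as well as the tiny sizes are assumptions made for illustration (the text gives 768 and 1024 for the smallest GPT-2).

```python
import random

rng = random.Random(0)
D_MODEL = 4        # 768 in the smallest GPT-2; kept tiny here
N_POSITIONS = 8    # 1024 in GPT-2
VOCAB = ["<|s|>", "a", "robot", "must", "obey"]

# Random stand-ins for the two trained matrices.
wte = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in VOCAB]
wpe = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(N_POSITIONS)]


def embed(tokens):
    """Return one input vector per token: token embedding + positional encoding."""
    vectors = []
    for position, token in enumerate(tokens):   # position 0 is slot #1 in the text
        token_vec = wte[VOCAB.index(token)]
        pos_vec = wpe[position]
        vectors.append([t + p for t, p in zip(token_vec, pos_vec)])
    return vectors


inputs = embed(["<|s|>", "a", "robot"])
print(len(inputs), len(inputs[0]))   # 3 4
```

Because the positional vector differs per slot, the same word placed at two positions produces two different input vectors.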

, , . , . , , .

gpt2-transformer-block-vectors-2

. , :

, , , .

, . , . , , :

;
(« »);
.

: , , ( ). , , .

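For example, this self-attention layer pays attention to «a robot» when it processes the word «it». The vector it passes on to its neural network is a sum of the vectors for each of the three words, multiplied by their scores.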

gpt2-self-attention-example-2

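Self-attention is processed along the path of each token in the segment. The significant components are three vectors:

– Query: a representation of the current word, used to score against all the other words (using their keys). We only care about the query of the token we are currently processing;
– Key: key vectors are like labels for all the words in the segment. They are what we match against when searching for relevant words;
– Value: value vectors are the actual word representations; once we have scored how relevant each word is, these are what we add up to represent the current word.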

self-attention-example-folders-3

. – , . . , – . , .

(: ).

self-attention-example-folders-scores-3

, .

gpt2-value-vector-sum

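This weighted blend of value vectors results in a vector that paid 50% of its attention to the word «robot», 30% to the word «a», and 19% to the word «it». We will go deeper into self-attention later in the article. First, let's continue up the stack towards the output of the model.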
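The blend is plain arithmetic: each value vector is multiplied by its attention weight and the results are added up. The sketch below uses the 50% / 30% / 19% weights from the text with made-up three-dimensional value vectors (the vectors and the name `weighted_sum` are illustrative assumptions; the remaining 1% presumably falls on other tokens in the segment).

```python
def weighted_sum(weights, value_vectors):
    """Sum the value vectors, each multiplied by its attention weight."""
    size = len(value_vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, value_vectors)) for i in range(size)]


# Made-up value vectors, one axis per word so the blend is easy to read.
values = {"a": [1.0, 0.0, 0.0], "robot": [0.0, 1.0, 0.0], "it": [0.0, 0.0, 1.0]}
attention = {"a": 0.30, "robot": 0.50, "it": 0.19}

blended = weighted_sum([attention[w] for w in values], [values[w] for w in values])
print(blended)   # [0.3, 0.5, 0.19]
```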

( ), .

gpt2-output-projection-2

, . .

gpt2-output-scores-2

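We can simply select the token with the highest score (top_k = 1). But better results are achieved if the model considers other words as well. A better strategy is to sample a word from the whole list, using the score as the probability of selecting that word (so words with a higher score have a higher chance of being picked). A middle ground is to set top_k to 40 and have the model consider the 40 words with the highest scores.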
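A minimal sketch of top-k sampling, assuming the scores are raw (unnormalised) values that are turned into probabilities with a softmax. The function names, the five-word vocabulary, and the scores are made up for illustration.

```python
import math
import random


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_top_k(scores, k, rng=random):
    """Keep the k highest-scoring token ids, renormalise, and sample one of them."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    probs = softmax([scores[i] for i in top])
    return rng.choices(top, weights=probs, k=1)[0]


vocab = ["the", "a", "robot", "it", "must"]
scores = [4.0, 2.5, 1.0, 0.5, -1.0]   # made-up scores for illustration

print(vocab[sample_top_k(scores, k=1)])                          # always "the"
print(vocab[sample_top_k(scores, k=3, rng=random.Random(0))])    # one of the top three
```

With `k=1` the sampling degenerates into always picking the best-scoring token; larger `k` trades some predictability for variety.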

gpt2-output

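With that, the model has completed one iteration and output a single word. The model keeps iterating until the entire context has been generated (1024 tokens) or until an end-of-sequence token is produced.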

End of part 1: GPT-2, ladies and gentlemen

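And there we have it: a rundown of how GPT-2 works. If you are curious about exactly what happens inside the self-attention layer, the following section is for you. It introduces more visual language for describing self-attention, which makes later Transformer models easier to examine and describe (looking at you, TransformerXL and XLNet).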

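A few oversimplifications in this article should be noted:

- «Words» and «tokens» are used interchangeably; in reality, GPT-2 uses Byte Pair Encoding to create the tokens in its vocabulary. This means that tokens are usually parts of words.
- The example runs GPT-2 in inference/evaluation mode. That is why it processes only one word at a time. At training time, the model is trained on longer sequences of text and processes multiple tokens at once. Training also uses a larger batch size (512), as opposed to the batch size of 1 used at evaluation.
- Vectors were rotated/transposed freely to make better use of space in the images. An implementation has to be more precise.
- Layer normalization, which Transformers use a lot and which is quite important, is mostly left out here in favor of self-attention.
- In some places more boxes were needed to represent a vector. Those are marked as «zoom in», for example: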

zoom-in

Part 2: Self-attention illustrated

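Earlier in the article, we showed this image of self-attention being applied in a layer that is processing the word «it»: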

gpt2-self-attention-1-2

, . , , . , , .

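Self-attention (without masking)

Let's start with the original self-attention as it is calculated in an encoder block, using a toy Transformer block that can only process 4 tokens at a time. Self-attention is applied in three main steps:

- create the query, key, and value vectors for each path;
- for each input token, use its query vector to score against all the key vectors;
- sum up the value vectors after multiplying them by their associated scores.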

self-attention-summary

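1 – Create query, key, and value vectors

Let's focus on the first path. We take its query and compare it against all the keys. That produces a score for each key. The first step of self-attention is to calculate the three vectors for each token path (ignoring attention «heads» for now):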

self-attention-1

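For each input token, create a query vector, a key vector, and a value vector by multiplying by the weight matrices WQ, WK, WV

2 – Score

Now that we have the vectors, only the query and key vectors are used for step №2: since we are focused on the first token, we multiply its query by all the key vectors, which gives a score for each of the four tokens.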

self-attention-2

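Multiply (dot product) the current query vector by all the key vectors to get a score for how well they match

3 – Sum

We can now multiply the scores by the value vectors. A value with a high score will make up a large portion of the resulting vector once they are summed up.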

self-attention-3-2

, – , .

, , . ( ).
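The three steps (create query, key, and value vectors with WQ, WK, WV; score; sum) can be sketched as follows. This is a single-head toy with random weights. Dividing the scores by the square root of the key size and normalising them with a softmax follow the original Transformer formulation and are assumptions here, as are all function names.

```python
import math
import random


def matvec(vector, matrix):
    """Row vector times matrix (matrix given as a list of rows)."""
    return [sum(x * row[j] for x, row in zip(vector, matrix)) for j in range(len(matrix[0]))]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(inputs, w_q, w_k, w_v):
    # Step 1: create query, key, and value vectors for every token.
    queries = [matvec(x, w_q) for x in inputs]
    keys = [matvec(x, w_k) for x in inputs]
    values = [matvec(x, w_v) for x in inputs]
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Step 2: score the query against every key, then normalise.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        # Step 3: sum the value vectors, each multiplied by its weight.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs


rng = random.Random(0)
D = 3


def random_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


inputs = random_matrix(4, D)                       # a toy block with 4 token paths
w_q, w_k, w_v = (random_matrix(D, D) for _ in range(3))
outputs = self_attention(inputs, w_q, w_k, w_v)
print(len(outputs), len(outputs[0]))               # 4 3
```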

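Now that we have looked inside the self-attention step of a Transformer, let's move on to masked self-attention. Masked self-attention is identical to self-attention except for step №2. Suppose the model has only two tokens as input and we are observing the second token. In this case, the last two tokens are masked, so the model interferes in the scoring step: it always scores the future tokens as 0, so the model cannot peek at future words: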

masked-self-attention-2

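This masking is often implemented as a matrix called an attention mask. Think of a sequence of four words («robot must obey orders», for example). In a language modeling scenario, this sequence is absorbed in 4 steps: one per word (assuming for now that every word is a token). Since these models work in batches, we can assume a batch size of 4 for this toy model, which processes the entire sequence (with its 4 steps) as one batch.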

transformer-decoder-attention-mask-dataset

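In matrix form, the scores are calculated by multiplying a queries matrix by a keys matrix. Let's visualize it as follows, except that instead of the word, each cell would hold the query (or key) vector associated with that word: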

queries-keys-attention-mask

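After the multiplication, we «slap on» the attention mask triangle. It sets the cells we want to mask to minus infinity (-inf) or to a very large negative number (for example, -1 billion in GPT-2):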

transformer-attention-mask

Then, applying softmax to each row produces the actual scores used for self-attention:

transformer-attention-masked-scores-softmax

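When the model processes the first example in the dataset (row №1), which contains only one word («robot»), 100% of its attention is on that word.
When the model processes the second example in the dataset (row №2), which contains the words «robot must», then while processing the word «must», 48% of its attention is on «robot» and 52% is on «must».
And so on.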
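The mask-then-softmax step can be sketched as below. The raw scores are made up (the second row is chosen so that it lands near the 48% / 52% split), `-1e9` stands in for minus infinity, and `masked_softmax` is a hypothetical helper name.

```python
import math

NEG_LARGE = -1e9   # stands in for -inf


def masked_softmax(scores):
    """Mask everything right of the diagonal, then apply softmax to each row."""
    result = []
    for i, row in enumerate(scores):
        masked = [s if j <= i else NEG_LARGE for j, s in enumerate(row)]
        m = max(masked)
        exps = [math.exp(s - m) for s in masked]   # masked cells underflow to 0.0
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


# Made-up raw scores for "robot must obey orders" (rows = queries, columns = keys).
scores = [
    [0.30, 0.10, 0.80, 0.70],
    [0.00, 0.08, 0.30, 0.40],
    [0.50, 0.90, 0.90, 0.10],
    [0.80, 0.80, 0.30, 0.90],
]

probs = masked_softmax(scores)
print(probs[0])                           # [1.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in probs[1]])    # [0.48, 0.52, 0.0, 0.0]
```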

GPT-2 masked self-attention

Let's look at GPT-2's masked self-attention in more detail.

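Evaluation time: processing one token at a time

We can make GPT-2 operate exactly as masked self-attention works. But during evaluation, when the model only adds one new word per iteration, it would be inefficient to recalculate self-attention along earlier paths for tokens that have already been processed.

In this case, we process the first token (ignoring <|s|> for now).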

gpt2-self-attention-qkv-1-2

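GPT-2 holds on to the key and value vectors of the «a» token. Every self-attention layer holds on to its own key and value vectors for that token: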

gpt2-self-attention-qkv-2-2

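In the next iteration, when the model processes the word «robot», it does not need to generate the query, key, and value vectors for the «a» token again; it simply reuses the ones it saved in the first iteration: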
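The reuse of saved key and value vectors can be sketched as a small cache. This is a single-head toy with identity weight matrices, not GPT-2's actual code; the class name `CachedSelfAttention` and its `step` method are assumptions. Masking is implicit here, because the cache only ever holds the current and earlier tokens.

```python
import math


def matvec(vector, matrix):
    return [sum(x * row[j] for x, row in zip(vector, matrix)) for j in range(len(matrix[0]))]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class CachedSelfAttention:
    """Single-head self-attention that processes one new token per call."""

    def __init__(self, w_q, w_k, w_v):
        self.w_q, self.w_k, self.w_v = w_q, w_k, w_v
        self.keys = []     # key vectors of tokens processed so far
        self.values = []   # value vectors of tokens processed so far

    def step(self, x):
        query = matvec(x, self.w_q)
        # Only the new token's key and value are computed; older ones are reused.
        self.keys.append(matvec(x, self.w_k))
        self.values.append(matvec(x, self.w_v))
        scores = [sum(q * k for q, k in zip(query, key)) for key in self.keys]
        weights = softmax(scores)
        size = len(self.values[0])
        return [sum(w * v[i] for w, v in zip(weights, self.values)) for i in range(size)]


identity = [[1.0, 0.0], [0.0, 1.0]]
layer = CachedSelfAttention(identity, identity, identity)
first = layer.step([1.0, 0.0])    # "a": it can only attend to itself
second = layer.step([0.0, 1.0])   # "robot": reuses the cached key/value of "a"
print(first, len(layer.keys))     # [1.0, 0.0] 2
```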

gpt2-self-attention-qkv-3-2

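GPT-2 self-attention: 1 – Creating queries, keys, and values

Let's assume the model is processing the word «it». If we are talking about the bottom block, its input for that token is the embedding of «it» + the positional encoding for slot #9: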

gpt2-self-attention-1

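Every block in a Transformer has its own weights (broken down later in the article). The first one we encounter is the weight matrix used to create the queries, keys, and values.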

gpt2-self-attention-2

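Self-attention multiplies its input by its weight matrix (and adds a bias vector, not shown here)

The multiplication results in a vector that is basically a concatenation of the query, key, and value vectors for the word «it».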

gpt2-self-attention-3

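Multiplying the input vector by the attention weights (and then adding a bias vector) results in the key, value, and query vectors for this token

GPT-2 self-attention: 1.5 – Splitting into attention «heads»

In the previous examples, we dove straight into self-attention and ignored the «multi-head» part. It is useful to shed some light on that concept now. Self-attention is conducted multiple times on different parts of the query (Q), key (K), and value (V) vectors. «Splitting» into attention heads is simply reshaping the long vector into a matrix. The small GPT-2 has 12 attention «heads», so that is the first dimension of the reshaped matrix: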

gpt2-self-attention-split-attention-heads-1

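In the previous examples, we looked at what happens inside one attention «head». One way to think of multiple attention «heads» is like this (visualizing only three of the 12 «heads»):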
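Splitting into heads is just a reshape, which a short sketch makes concrete. With a 768-wide vector and 12 heads, each head receives a slice of 64 numbers. The helper names `split_heads` and `merge_heads` are assumptions; merging is the concatenation that undoes the split.

```python
N_HEADS = 12
D_MODEL = 768


def split_heads(vector, n_heads=N_HEADS):
    """Reshape one long vector into n_heads equal slices (one per attention head)."""
    if len(vector) % n_heads != 0:
        raise ValueError("vector length must be divisible by the number of heads")
    head_size = len(vector) // n_heads
    return [vector[h * head_size:(h + 1) * head_size] for h in range(n_heads)]


def merge_heads(heads):
    """Concatenate the per-head vectors back into one long vector."""
    return [x for head in heads for x in head]


query = [float(i) for i in range(D_MODEL)]
heads = split_heads(query)
print(len(heads), len(heads[0]))        # 12 64
print(merge_heads(heads) == query)      # True
```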

gpt2-self-attention-split-attention-heads-2

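GPT-2 self-attention: 2 – Scoring

We can now proceed to scoring (keeping in mind that we are looking at only one attention «head» and that all the others perform a similar operation):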

gpt2-self-attention-scoring

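Now the token can be scored against the keys of all the other tokens (which were calculated in attention «head» #1 in previous iterations):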

gpt2-self-attention-scoring-2

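GPT-2 self-attention: 3 – Sum

As we have seen before, we now multiply each value by its score and sum them up, producing the result of self-attention for attention «head» #1: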

gpt2-self-attention-multihead-sum-1

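GPT-2 self-attention: 3.5 – Merging attention «heads»

To deal with the various attention «heads», we first concatenate them into one vector: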

gpt2-self-attention-merge-heads-1

. .

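GPT-2 self-attention: 4 – Projecting

We let the model learn how best to map the concatenated self-attention results into a vector that the feed-forward neural network can work with. Here comes the second large weight matrix, which projects the results of the attention «heads» into the output vector of the self-attention sublayer: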

gpt2-self-attention-project-1

With this, we have produced the vector that can be sent along to the next layer:

gpt2-self-attention-project-2

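GPT-2 fully connected neural network: layer #1

The fully connected neural network is where the block processes its input token after self-attention has included the appropriate context in its representation. It is made up of two layers. The first layer is 4 times the size of the model (since the small GPT-2 has a size of 768, this network has 768*4 = 3072 units). Why four times? That is simply the size the original Transformer used (its model dimension was 512 and its layer #1 was 2048). This seems to give Transformer models enough representational capacity for the tasks thrown at them so far.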

gpt2-mlp1

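(Not shown: a bias vector)

GPT-2 fully connected neural network: layer #2, projecting back to the model dimension

The second layer projects the result of the first layer back into the model dimension (768 for the small GPT-2). The result of this multiplication is the output of the Transformer block for this token.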
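The two layers can be sketched as an expand-then-project pair. The sizes are shrunk (8 and 32 instead of 768 and 3072), the weights are random stand-ins, and the GELU activation between the layers (tanh approximation) is an assumption that the text above does not mention; all names are hypothetical.

```python
import math
import random

D_MODEL = 8                # 768 in the small GPT-2
D_HIDDEN = 4 * D_MODEL     # 3072 in the small GPT-2


def gelu(x):
    """GELU activation, tanh approximation (assumed; not stated in the text)."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def linear(vector, weights, bias):
    """vector @ weights + bias, with weights stored as (inputs x outputs)."""
    return [sum(x * row[j] for x, row in zip(vector, weights)) + bias[j]
            for j in range(len(bias))]


def feed_forward(x, w1, b1, w2, b2):
    hidden = [gelu(h) for h in linear(x, w1, b1)]   # layer #1: d_model -> 4 * d_model
    return linear(hidden, w2, b2)                   # layer #2: back to d_model


rng = random.Random(0)
w1 = [[rng.gauss(0, 0.02) for _ in range(D_HIDDEN)] for _ in range(D_MODEL)]
b1 = [0.0] * D_HIDDEN
w2 = [[rng.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(D_HIDDEN)]
b2 = [0.0] * D_MODEL

out = feed_forward([1.0] * D_MODEL, w1, b1, w2, b2)
print(len(out))   # 8
```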

gpt2-mlp-2

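(Not shown: a bias vector)

You've made it!

That is the most detailed version of the Transformer block we will get into. You now have most of the picture of what happens inside a Transformer language model. To recap, our input vector encounters these weight matrices: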

gpt2-transformer-block-weights-2

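Each block has its own set of these weights. On the other hand, the model has only one token embedding matrix and one positional encoding matrix: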

gpt2-weights-2

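If you want to see all the parameters of the model, they are tallied here:

gpt2-117-parameters

For some reason they add up to 124 million parameters rather than 117 million. I am not sure why, but that is how many there seem to be in the published code (please correct me if I am wrong).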
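The 124 million figure can be reproduced with a back-of-the-envelope tally. The sketch assumes the published small-model configuration (vocabulary of 50,257 tokens, 1024 positions, model size 768, 12 blocks) and counts bias vectors and layer-normalisation parameters, which the walkthrough above mostly skips; the function name is hypothetical.

```python
def count_gpt2_parameters(vocab_size=50257, n_positions=1024, d_model=768, n_layers=12):
    """Tally the weights of a GPT-2 style model (assumed configuration)."""
    token_embeddings = vocab_size * d_model
    positional_encodings = n_positions * d_model

    layer_norm = 2 * d_model                          # scale + shift
    attn_qkv = d_model * 3 * d_model + 3 * d_model    # creates queries, keys, values
    attn_proj = d_model * d_model + d_model           # projects the merged heads
    mlp_1 = d_model * 4 * d_model + 4 * d_model       # layer #1: 768 -> 3072
    mlp_2 = 4 * d_model * d_model + d_model           # layer #2: 3072 -> 768
    block = 2 * layer_norm + attn_qkv + attn_proj + mlp_1 + mlp_2

    final_layer_norm = layer_norm
    return token_embeddings + positional_encodings + n_layers * block + final_layer_norm


total = count_gpt2_parameters()
print(f"{total:,}")   # 124,439,808
```

The output scores reuse the token embedding matrix, so no separate output matrix is counted.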

Part 3: Beyond language modeling

, . , . .

. :

decoder-only-transformer

, . , ( , ) . :

wikipedia-summarization

decoder-only-summarization

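In «Sample Efficient Text Summarization Using a Single Pre-Trained Transformer», a decoder-only Transformer is first pre-trained on language modeling and then fine-tuned for summarization. It turns out to achieve better results than a pre-trained encoder-decoder Transformer when data is limited.

The GPT-2 paper also shows summarization results after pre-training the model on language modeling.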

. « » – (, « »).

, . , (), ( ). (, , ) «» – , .

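music-transformer-performance-encoding-3

A performance is just a series of these one-hot vectors. A MIDI file can be converted into such a format. The paper gives the following example input sequence:

music-representation-example

The one-hot vector representation of this input sequence would look like this:

music-transformer-input-representation-2

music-transformer-self-attention-2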