2018 was a turning point for machine learning models that process text (or, more precisely, for Natural Language Processing, NLP). Our conceptual understanding of how best to represent words and sentences so as to capture their meanings and the relationships between them is evolving rapidly. Moreover, the NLP community has been releasing incredibly powerful components that can be freely downloaded and used in your own models and pipelines. This turning point has been called NLP's ImageNet moment, a reference to how, several years ago, similar developments significantly accelerated progress in machine learning for computer vision.

transformer-ber-ulmfit-elmo

(ULM-FiT has nothing to do with Cookie Monster, but nothing better came to mind.)

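One of the latest milestones in this development is the release of BERT, an event described as the beginning of a new era in NLP. BERT is a model that broke several records on NLP tasks. Soon after the paper describing the model was published, the team also open-sourced the code and made available for download versions of BERT already pre-trained on large datasets. This is a significant development, since anyone building a language-processing model can now use this powerful component ready-made, saving the time, effort, knowledge, and resources that training such a model from scratch would require.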

bert-transfer-learning

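The two steps of developing BERT. Step 1: download the pre-trained model (trained on unannotated data); Step 2: fine-tune it for your task.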

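BERT builds on a number of clever ideas that have recently emerged in the NLP community, including but not limited to: Semi-supervised Sequence Learning (by Andrew Dai and Quoc Le), ELMo (by Matthew Peters and researchers from AI2 and UW CSE), ULMFiT (by fast.ai founder Jeremy Howard and Sebastian Ruder), the OpenAI Transformer (by OpenAI researchers Radford, Narasimhan, Salimans, and Sutskever), and the Transformer (Vaswani et al).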

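There are a number of concepts you need to know to properly understand what BERT is. So let's start by looking at ways to use BERT, and only then turn to the concepts behind the model itself.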

Example: Sentence Classification

The most straightforward way to use BERT is to classify a single piece of text. Such a model looks like this:

Bert-classification-spam

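To train such a model, you mainly have to train the classifier, with minimal changes to the BERT model itself during training. This training process is called fine-tuning, and it has its roots in Semi-supervised Sequence Learning and ULMFiT.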

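For those not versed in the topic: since we are talking about classifiers, we are in the supervised-learning domain of machine learning. This means we need a labeled dataset to train such a model. For the spam classifier example, the labeled dataset would be a list of email messages, each with a label ("spam" or "not spam").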
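As a rough illustration of supervised training on a labeled dataset, the sketch below fits a single-layer classifier (logistic regression) to four made-up messages. It does not fine-tune BERT: the bag-of-words `featurize` function is only a stand-in for the feature vector BERT would supply, and the dataset, function names, and hyperparameters are all assumptions chosen for the example.

```python
import math

# Hypothetical labeled dataset: (message, label) pairs, 1 = "spam", 0 = "not spam".
DATASET = [
    ("win money now", 1),
    ("cheap pills win prize", 1),
    ("meeting at noon", 0),
    ("lunch tomorrow at noon", 0),
]

VOCAB = sorted({word for text, _ in DATASET for word in text.split()})


def featurize(text):
    """Bag-of-words counts; a stand-in for the vector BERT would produce."""
    words = text.split()
    return [float(words.count(w)) for w in VOCAB]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(dataset, epochs=200, lr=0.5):
    """Fit logistic regression with full-batch gradient descent."""
    weights, bias = [0.0] * len(VOCAB), 0.0
    examples = [(featurize(text), label) for text, label in dataset]
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(VOCAB), 0.0
        for x, y in examples:
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / len(examples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(examples)
    return weights, bias


def predict(text, weights, bias):
    score = sum(w * xi for w, xi in zip(weights, featurize(text))) + bias
    return "spam" if sigmoid(score) >= 0.5 else "not spam"


weights, bias = train(DATASET)
for text, _ in DATASET:
    print(text, "->", predict(text, weights, bias))
```

With a real BERT model, the hand-built features would be replaced by the model's output vector, and the BERT weights themselves would also be adjusted slightly during training.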

spam-labeled-dataset

Other examples of this way of using BERT:

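Sentiment analysis:
- Input: a movie or product review. Output: positive or negative
- Example dataset: SST
Fact-checking:
- Input: a sentence. Output: "Claim" or "Not Claim"
- A more ambitious/futuristic example:
  - Input: a claim sentence. Output: "True" or "False"
- Full Fact is an organization that builds automatic fact-checking tools for the public benefit. Part of their pipeline is a classifier that reads news articles and detects claims (classifying text as "claim" or "not claim"), which can then be fact-checked (by humans for now and, hopefully, by machine learning later)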

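Now that you have an example use case in mind for how BERT can be used, let's take a closer look at how it works.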

bert-base-bert-large

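The paper presents two model sizes for BERT:

BERT BASE – comparable in size to the OpenAI Transformer, in order to compare performance;
BERT LARGE – a huge model that achieved the state-of-the-art results reported in the paper.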

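In essence, BERT is a trained stack of Transformer encoders. This is a good moment to read up on the Transformer model, which is a foundational concept for BERT and for the ideas discussed next.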

bert-base-bert-large-encoders

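Both BERT models have a large number of encoder layers (which the paper calls "Transformer Blocks"): 12 for the base version and 24 for the large one. They also have larger feed-forward networks (768 and 1024 hidden units, respectively) and more attention heads (12 and 16, respectively) than the default configuration of the reference Transformer implementation in the original paper (6 encoder layers, 512 hidden units, 8 attention heads).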
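The three configurations quoted above can be put side by side in a small sketch. The `EncoderConfig` class and its field names are hypothetical, not taken from any BERT codebase, and `head_size` assumes the hidden units are split evenly across attention heads, as in standard multi-head attention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    name: str
    num_layers: int   # encoder layers ("Transformer Blocks")
    hidden_size: int  # hidden units, as quoted in the text
    num_heads: int    # attention heads

    @property
    def head_size(self) -> int:
        """Width of one head if hidden_size is split evenly across heads."""
        return self.hidden_size // self.num_heads


CONFIGS = [
    EncoderConfig("Transformer (reference)", 6, 512, 8),
    EncoderConfig("BERT BASE", 12, 768, 12),
    EncoderConfig("BERT LARGE", 24, 1024, 16),
]

for cfg in CONFIGS:
    print(f"{cfg.name:24} layers={cfg.num_layers:2} hidden={cfg.hidden_size:4} "
          f"heads={cfg.num_heads:2} head_size={cfg.head_size}")
```

Under that even-split assumption, all three configurations come out to the same per-head width, so the larger models add heads and layers rather than widening each head.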

bert-input-output

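The first input token is a special [CLS] token, for reasons that will become clear later. CLS stands for Classification.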

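Just like the vanilla Transformer encoder, BERT takes a sequence of words as input, which keeps flowing up the stack. Each layer applies self-attention, passes the result through a feed-forward network, and hands it off to the next encoder.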
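A minimal sketch of the input preparation, assuming whitespace tokenization for simplicity. The real model uses WordPiece sub-word tokenization and adds further special tokens such as [SEP]; the `prepare_input` helper and the sample message are made up for illustration.

```python
CLS_TOKEN = "[CLS]"


def prepare_input(text):
    """Split on whitespace (a simplification) and prepend the [CLS] token."""
    return [CLS_TOKEN] + text.lower().split()


tokens = prepare_input("Win a free prize now")
for position, token in enumerate(tokens):
    print(position, token)
```

Position 0 always holds [CLS], which is why the classifier later reads its prediction from the first output position.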

bert-encoders-input

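In terms of architecture, this is identical to the Transformer up to this point (apart from size, which is just a configuration we can set). It is at the output that the differences begin to show.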

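Each position outputs a vector of size hidden_size (768 in BERT BASE). For the sentence classification example above, we focus only on the output of the first position (the one that received the special [CLS] token).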

bert-output-vector

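That vector can now be used as the input to a classifier of our choice. The paper achieves great results using just a single-layer neural network as the classifier.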
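The hand-off from the encoder stack to the classifier can be sketched as follows. The encoder output here is random numbers standing in for what BERT would produce, and the classifier weights are untrained, so the printed probability is meaningless; the point is only that the classifier reads the vector at position 0 and ignores the rest. All names are assumptions for the example.

```python
import math
import random

HIDDEN_SIZE = 768  # per-position output size in BERT BASE

random.seed(0)
# Stand-in for the encoder output: one vector per input position.
NUM_POSITIONS = 6
encoder_output = [[random.gauss(0.0, 1.0) for _ in range(HIDDEN_SIZE)]
                  for _ in range(NUM_POSITIONS)]

# Single-layer classifier: one weight per hidden unit plus a bias (untrained).
weights = [random.gauss(0.0, 0.02) for _ in range(HIDDEN_SIZE)]
bias = 0.0


def classify(outputs):
    """Return P(spam) computed from the first position's vector only."""
    cls_vector = outputs[0]  # position 0 is where [CLS] was fed in
    logit = sum(w * x for w, x in zip(weights, cls_vector)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


p_spam = classify(encoder_output)
print(f"P(spam) = {p_spam:.3f}")
```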

bert-classifier

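If you have more labels (for example, if you are an email service that tags messages as "spam", "not spam", "social", and "promotion"), you just modify the classifier network to have more output neurons, which then pass through softmax.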
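For the multi-label case, a softmax over one output neuron per label turns raw scores into probabilities. The sketch below uses illustrative label names and made-up scores; in practice the scores would come from the classifier layer.

```python
import math

# Illustrative label set for an email service.
LABELS = ["spam", "not spam", "social", "promotion"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs of the four output neurons for one message.
logits = [2.0, 0.5, -1.0, 0.1]
probs = softmax(logits)
predicted = LABELS[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```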

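For those with a background in computer vision, this vector hand-off should be reminiscent of what happens between the convolutional part of a network like VGGNet and the fully connected classification part at the end of the network.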

vgg-net-classifier

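These new developments bring a shift in how words are encoded. Until now, word embeddings have been a major force in how leading NLP models deal with language: methods such as Word2Vec and GloVe have been widely used for such tasks. Let's recap how they are used before pointing out what has changed.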

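For words to be processed by machine learning models, they need some numeric representation that models can use in their calculations. Word2Vec showed that a vector (a list of numbers) can represent words in a way that captures semantic relationships (i.e. the ability to tell whether words are similar or opposite, or that "Stockholm" – "Sweden" and "Cairo" – "Egypt" share the same relationship) as well as syntactic, grammar-based relationships (for example, the relationship between "had" and "has" is the same as that between "was" and "is").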
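The idea that a relationship between words corresponds to a direction in vector space can be shown with hand-made toy vectors. These are not real Word2Vec embeddings: the three dimensions and all values are invented so that the arithmetic is easy to follow, and the `analogy` helper is hypothetical.

```python
import math

# Hand-made 3-dimensional toy vectors (NOT real Word2Vec output).
# Dimensions, loosely: [is a capital, relates to Sweden, relates to Egypt].
EMBEDDINGS = {
    "Sweden":    [0.0, 1.0, 0.0],
    "Stockholm": [1.0, 1.0, 0.0],
    "Egypt":     [0.0, 0.0, 1.0],
    "Cairo":     [1.0, 0.0, 1.0],
    "banana":    [0.1, 0.1, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' using the vector b - a + c."""
    target = [vb - va + vc for va, vb, vc in
              zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])]
    candidates = [w for w in EMBEDDINGS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[w], target))


print(analogy("Sweden", "Stockholm", "Egypt"))
```

Here the "country to capital" offset is the same for both pairs by construction, so adding it to "Egypt" lands on "Cairo". In real embeddings the match is approximate, which is why the nearest neighbor by cosine similarity is used.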

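The field quickly realized that it is a great idea to use embeddings pre-trained on vast amounts of text, instead of training them together with the model on what was often a small dataset. It thus became possible to download a list of words and their embeddings produced by pre-training with Word2Vec or GloVe. Here is an example of the GloVe embedding of the word "stick" (with an embedding vector size of 200):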

glove-embedding

The GloVe embedding of the word "stick": a vector of 200 floats (rounded to 2 decimal places).
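Pre-trained GloVe vectors are distributed as plain text, one word per line followed by its space-separated components. A minimal sketch of reading that layout is shown below; the two sample lines contain made-up numbers and only 4 components each for brevity, whereas a real 200-dimensional file would have 200 values per word. The `load_embeddings` helper is hypothetical.

```python
# Two made-up lines in the "word v1 v2 ..." layout; values are NOT real GloVe data.
SAMPLE = """stick -0.34 -0.84 0.20 -0.26
twig -0.12 0.40 0.05 -0.31
"""


def load_embeddings(lines):
    """Parse 'word v1 v2 ...' lines into a {word: [floats]} dictionary."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        table[parts[0]] = [float(x) for x in parts[1:]]
    return table


embeddings = load_embeddings(SAMPLE.splitlines())
print(len(embeddings["stick"]), embeddings["stick"])
```

To read an actual downloaded file, the same function could be called with an open file object instead of `SAMPLE.splitlines()`.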

vector-boxes

ELMo: Context Matters

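If we use this GloVe representation, the word "stick" is represented by the same vector regardless of context. "Wait a minute," said a number of NLP researchers (Peters et al., 2017, McCann et al., 2017, and again Peters et al., 2018 in the ELMo paper), "the word 'stick' has multiple meanings depending on where it is used. Why not give it an embedding based on the context it appears in, capturing both the meaning of the word in that context and other contextual information?" And so contextualized word embeddings were born.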
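The difference between a static and a contextualized embedding can be illustrated with a deliberately crude sketch. ELMo itself derives its vectors from a bidirectional LSTM language model; the neighbor averaging below is only a stand-in to show that the same word can receive different vectors in different sentences. All vectors, sentences, and function names are made up.

```python
# Made-up 2-dimensional static vectors, for illustration only.
STATIC = {
    "stick":  [0.5, 0.5],
    "wooden": [1.0, 0.0],
    "to":     [0.0, 0.2],
    "plan":   [0.0, 1.0],
    "a":      [0.1, 0.1],
    "the":    [0.1, 0.1],
}


def static_embedding(tokens, index):
    """Same vector for a word no matter what surrounds it."""
    return STATIC[tokens[index]]


def contextual_embedding(tokens, index, window=2):
    """Toy context mixing: average the word's vector with its neighbors'."""
    start = max(0, index - window)
    end = min(len(tokens), index + window + 1)
    span = [STATIC[t] for t in tokens[start:end]]
    return [sum(dim) / len(span) for dim in zip(*span)]


noun_use = ["a", "wooden", "stick"]
verb_use = ["stick", "to", "the", "plan"]

print(static_embedding(noun_use, 2) == static_embedding(verb_use, 0))
print(contextual_embedding(noun_use, 2))
print(contextual_embedding(verb_use, 0))
```

The static lookup returns identical vectors for both uses of "stick", while the context-mixed versions differ because the surrounding words differ.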

elmo-embedding-robin-williams