Your First BERT: An Illustrated Guide

bert-distilbert-sentence-classification


Progress in machine learning for natural language processing has accelerated significantly over the past few years. Models have left research laboratories and become the foundation of leading digital products. A good illustration of this is the recent announcement that the BERT model is now a major component behind Google Search. Google believes that this step (that is, bringing an advanced natural language understanding model into its search engine) represents "the greatest breakthrough in the last five years and one of the most significant in the history of search engines".


This article is a simple guide to using a variant of BERT for sentence classification. The example is basic enough to serve as a first introduction to the model, yet advanced enough to show some of the key concepts involved.


Along with this article there is a notebook with all the code, which you can also run in Colab.


Dataset: SST2


In this example we'll use the SST2 dataset, which contains sentences from movie reviews labeled as either positive (label 1) or negative (label 0):


sst2


Models: sentence sentiment classification


Our goal is to create a model that takes a sentence (just like the ones in our dataset) and produces either 1 (indicating the sentence carries a positive sentiment) or 0 (indicating the sentence carries a negative sentiment). We can picture it like this:


sentiment-classifier-1.png


Under the hood, the model is actually made up of two models: DistilBERT processes the sentence and passes along some of the information it has extracted to the next model (DistilBERT is a smaller, faster version of BERT, developed and open-sourced by the HuggingFace team, that roughly matches BERT's performance), while the second model is a basic logistic regression from scikit-learn, which takes DistilBERT's output and classifies the sentence as positive or negative (1 or 0, respectively).



The data that passes between the two models is a vector of size 768. We can think of this vector as an embedding of the sentence that we can use for classification.


distilbert-bert-sentiment-classifier.png


If you've read my previous post on BERT and ELMO (about transfer learning in NLP), you'll recognize this vector as the output of the first position (the one that receives the [CLS] token as input).



Although we'll be using two models, we will only train the logistic regression. As for DistilBERT, we'll use a model that has already been pre-trained and has a grasp of the English language. That model, however, was neither trained nor fine-tuned for sentence classification. Still, we get some sentence classification ability from the general objectives BERT is trained on. This mostly concerns BERT's output for the first position, the one associated with the [CLS] token. I believe this comes from BERT's second training objective, next sentence classification: that objective seems to teach the model to pack a sentence-wide meaning into the output at the first position.


The transformers library provides us with an implementation of DistilBERT as well as pre-trained versions of the model.


model-training



So here's the plan for this tutorial: first, we'll use the pre-trained DistilBERT to generate embeddings for 2,000 sentences.


bert-distilbert-tutorial-sentence-embedding


We won't touch DistilBERT after that step; from there on it's all Scikit Learn. We do the usual train/test split on this dataset:


bert-distilbert-train-test-split-sentence-embedding


The train/test split of DistilBERT's outputs (model #1) produces the dataset on which we'll train and evaluate the logistic regression (model #2). Note that in reality sklearn's train/test split shuffles the examples before splitting; it doesn't simply take the first 75% of the examples in the order they appear in the dataset.


Then we train the logistic regression model on the training set:


bert-training-logistic-regression



Before we dive into the code and explain how the model is trained, let's look at how a trained model computes its prediction.


Let's try to classify the sentence "a visually stunning rumination on love". The first step is to use the BERT tokenizer to split the sentence into tokens. Then we add the special tokens needed for sentence classification ([CLS] at the first position and [SEP] at the end of the sentence).


bert-distilbert-tokenization-1


The third thing the tokenizer does is replace each token with its id from the embedding table, a component that comes with the trained model. If word embeddings are new to you, see the Illustrated Word2vec post.


bert-distilbert-tokenization-2-token-ids


Note that the tokenizer does all of these steps in a single line of code:


tokenizer.encode("a visually stunning rumination on love", add_special_tokens=True)
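
As a rough illustration of what that single call produces (the exact wordpiece split and ids below are assumptions that depend on the distilbert-base-uncased vocabulary), we can convert the ids back into tokens and see the special markers appear:


ids = tokenizer.encode("a visually stunning rumination on love", add_special_tokens=True)
print(tokenizer.convert_ids_to_tokens(ids))
# e.g. ['[CLS]', 'a', 'visually', 'stunning', 'rum', '##ination', 'on', 'love', '[SEP]']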

Our sentence is now in the proper shape to be passed to DistilBERT.


This step can also be visualized in the same way as in the BERT and ELMO post (on transfer learning in NLP).



Flowing through DistilBERT


Passing the input vector through DistilBERT works just like with BERT: the output is a vector for each input token, and each vector consists of 768 numbers (floats).


bert-model-input-output-1


Since this is a sentence classification task, we ignore everything except the first vector (the one associated with the [CLS] token). That single vector is what we pass as input to the logistic regression model.


bert-model-calssification-output-vector-cls


From here it's the logistic regression model's job to classify this vector based on what it learned during its training phase. We can picture the prediction calculation like this:


bert-distilbert-sentence-classification-example
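
In code terms, that calculation is just a weighted sum of the 768 features pushed through a sigmoid. Below is a minimal sketch; logreg_predict is a hypothetical helper, and w and b stand for the weights and bias that the logistic regression learns during training:


import numpy as np

def logreg_predict(x, w, b):
    # sigmoid(w · x + b), thresholded at 0.5: 1 = positive, 0 = negative
    score = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))
    return int(score > 0.5)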


We'll discuss the training itself, along with the code for the whole process, in the next section.



In this section we'll go over the code for training the sentence classification model. A notebook with all of this code is available on Colab and GitHub.


Let's start by importing the tools we'll need:


import numpy as np
import pandas as pd
import torch
import transformers as ppb # pytorch transformers
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split

The dataset is available as a file on GitHub, so we just import it directly into a pandas dataframe:


df = pd.read_csv('https://github.com/clairett/pytorch-sentiment-classification/raw/master/data/SST2/train.tsv', delimiter='\t', header=None)

We can use df.head() to look at the first five rows of the dataframe and see what the data looks like:


df.head()

sst2-df-head


Loading the pre-trained DistilBERT model and tokenizer


model_class, tokenizer_class, pretrained_weights = (ppb.DistilBertModel, ppb.DistilBertTokenizer, 'distilbert-base-uncased')

## Want to use BERT instead of distilBERT? Uncomment the following line:
#model_class, tokenizer_class, pretrained_weights = (ppb.BertModel, ppb.BertTokenizer, 'bert-base-uncased')

# Load pretrained model/tokenizer
tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
model = model_class.from_pretrained(pretrained_weights)

Now we can tokenize the dataset. Note that here we'll do things a bit differently from the example above: the example above tokenized and processed a single sentence, while here we'll tokenize and process all the sentences together as a batch (for resource reasons, the notebook only processes a smaller group of examples, say 2,000).
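
A minimal way to apply that limit before tokenizing (an assumption about how you would slice the dataframe; the accompanying notebook does something equivalent):


df = df[:2000]  # optional: keep only the first 2,000 examples to save memory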



tokenized = df[0].apply((lambda x: tokenizer.encode(x, add_special_tokens=True)))

This turns every sentence into a list of token ids.


sst2-text-to-tokenized-ids-bert-example


Our dataset is currently a list (well, a pandas Series/DataFrame) of lists. Before DistilBERT can process them as input, we need to make all the vectors the same length by padding the shorter sentences with the token id 0 (padding). You can see the padding step in the notebook; it's basic Python string and array manipulation.
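
The padding itself takes only a few lines of plain Python/numpy. The sketch below builds the padded variable used in the next step, assuming tokenized is the pandas Series produced above:


# Pad every list of token ids with 0s up to the length of the longest sentence
max_len = max(len(ids) for ids in tokenized.values)
padded = np.array([ids + [0] * (max_len - len(ids)) for ids in tokenized.values])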


After the padding, we have a matrix (tensor) that is ready to be passed to BERT:


bert-input-tensor


Processing with DistilBERT


We now create an input tensor out of the padded token matrix and send it to DistilBERT.


input_ids = torch.tensor(np.array(padded))

with torch.no_grad():  # no gradient tracking needed: we only run a forward pass
    last_hidden_states = model(input_ids)

After running this code, last_hidden_states holds the outputs of DistilBERT. It is a tuple with the shape (number of examples, max number of tokens in a sequence, number of hidden units in DistilBERT). In our case this is 2000 (since we limited ourselves to 2,000 examples), 66 (the number of tokens in the longest of those 2,000 sequences), and 768 (the number of hidden units in the DistilBERT model).


bert-distilbert-output-tensor-predictions


Unpacking the BERT output tensor


Let's unpack this 3-d output tensor, starting by examining its dimensions:


bert-output-tensor
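
You can also check the dimensions directly; the middle number depends on the longest sequence in your batch, so 66 here is just the value from this particular run:


print(last_hidden_states[0].shape)  # e.g. torch.Size([2000, 66, 768])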



Each row is associated with a sentence from our dataset. To recap the processing path of the first sentence, we can picture it like this:


bert-input-to-output-tensor-recap



For sentence classification, we're only interested in BERT's output for the [CLS] token, so we select that slice of the cube and discard everything else.


bert-output-tensor-selection


And this is how we slice that 3d tensor to get the 2d tensor we're interested in:


# Slice the output for the first position (the [CLS] token) of every sequence, taking all hidden unit outputs
features = last_hidden_states[0][:,0,:].numpy()

Now features is a 2d numpy array containing the sentence embeddings of all the sentences in our dataset.
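
A quick sanity check on the result (2,000 is the number of examples assumed throughout this post):


print(features.shape)  # e.g. (2000, 768)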


bert-output-cls-senteence-embeddings


The dataset for logistic regression, sliced out of the outputs of BERT



Now that we have the outputs of BERT, we have assembled the dataset we need to train our logistic regression model. The 768 columns are the features, and the labels we simply take from our initial dataset.


logistic-regression-dataset-features-labels


The labeled dataset we train the logistic regression on. The features are BERT's output vectors for the [CLS] token (position #0), which we sliced out in the previous figure. Each row corresponds to a sentence in our dataset, and each column corresponds to the output of a hidden unit of the feed-forward network in the top transformer block of BERT/DistilBERT.


Now that we have our dataset, we can do the traditional machine learning train/test split.


labels = df[1]
train_features, test_features, train_labels, test_labels = train_test_split(features, labels)
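
The call above relies on the defaults; spelled out (these default values are an assumption that holds for current scikit-learn versions), it is equivalent to:


train_features, test_features, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.25, shuffle=True)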

This splits the dataset into training and test sets:


bert-distilbert-train-test-split-sentence-embedding


Next, we train the logistic regression model on the training set.


lr_clf = LogisticRegression()
lr_clf.fit(train_features, train_labels)

Now that the model is trained, we can score it on the test set.


lr_clf.score(test_features, test_labels)

This shows that the model achieves an accuracy of around 81%.
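
Since cross_val_score was imported at the top, you can also get a cross-validated estimate of the same accuracy; this is an optional extra, not something the rest of the post depends on:


# Optional: 5-fold cross-validated accuracy over all the features
scores = cross_val_score(LogisticRegression(), features, labels, cv=5)
print(scores.mean())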



For reference: the current highest accuracy score on this dataset is 96.8. DistilBERT itself can be trained to improve its score on this task, a process called fine-tuning, which updates BERT's weights so that it performs better on sentence classification (which we can call the downstream task). A fine-tuned DistilBERT reaches an accuracy of 90.7; the full-size BERT model reaches 94.9.



You can walk through all of these steps yourself in the Colab notebook.


That's all! This makes for a good first contact with BERT. The next step is to head over to the documentation and try your hand at fine-tuning. You can also go back a bit and switch from DistilBERT to BERT and see how that works.


Thanks to Clément Delangue, Victor Sanh, and the Huggingface team for providing feedback on earlier versions of this guide.

