In deep learning areas such as image processing, the Keras library plays a key role: it greatly simplifies transfer learning and the use of pretrained models. In natural language processing (NLP), solving fairly complex problems such as question answering or intent classification requires combining a series of models. In this article, we show how the DeepPavlov library simplifies building model pipelines for NLP. Using DeepPavlov and Azure ML, we will build a question-answering neural network trained on a COVID-19 dataset.

, , , BERT. BERT, , , , .
BERT , , BERT . , , , , — , — , TF-IDF ( ) .
, . , :
.
DeepPavlov
DeepPavlov . :
- ;
- , config-;
- , ;
- Python SDK .
NLP . REST API Microsoft Bot Framework. , DeepPavlov , Keras .
DeepPavlov's capabilities can be tried out in the web demo at demo.deeppavlov.ai.
BERT DeepPavlov
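Let's start with BERT. DeepPavlov includes a config for sentiment classification of Twitter posts based on BERT embeddings. The chainer section of the config describes the pipeline, which consists of the following steps:
- simple_vocab converts the expected output (y), i.e. the class name, into a numeric index (y_ids);
- transformers_bert_preprocessor converts the input text x into the form BERT expects;
- transformers_bert_embedder produces BERT embeddings for the input;
- one_hotter converts y_ids into one-hot encoding;
- keras_classification_model is the classification model, a CNN;
- proba2labels converts the output probabilities into labels.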
:
dataset_reader — ;train — ;- .
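To make the idea of a config-driven pipeline more concrete, here is a minimal sketch in plain Python. It is not DeepPavlov's implementation: `run_pipeline`, the toy components, and the label set are all hypothetical, and the "classifier" is a keyword rule standing in for a trained model. It only illustrates how each step reads named inputs and writes a named output, as the chainer steps do.

```python
# Minimal sketch of a pipeline whose steps exchange named values.
# Everything here is hypothetical; it only mimics the in/out wiring idea.

LABELS = ["negative", "positive"]


def label_to_ids(labels):
    # Plays the role of a vocabulary step: class name -> numeric index.
    return [LABELS.index(label) for label in labels]


def one_hot(ids):
    # Numeric index -> one-hot vector.
    return [[1 if i == k else 0 for k in range(len(LABELS))] for i in ids]


def toy_classifier(texts):
    # Stand-in for a trained model: returns made-up class probabilities.
    return [[0.2, 0.8] if "good" in t.lower() else [0.9, 0.1] for t in texts]


def proba_to_labels(probas):
    # Pick the label with the highest probability for each sample.
    return [LABELS[max(range(len(p)), key=p.__getitem__)] for p in probas]


def run_pipeline(pipe, inputs):
    """Run each step, reading its 'in' names from memory and storing 'out'."""
    memory = dict(inputs)
    for step in pipe:
        args = [memory[name] for name in step["in"]]
        memory[step["out"]] = step["fn"](*args)
    return memory


pipe = [
    {"in": ["y"], "out": "y_ids", "fn": label_to_ids},
    {"in": ["y_ids"], "out": "y_onehot", "fn": one_hot},
    {"in": ["x"], "out": "y_proba", "fn": toy_classifier},
    {"in": ["y_proba"], "out": "y_pred", "fn": proba_to_labels},
]

state = run_pipeline(pipe, {"x": ["A good day", "Terrible service"],
                            "y": ["positive", "negative"]})
predictions = state["y_pred"]
```

Because the steps are wired by name rather than by position, a component can be swapped without touching the others, which is the property the rest of this article relies on.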
With such a config in place, the model can be trained from the command line:
python -m deeppavlov install sentiment_twitter_bert_emb.json
python -m deeppavlov download sentiment_twitter_bert_emb.json
python -m deeppavlov train sentiment_twitter_bert_emb.json
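The install command installs the libraries the config needs (for example, Keras and transformers), download fetches the required pretrained files, and train runs the training.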
After training, we can talk to the model from the command line:
python -m deeppavlov interact sentiment_twitter_bert_emb.json
The same model can be used from the Python SDK:
from deeppavlov import build_model, configs

model = build_model(configs.classifiers.sentiment_twitter_bert_emb)
result = model(["This is input tweet that I want to analyze"])
: ODQA
, BERT, , , ODQA (Open Domain Question Answering). ODQA , Wikipedia. , , . BERT .
, ODQA :

ODQA DeepPavlov , R-NET, BERT. , , ODQA BERT. "" COVID-19 OpenResearch Dataset, 52 000 COVID-19. , .
Azure ML
Azure Machine Learning, Notebooks. AzureML — Dataset. COVID-19 Semantic Scholar. , JSON-.
Azure ML Dataset. Azure ML Portal, Datasets from web files. file, . tabular, . URL, .

, , . , notebook compute, . ODQA , Azure ML NC12 112 . .
The dataset can then be accessed from a notebook:
from azureml.core import Workspace, Dataset
workspace = Workspace.from_config()
dataset = Dataset.get_by_name(workspace, name='COVID-NC')
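The dataset is a compressed .tar.gz archive. To unpack it, we mount the dataset as a directory and run the UNIX tar command: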
mnt_ctx = dataset.mount('data')
mnt_ctx.start()
!tar -xvzf ./data/noncomm_use_subset.tar.gz
mnt_ctx.stop()
The noncomm_use_subset directory contains .json files with abstract and body_text fields. To turn them into plain text, we use a short Python script:
import glob
import json
import os
from os.path import basename

def get_text(s):
    # Join the 'text' field of every section into a single string
    return ' '.join([x['text'] for x in s])

os.makedirs('text', exist_ok=True)
for fn in glob.glob('noncomm_use_subset/pdf_json/*'):
    with open(fn) as f:
        x = json.load(f)
    # Write one .txt file per source .json file
    nfn = os.path.join('text', basename(fn).replace('.json', '.txt'))
    with open(nfn, 'w') as f:
        f.write(get_text(x['abstract']))
        f.write(get_text(x['body_text']))
The text directory now holds the plain-text files, so the unpacked source directory can be removed:
!rm -fr noncomm_use_subset
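The extraction step above can be checked on a tiny record before running it over the whole archive. The sketch below assumes only what the text states, namely that each file has `abstract` and `body_text` lists whose items carry a `text` field. The sample record is made up.

```python
import json

# A made-up record that follows the abstract/body_text layout described above.
sample = json.loads("""
{
  "abstract": [{"text": "Short summary."}],
  "body_text": [{"text": "First paragraph."}, {"text": "Second paragraph."}]
}
""")


def get_text(sections):
    # Join the 'text' field of every section with single spaces.
    return " ".join(part["text"] for part in sections)


abstract = get_text(sample["abstract"])
body = get_text(sample["body_text"])

# Unlike the script above, this variant puts a newline between the abstract
# and the body so that the two parts do not run together.
document = abstract + "\n" + body
```

Whether to add that separator is a judgment call. It keeps the last word of the abstract from fusing with the first word of the body, which may matter for a word-based ranker.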
ODQA
First, let's see how the stock ODQA model in DeepPavlov behaves. We install and download the en_odqa_infer_wiki config:
import sys
!{sys.executable} -m pip --quiet install deeppavlov
!{sys.executable} -m deeppavlov install en_odqa_infer_wiki
!{sys.executable} -m deeppavlov download en_odqa_infer_wiki
. , , , . !
Once everything is downloaded, we can build the model and ask it questions:
from deeppavlov import configs
from deeppavlov.core.commands.infer import build_model
odqa = build_model(configs.odqa.en_odqa_infer_wiki)
answers = odqa(["Where did guinea pigs originate?",
                "When did the Lynmouth floods happen?"])
The answers are:
['Andes of South America', '1804']
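These answers come from Wikipedia, which this model was built on. If we ask it about coronavirus, this is what we get: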
- What is coronavirus? — a strain of a particular virus
- What is COVID-19? — nest on roofs or in church towers
- Where did COVID-19 originate? — northern coast of Appat
- When was the last pandemic? — 1968
, … , , . — .
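To get better answers, the document retrieval step (the ranker) has to be retrained on our own texts. The ODQA config uses en_ranker_tfidf_wiki for this step, so we load that config, point data_path at the directory with our text files, and adjust the reader settings: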
import os
from deeppavlov.core.common.file import read_json

model_config = read_json(configs.doc_retrieval.en_ranker_tfidf_wiki)
# Read our extracted .txt files instead of the Wikipedia data
model_config["dataset_reader"]["data_path"] = os.path.join(os.getcwd(), "text")
model_config["dataset_reader"]["dataset_format"] = "txt"
model_config["train"]["batch_size"] = 1000
, .
Now we train the ranker and check what it returns for a query:
from deeppavlov import train_model

doc_retrieval = train_model(model_config)
doc_retrieval(['hydroxychloroquine'])
, .
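For readers unfamiliar with TF-IDF ranking, the following is a minimal sketch of the idea in plain Python. It is not DeepPavlov's ranker, and every name in it (`tokenize`, `build_index`, `rank`) is hypothetical. The three documents are made up. Each document is scored by summing term frequency times inverse document frequency over the query terms.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return per-document term counts and smoothed inverse document frequencies."""
    counts = [Counter(tokenize(d)) for d in docs]
    df = Counter(term for c in counts for term in c)
    n = len(docs)
    idf = {term: math.log((1 + n) / (1 + freq)) + 1.0 for term, freq in df.items()}
    return counts, idf


def rank(query, counts, idf, top_n=2):
    """Return ids of the top_n documents with a non-zero TF-IDF score."""
    q_terms = tokenize(query)
    scores = []
    for doc_id, c in enumerate(counts):
        score = sum(c[t] * idf.get(t, 0.0) for t in q_terms)
        scores.append((score, doc_id))
    # Highest score first; ties are broken by document id.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [doc_id for score, doc_id in scores[:top_n] if score > 0]


# Made-up documents, for illustration only.
docs = [
    "Hydroxychloroquine was evaluated in several small trials.",
    "Seasonal influenza admissions were tracked over time.",
    "Coronavirus is a family of viruses; trials of hydroxychloroquine continue.",
]
counts, idf = build_index(docs)
top = rank("hydroxychloroquine trials", counts, idf)
flu = rank("influenza", counts, idf)
```

A ranker of this kind only narrows the search down to a few candidate documents. Extracting the actual answer from them is the job of the question-answering model.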
Now let's build the ODQA model again and see how it answers:
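# Build the question-answering model, downloading its files if needed
squad = build_model(configs.squad.multi_squad_noans_infer, download=True)
# Build ODQA without downloading anything
odqa = build_model(configs.odqa.en_odqa_infer_wiki, download=False)
odqa(["what is coronavirus?", "is hydroxychloroquine suitable?"])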
The result:
['an imperfect gold standard for identifying King County influenza admissions',
'viral hepatitis']
…
BERT Q&A
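DeepPavlov includes two question-answering models trained on the Stanford Question Answering Dataset (SQuAD): R-NET and BERT. The previous example used R-NET. Let's switch to BERT. The squad_bert_infer config is a good starting point for BERT-based question answering: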
!{sys.executable} -m deeppavlov install squad_bert_infer
bsquad = build_model(configs.squad.squad_bert_infer, download = True)
In the ODQA config, the following part is responsible for question answering:
{
  "class_name": "logit_ranker",
  "squad_model": {"config_path": ".../multi_squad_noans_infer.json"},
  "in": ["chunks", "questions"],
  "out": ["best_answer", "best_answer_score"]
}
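The config fragment above takes chunks and questions and returns a best answer with its score. A minimal sketch of that selection step is shown below. It assumes the step runs a reader over every retrieved chunk and keeps the highest-scoring answer. `toy_reader` and `best_answer` are hypothetical names, and the word-overlap "reader" is only a stand-in for a real SQuAD model.

```python
def toy_reader(chunk, question):
    """Hypothetical stand-in for a SQuAD-style model: returns (answer, score)."""
    q_words = set(question.lower().rstrip("?").split())
    sentences = [s.strip() for s in chunk.split(".") if s.strip()]

    def overlap(sentence):
        # Number of question words that also occur in the sentence.
        return len(q_words & set(sentence.lower().split()))

    best = max(sentences, key=overlap)
    return best, float(overlap(best))


def best_answer(chunks, question, reader=toy_reader):
    """Run the reader on every chunk and keep the highest-scoring answer."""
    candidates = [reader(chunk, question) for chunk in chunks]
    return max(candidates, key=lambda pair: pair[1])


# Made-up chunks, for illustration only.
chunks = [
    "Guinea pigs are popular pets. They are social animals.",
    "Guinea pigs originate from the Andes. They were domesticated long ago.",
]
answer, score = best_answer(chunks, "Where did guinea pigs originate?")
```

Because the reader is passed in as a parameter, replacing one question-answering model with another does not change the selection logic.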
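The squad_model entry refers to multi_squad_noans_infer. To change the question-answering engine in ODQA, we replace the squad_model config path with squad_bert_infer: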
odqa_config = read_json(configs.odqa.en_odqa_infer_wiki)
# Point the last pipeline step (logit_ranker) at the BERT-based SQuAD config
odqa_config['chainer']['pipe'][-1]['squad_model']['config_path'] = (
    '{CONFIGS_PATH}/squad/squad_bert_infer.json'
)
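Indexing the pipeline with `[-1]` works as long as the question-answering step stays last. A slightly more defensive variant, sketched below on a hypothetical miniature config, looks the step up by its class name and returns a patched copy. Only the `logit_ranker` step mirrors the fragment quoted earlier; the rest of `base_config`, including `ranker_placeholder`, is invented for illustration.

```python
import copy

# Hypothetical miniature of an ODQA config; the first step is a placeholder.
base_config = {
    "chainer": {
        "pipe": [
            {"class_name": "ranker_placeholder"},
            {
                "class_name": "logit_ranker",
                "squad_model": {"config_path": ".../multi_squad_noans_infer.json"},
            },
        ]
    }
}


def with_squad_model(config, config_path):
    """Return a copy of config whose logit_ranker step points at config_path."""
    patched = copy.deepcopy(config)
    for step in patched["chainer"]["pipe"]:
        if step.get("class_name") == "logit_ranker":
            step["squad_model"]["config_path"] = config_path
            return patched
    raise KeyError("no logit_ranker step found in chainer.pipe")


new_config = with_squad_model(
    base_config, "{CONFIGS_PATH}/squad/squad_bert_infer.json"
)
```

Working on a deep copy leaves the original config untouched, which is convenient when comparing two question-answering models side by side in a notebook.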
Now we build and use the model the same way as before:
odqa = build_model(odqa_config, download = False)
odqa(["what is coronavirus?",
"is hydroxychloroquine suitable?",
"which drugs should be used?"])
, :
, Azure Machine Learning NLP DeepPavlov - . DeepPavlov , , , . COVID Kaggle , , DeepPavlov Azure Machine Learning. , DeepPavlov – .
Azure ML DeepPavlov . , . . Data Science , , , !