DeepPavlov: Keras for Natural Language Processing Helps Answer Questions about COVID-2019

In deep learning fields such as image processing, the Keras library plays an important role by simplifying transfer learning and the use of pre-trained models. In natural language processing (NLP), solving fairly complex problems, such as answering questions or classifying intents, requires combining a chain of models. In this article, we will show how the DeepPavlov library simplifies building model chains for NLP. Using DeepPavlov and Azure ML, we will build a question-answering neural network trained on a COVID-19 dataset.



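In NLP, the role of such pre-trained models is played by language models, the best known of which is BERT. BERT is a transformer-based model pre-trained on a large text corpus; it can be fine-tuned, or used to compute text embeddings, for a wide range of tasks.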


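However, BERT alone rarely solves a real problem end to end. For example, to answer a question using a large collection of documents, we first have to find the documents that are likely to contain the answer, usually with a classical information retrieval technique such as TF-IDF (term frequency-inverse document frequency), and only then can BERT extract the answer from them.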
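To make the ranking step concrete, here is a minimal pure-Python sketch of TF-IDF document ranking. It is a simplified illustration, not DeepPavlov's ranker: the tokenizer is naive, the idf is smoothed, and a document's score is simply the sum of its tf-idf weights for the query terms. The function names and toy documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lower-cased runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_tfidf(docs):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Term frequency times smoothed inverse document frequency
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def rank(query, docs, top_k=2):
    """Return indices of the top_k documents, best first."""
    vectors = build_tfidf(docs)
    query_terms = set(tokenize(query))
    # Score = sum of the document's tf-idf weights for the query terms
    scores = [sum(vec.get(t, 0.0) for t in query_terms) for vec in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:top_k]


docs = [
    "Hydroxychloroquine was studied as a possible treatment for COVID-19.",
    "Guinea pigs originate from the Andes of South America.",
    "Coronaviruses are enveloped RNA viruses.",
]
print(rank("hydroxychloroquine treatment", docs, top_k=1))  # [0]
```

Production rankers typically add n-grams and compare query and document vectors with a dot product or cosine similarity, but the core idea is the same: terms that are rare in the collection but present in the query carry the most weight.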


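So a practical NLP solution is usually a pipeline of several models. For question answering over a document collection, for instance, we need:


  • BERT, to find the answer in a text fragment,
  • a ranker, to select the relevant fragments from the collection,

and wiring these models together by hand takes considerable effort.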


DeepPavlov


DeepPavlov is an open-source library for building NLP models and conversational systems. Its main features are:


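  • a large collection of pre-trained models for common NLP tasks;
  • model pipelines described declaratively in JSON config files;
  • commands to install dependencies, download files, train, and run a model from the command line;
  • a Python SDK for using models from code.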

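Models built with DeepPavlov are easy to deploy as NLP services: a model can be exposed through a REST API or connected to Microsoft Bot Framework. In this sense, DeepPavlov does for NLP what Keras does for deep learning in general.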


You can try out DeepPavlov models online at demo.deeppavlov.ai.


Using BERT in DeepPavlov


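Let's start with a simple example: text classification with BERT. DeepPavlov includes ready-made configs for several classification tasks, for example sentiment analysis of Twitter messages. The chainer section of such a config describes the pipeline, which consists of the following components: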


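  • simple_vocab converts class labels (y) into numeric ids (y_ids);
  • transformers_bert_preprocessor converts the input text x into the token format expected by BERT;
  • transformers_bert_embedder computes BERT embeddings of the tokens;
  • one_hotter converts y_ids into one-hot encoding, which is needed for training;
  • keras_classification_model: the Keras classifier itself, in this case a CNN;
  • proba2labels converts the predicted class probabilities back into labels.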
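The chainer idea can be illustrated with a toy pipeline runner, assuming a simplified format in which each step lists the variable names it reads (in) and writes (out). This is a hypothetical re-implementation of the concept, not DeepPavlov's Chainer class. SimpleVocab, one_hot, proba2labels and keyword_classifier are simplified stand-ins for the real components, and the BERT steps are replaced by a keyword rule.

```python
from typing import List, Sequence


class SimpleVocab:
    """Stand-in for simple_vocab: maps class labels to integer ids."""

    def __init__(self, labels: Sequence[str]):
        self.labels = sorted(set(labels))
        self.index = {label: i for i, label in enumerate(self.labels)}

    def __call__(self, ys: List[str]) -> List[int]:
        return [self.index[y] for y in ys]


def one_hot(ids: List[int], n_classes: int) -> List[List[float]]:
    """Stand-in for one_hotter: class ids -> one-hot vectors."""
    return [[1.0 if k == i else 0.0 for k in range(n_classes)] for i in ids]


def proba2labels(probas: List[List[float]], labels: List[str]) -> List[str]:
    """Stand-in for proba2labels: pick the most probable label in each row."""
    return [labels[max(range(len(p)), key=p.__getitem__)] for p in probas]


def run_chain(pipe, memory):
    """Run steps in order; each step reads its 'in' names and stores its 'out' names."""
    for step in pipe:
        result = step["fn"](*[memory[name] for name in step["in"]])
        if len(step["out"]) == 1:
            memory[step["out"][0]] = result
        else:
            memory.update(zip(step["out"], result))
    return memory


vocab = SimpleVocab(["negative", "neutral", "positive"])


def keyword_classifier(xs: List[str]) -> List[List[float]]:
    """Hypothetical placeholder for BERT embedder + Keras CNN: returns class probabilities."""
    return [[0.8, 0.1, 0.1] if "bad" in x else [0.1, 0.1, 0.8] for x in xs]


pipe = [
    {"in": ["y"], "out": ["y_ids"], "fn": vocab},
    {"in": ["y_ids"], "out": ["y_onehot"], "fn": lambda ids: one_hot(ids, len(vocab.labels))},
    {"in": ["x"], "out": ["y_probas"], "fn": keyword_classifier},
    {"in": ["y_probas"], "out": ["y_pred"], "fn": lambda p: proba2labels(p, vocab.labels)},
]

memory = run_chain(pipe, {"x": ["what a bad day", "great news"], "y": ["negative", "positive"]})
print(memory["y_ids"], memory["y_pred"])  # [0, 2] ['negative', 'positive']
```

Because every step only names its inputs and outputs, a component can be swapped (for example, a different classifier) without touching the others, which is the property that makes declarative pipeline configs convenient.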

Besides chainer, the config contains the following sections:


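  • dataset_reader: where to find the training data and how to read it;
  • train: training parameters such as the number of epochs, batch size, and metrics;
  • metadata: the required packages and the files to download.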

The model can be installed, downloaded, and trained from the command line:


python -m deeppavlov install sentiment_twitter_bert_emb.json
python -m deeppavlov download sentiment_twitter_bert_emb.json
python -m deeppavlov train sentiment_twitter_bert_emb.json

The install command installs the required packages (Keras, transformers, etc.), download fetches the pre-trained files, and train trains the model.


Once training is finished, you can talk to the model from the command line:


python -m deeppavlov interact sentiment_twitter_bert_emb.json

or use it from Python via the SDK:


from deeppavlov import build_model, configs

model = build_model(configs.classifiers.sentiment_twitter_bert_emb)
result = model(["This is input tweet that I want to analyze"])

A more complex example: ODQA


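Besides BERT-based classifiers, DeepPavlov contains more complex pipelines, such as ODQA (Open Domain Question Answering). ODQA answers questions using a large collection of texts, for example Wikipedia. The answer may be in any of millions of documents, so running BERT over all of them would be far too slow.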


Therefore, the ODQA pipeline works in two stages:


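  • a ranker selects the documents relevant to the question, i.e. performs information retrieval;
  • a reading comprehension model (such as BERT) finds the exact answer in the selected documents.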
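The two stages can be sketched in a few lines of Python. Both stages are deliberately crude stand-ins: the ranker scores documents by word overlap instead of TF-IDF, and the reader returns the best-overlapping sentence instead of an exact span predicted by a SQuAD-trained model such as R-NET or BERT. All function names and documents here are hypothetical.

```python
import re


def words(text):
    """Lower-cased set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, docs, top_k=1):
    """Stage 1, ranker stand-in: documents sharing the most words with the question."""
    q = words(question)
    return sorted(docs, key=lambda d: len(q & words(d)), reverse=True)[:top_k]


def read(question, doc):
    """Stage 2, reader stand-in: the sentence of doc that overlaps the question most.

    A real reader (R-NET or BERT trained on SQuAD) predicts an exact answer span instead.
    """
    q = words(question)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", doc.strip()) if s]
    return max(sentences, key=lambda s: len(q & words(s)))


def answer(question, docs):
    """Retrieve the best document, then read the answer out of it."""
    best_doc = retrieve(question, docs, top_k=1)[0]
    return read(question, best_doc)


docs = [
    "Guinea pigs originate from the Andes of South America. They are popular pets.",
    "Coronaviruses are enveloped RNA viruses. They often cause respiratory tract infections.",
]
print(answer("Where do guinea pigs originate?", docs))
```

The split matters for speed: the cheap ranker looks at every document, while the expensive reader only sees the few documents the ranker keeps.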

Image from the DeepPavlov blog


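The standard ODQA pipeline in DeepPavlov is trained on Wikipedia and uses R-NET rather than BERT to extract answers. Below, we will train ODQA on our own documents and then switch it to BERT. As our own documents, we will use the COVID-19 Open Research Dataset, which contains about 52,000 articles about COVID-19.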


Azure ML


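We will run our experiments in Azure Machine Learning, using its Notebooks. The most convenient way to bring data into Azure ML is to create a Dataset. The COVID-19 dataset is available from Semantic Scholar; it consists of several archives of articles, with each article stored as a JSON file.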


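To create an Azure ML Dataset, open the Azure ML Portal, go to Datasets and choose to create a dataset from web files. Select the file type, because our data is an archive of files rather than a table; the tabular type is meant for tabular data. Then paste the URL of the archive and give the dataset a name (below we use COVID-NC).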


COVID dataset


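Next, we need a compute resource to run the notebook on, so create a compute instance for it. Training ODQA requires a lot of memory, so I used an Azure ML NC12 virtual machine with 112 GB of RAM.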


In the notebook, first get a reference to the dataset:


from azureml.core import Workspace, Dataset
workspace = Workspace.from_config()
dataset = Dataset.get_by_name(workspace, name='COVID-NC')

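The data is stored as a .tar.gz archive. To unpack it, we mount the dataset as a folder and use the UNIX tar command: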


mnt_ctx = dataset.mount('data')
mnt_ctx.start()
!tar -xvzf ./data/noncomm_use_subset.tar.gz
mnt_ctx.stop()

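The noncomm_use_subset directory now contains .json files, each holding the article text in the abstract and body_text fields. To extract the plain text into separate files, we use a short Python script: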


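import glob
import json
import os
from os.path import basename

def get_text(s):
    # Each section is a list of paragraphs, each a dict with a 'text' field
    return ' '.join([x['text'] for x in s])

os.makedirs('text', exist_ok=True)

for fn in glob.glob('noncomm_use_subset/pdf_json/*'):
    with open(fn) as f:
        x = json.load(f)
    # Save abstract and body of each article to text/<name>.txt
    nfn = os.path.join('text', basename(fn).replace('.json', '.txt'))
    with open(nfn, 'w') as f:
        f.write(get_text(x['abstract']))
        f.write(' ')  # keep the last abstract word apart from the body
        f.write(get_text(x['body_text']))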
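To check the extraction logic without downloading the dataset, the same conversion can be run on a small fake article, assuming only the layout the script relies on: abstract and body_text lists of paragraphs, each with a text field. The file name paper1.json, its contents and the convert helper are made up for this illustration; abstract and body are separated by a space so their words do not run together.

```python
import glob
import json
import os
import tempfile


def get_text(sections):
    # Each section is a list of paragraphs, each a dict with a 'text' field
    return ' '.join(p['text'] for p in sections)


def convert(src_dir, dst_dir):
    """Write abstract + body_text of every JSON article in src_dir to dst_dir/<name>.txt."""
    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for fn in glob.glob(os.path.join(src_dir, '*.json')):
        with open(fn, encoding='utf-8') as f:
            article = json.load(f)
        out = os.path.join(dst_dir, os.path.basename(fn).replace('.json', '.txt'))
        with open(out, 'w', encoding='utf-8') as f:
            f.write(get_text(article['abstract']) + ' ' + get_text(article['body_text']))
        written.append(out)
    return written


# Hypothetical article containing only the fields the script reads
sample = {
    'abstract': [{'text': 'Short abstract.'}],
    'body_text': [{'text': 'First paragraph.'}, {'text': 'Second paragraph.'}],
}

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'pdf_json')
    os.makedirs(src)
    with open(os.path.join(src, 'paper1.json'), 'w', encoding='utf-8') as f:
        json.dump(sample, f)
    [txt] = convert(src, os.path.join(tmp, 'text'))
    with open(txt, encoding='utf-8') as f:
        result = f.read()

print(result)  # Short abstract. First paragraph. Second paragraph.
```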

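The text directory now contains the extracted article texts, so the original files are no longer needed and can be deleted: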


!rm -fr noncomm_use_subset

ODQA


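First, let's try the pre-trained ODQA model from DeepPavlov. Install the library, then the dependencies and files of the en_odqa_infer_wiki config: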


import sys
!{sys.executable} -m pip --quiet install deeppavlov
!{sys.executable} -m deeppavlov install en_odqa_infer_wiki
!{sys.executable} -m deeppavlov download en_odqa_infer_wiki

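The download takes quite a while, since the pre-trained model includes large files built from English Wikipedia. Be patient!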


When the download is complete, we can ask the model questions:


from deeppavlov import configs
from deeppavlov.core.commands.infer import build_model
odqa = build_model(configs.odqa.en_odqa_infer_wiki)
answers = odqa([ "Where did guinea pigs originate?",
                 "When did the Lynmouth floods happen?" ])

The result is:


['Andes of South America', '1804']

The model answers from Wikipedia, so questions about coronavirus give rather strange results:


  • What is coronavirus? → a strain of a particular virus
  • What is COVID-19? → nest on roofs or in church towers
  • Where did COVID-19 originate? → northern coast of Appat
  • When was the last pandemic? → 1968

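As you can see, the answers are… not quite right: the model learned from Wikipedia and knows nothing about COVID-19. To fix this, we need to train the model on our own data.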



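To do that, we retrain the ranker, the model that selects relevant documents, on our own texts. In DeepPavlov, the ranker is a separate model. ODQA uses the en_ranker_tfidf_wiki ranker, so we load its config, set data_path to our text directory, and change a few other parameters: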


import os

from deeppavlov import configs
from deeppavlov.core.common.file import read_json

model_config = read_json(configs.doc_retrieval.en_ranker_tfidf_wiki)
# Build the ranker from our plain-text articles instead of Wikipedia
model_config["dataset_reader"]["data_path"] = os.path.join(os.getcwd(), "text")
model_config["dataset_reader"]["dataset_format"] = "txt"
model_config["train"]["batch_size"] = 1000

The rest of the config can be left unchanged.


Now we train the ranker and check that it finds documents for a query:


from deeppavlov import train_model

doc_retrieval = train_model(model_config)
doc_retrieval(['hydroxychloroquine'])

The ranker returns the documents that best match the query.


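Now let's assemble the full ODQA pipeline. We download the SQuAD model separately, but build ODQA itself without downloading, so that the ranker we have just trained is not overwritten: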


# Download all the SQuAD models
squad = build_model(configs.squad.multi_squad_noans_infer, download = True)
# Do not download the ODQA models, we've just trained it
odqa = build_model(configs.odqa.en_odqa_infer_wiki, download = False)
odqa(["what is coronavirus?","is hydroxychloroquine suitable?"])

The answers are:


['an imperfect gold standard for identifying King County influenza admissions',
 'viral hepatitis']

…


Using BERT for Q&A


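DeepPavlov has two models trained on the Stanford Question Answering Dataset (SQuAD): R-NET and BERT. So far we have used R-NET; now let's try BERT. The squad_bert_infer config describes the BERT-based model: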


!{sys.executable} -m deeppavlov install squad_bert_infer
bsquad = build_model(configs.squad.squad_bert_infer, download = True)

In the ODQA config, the last component of the pipeline looks like this:


{
   "class_name": "logit_ranker",
   "squad_model": 
    {"config_path": ".../multi_squad_noans_infer.json"},
   "in": ["chunks","questions"],
   "out": ["best_answer","best_answer_score"]
}

As you can see, it uses the multi_squad_noans_infer model. To switch ODQA to BERT, we load the config and point squad_model to squad_bert_infer:


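from deeppavlov import configs
from deeppavlov.core.common.file import read_json

odqa_config = read_json(configs.odqa.en_odqa_infer_wiki)
# Replace the R-NET reader at the end of the pipeline with the BERT one
odqa_config['chainer']['pipe'][-1]['squad_model']['config_path'] = \
    '{CONFIGS_PATH}/squad/squad_bert_infer.json'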
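Conceptually, the logit_ranker step at the end of the ODQA pipeline takes the retrieved chunks and the question, runs the SQuAD model on each chunk, and keeps the answer with the highest score (best_answer, best_answer_score). The sketch below shows only that selection logic for a single question. toy_reader is a hypothetical stub standing in for R-NET or BERT, and the real component processes batches of questions.

```python
import re
from typing import Callable, List, Tuple

# A reader maps (context, question) to (answer, score). In DeepPavlov this is the
# SQuAD model (R-NET or BERT); toy_reader below is only a hypothetical stub.
Reader = Callable[[str, str], Tuple[str, float]]


def logit_rank(chunks: List[str], question: str, reader: Reader) -> Tuple[str, float]:
    """Run the reader on every retrieved chunk and keep the highest-scoring answer."""
    candidates = [reader(chunk, question) for chunk in chunks]
    return max(candidates, key=lambda pair: pair[1])


def _words(text: str) -> set:
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def toy_reader(context: str, question: str) -> Tuple[str, float]:
    """Stub: answer = first sentence, score = number of question words in the context."""
    first_sentence = re.split(r"(?<=\.)\s+", context.strip())[0]
    return first_sentence, float(len(_words(question) & _words(context)))


chunks = [
    "Antibiotics do not act on viruses. They target bacteria.",
    "The incubation period was estimated in this study. Details follow.",
]
best_answer, best_answer_score = logit_rank(chunks, "what is the incubation period?", toy_reader)
print(best_answer, best_answer_score)
```

Swapping the reader, as the config change does, leaves this selection step untouched: only the quality of the per-chunk answers and scores changes.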

Then we build the model and ask our questions:


odqa = build_model(odqa_config, download = False)
odqa(["what is coronavirus?",
      "is hydroxychloroquine suitable?",
      "which drugs should be used?"])

Here are the answers to these and several other questions:


Question | Answer
what is coronavirus? | respiratory tract infection
is hydroxychloroquine suitable? | well tolerated
which drugs should be used? | antibiotics, lactulose, probiotics
what is incubation period? | 3-5 days
is patient infectious during incubation period? | MERS is not contagious
how to contaminate virus? | helper-cell-based rescue system cells
what is coronavirus type? | enveloped single stranded RNA viruses
what are covid symptoms? | insomnia, poor appetite, fatigue, and attention deficit
what is reproductive number? | 5.2
what is the lethality? | 10%
where did covid-19 originate? | uveal melanocytes
is antibiotics therapy effective? | less effective
what are effective drugs? | M2, neuraminidase, polymerase, attachment and signal-transduction inhibitors
what is effective against covid? | Neuraminidase inhibitors
is covid similar to sars? | All coronaviruses share a very similar organization in their functional and structural genes
what is covid similar to? | thrombogenesis


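In this article, we have shown how Azure Machine Learning and the DeepPavlov NLP library can be used to build a question-answering system. DeepPavlov contains many pre-trained models and makes it easy to combine them into pipelines, retrain them on your own data, and deploy them. The COVID dataset on Kaggle comes with a number of research tasks, and some of them can be explored with DeepPavlov and Azure Machine Learning. DeepPavlov is a convenient tool for such experiments.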


Try Azure ML and DeepPavlov on your own NLP tasks. If you work in Data Science, they are well worth exploring!

