We read papers for you. March 2020. Part 2



Hello, Habr!


We continue to publish reviews of scientific papers written by members of the Open Data Science community in the #article_essense channel. If you want to get them before everyone else, join the community! The first part of the March digest was published earlier.


Papers in this digest:


  1. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (UC Berkeley, Google Research, UC San Diego, 2020)
  2. Scene Text Recognition via Transformer (China, 2020)
  3. PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization (Imperial College London, Google Research, 2019)
  4. Lagrangian Neural Networks (Princeton, Oregon, Google, Flatiron, 2020)
  5. Deformable Style Transfer (Chicago, USA, 2020)
  6. Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need? (MIT, Google, 2020)
  7. Attentive CutMix: An Enhanced Data Augmentation Approach for Deep Learning Based Image Classification (Carnegie Mellon University, USA, 2020)


1. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis


Authors: Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng (UC Berkeley, Google Research, UC San Diego, 2020)
Paper :: GitHub project :: Video :: Blog
Review: (belskikh)


A high-profile paper on novel view synthesis: given a set of photographs of a 3D scene, the method learns to render that scene from new viewpoints.





The idea, in brief:


  1. For every pixel of the target 2D image, a camera ray is marched through the 3D scene and points are sampled along it.
  2. Each 3D point (x, y, z), together with the viewing direction, is fed into an MLP that predicts a color (r, g, b) and a volume density.
  3. The predictions along each ray are composited with classical volume rendering into a 2D image, which is compared with the reference 2D photo.

Because volume rendering is differentiable, the MLP is trained end-to-end with an L2 loss between rendered and ground-truth pixels; the 3D scene representation is thus recovered from 2D supervision alone.


Three details are key to making this work:


  1. Color depends not only on the position x, y, z but also on the viewing direction (this is what lets the model capture view-dependent effects). Architecturally: the MLP first processes x, y, z with 8 fully-connected layers (ReLU, 256 channels) and outputs the density plus a 256-dimensional feature vector. This vector, concatenated with the viewing direction, goes through an additional fully-connected head (ReLU, 128 channels) that predicts the RGB color.
  2. Positional Encoding. Raw coordinates fed straight into the network produce blurry renders, because MLPs are biased toward low-frequency functions. Mapping each input coordinate through a (sin/cos) positional encoding before the MLP restores high-frequency detail.
  3. Sampling points uniformly along rays is wasteful, so two networks are trained: a "coarse" one and a "fine" one, where the coarse network's densities guide where the fine network samples. This is the hierarchical volume sampling.

Training minimizes the L2 loss between the GT pixels and the renders of both the "coarse" and the "fine" network. Fitting a single scene takes 1-2 days on 1xV100.
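The positional encoding from point 2 is easy to sketch. Below is a minimal numpy version, assuming 10 frequency bands per coordinate; function and variable names are mine, not from the NeRF codebase.

```python
import numpy as np

def positional_encoding(x, num_freqs=10):
    """Map each coordinate to [sin(2^k * pi * x), cos(2^k * pi * x)] for k < num_freqs.

    x: array of shape (..., d) with coordinates (e.g. d=3 for x, y, z).
    Returns an array of shape (..., 2 * num_freqs * d).
    """
    freqs = 2.0 ** np.arange(num_freqs) * np.pi          # (num_freqs,)
    scaled = x[..., None, :] * freqs[:, None]            # (..., num_freqs, d)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)

# A 3D point is lifted to a 60-dimensional vector (10 frequencies, sin+cos, 3 coords).
pt = np.array([[0.1, -0.4, 0.7]])
print(positional_encoding(pt).shape)  # (1, 60)
```

The MLP then consumes this 60-dimensional vector instead of the raw 3 coordinates.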





2. Scene Text Recognition via Transformer


Authors: Xinjie Feng, Hongxun Yao, Yuankai Qi, Jun Zhang, Shengping Zhang (China, 2020)
Paper :: GitHub project
Review: (belskikh)


The task is scene text recognition, i.e. OCR on photos "in the wild".


An Optical Character Recognition (OCR) paper that combines a ResNet backbone with a Transformer. The approach handles irregularly shaped text (curved, rotated, etc.).





The pipeline:


  1. The input image is resized to 96×96.
  2. Features are extracted by the first 4 blocks of a ResNet (spatial downsampling to 1/16 of the input).
  3. The resulting H×W×C feature map (6×6×1024) is flattened into an H*W×C sequence (36×1024) and projected by an FC layer down to the model dimension (giving 36×256).
  4. Target characters are mapped to word embeddings (dimension 256).
  5. Both sequences are fed into a Transformer.

The image is thus treated as a short "sequence" of feature vectors, and the decoder generates the text autoregressively, character by character.
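Steps 3 of the pipeline (flattening the feature map into Transformer tokens) can be sketched in a few lines of numpy. The projection weights below are random stand-ins; in the actual model the FC layer is learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Feature map from the ResNet backbone for a 96x96 input: H x W x C = 6 x 6 x 1024.
feat = rng.normal(size=(6, 6, 1024))

# Flatten the spatial grid into a sequence of H*W = 36 "tokens" of dimension 1024.
tokens = feat.reshape(-1, 1024)            # (36, 1024)

# A fully-connected projection down to the Transformer model dimension (256).
W_proj = rng.normal(size=(1024, 256)) * 0.02
b_proj = np.zeros(256)
tokens_256 = tokens @ W_proj + b_proj      # (36, 256)

print(tokens_256.shape)  # (36, 256)
```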


The feature extractor is the first 4 blocks of a ResNet-101 (as implemented in torchvision). The Transformer itself is standard:


  • 4 encoder layers, each with two sub-layers;
  • the first sub-layer is multi-head self-attention;
  • the second is a position-wise feed-forward network;
  • 4 decoder layers with an additional third sub-layer that runs multi-head attention over the encoder output;
  • every sub-layer is wrapped in a residual connection followed by layer norm;
  • (plus the usual embeddings and positional encodings from the original Transformer).

Training data: SynthText and RRC-ArT. Training ran on Tesla P40 GPUs.






3. PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization


Authors: Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu (Imperial College London, Google Research, 2019)

Review: (yorko)


Summarization comes in two flavors: extractive and abstractive. The former picks sentences out of the source text verbatim; the latter generates the summary in its own words, which is harder but much closer to how humans actually summarize.


PEGASUS (the first author is from Imperial College, the rest are from Google Research) is a model for abstractive summarization pre-trained with a Gap Sentences Generation objective. It resembles Masked Language Modeling from BERT & Co., except that whole sentences are masked rather than individual tokens. The point is a self-supervised objective that itself resembles abstractive summarization, so that pre-training transfers directly to the downstream task. The sentences to mask are chosen by an extractive importance criterion, and the model must generate them from the rest of the document.


The paper's overview figure takes a bit of reverse-engineering. Out of 3 input sentences, one is replaced with [MASK1] (the GSG objective) while individual tokens elsewhere are replaced with [MASK2] (an MLM objective). The two objectives were studied together, but experiments showed that GSG alone works better, so the final model is pre-trained without MLM :).





How should the "gap" sentences be chosen? 3 strategies are compared: uniformly at random (Random), the first sentences of the document (Lead), and Principal, which scores each sentence by its ROUGE1-F1 against the rest of the document and takes the top ones. Masking around 30% of the sentences turned out to work best.


Principal generally beats Lead, since Lead exploits the lead bias of news articles and transfers poorly to other domains.
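The Principal strategy can be sketched with a toy unigram ROUGE-1. This is my own minimal implementation, not the authors' code; real implementations use proper tokenization and more careful scoring.

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """Unigram-overlap ROUGE-1 F1 between two token lists."""
    c, r = Counter(candidate), Counter(reference)
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)

def select_gap_sentences(sentences, ratio=0.3):
    """Principal strategy: pick the top sentences by ROUGE-1 F1 vs the rest of the document."""
    tokenized = [s.lower().replace(".", "").split() for s in sentences]
    scores = []
    for i, toks in enumerate(tokenized):
        rest = [t for j, other in enumerate(tokenized) if j != i for t in other]
        scores.append((rouge1_f1(toks, rest), i))
    k = max(1, round(len(sentences) * ratio))
    top = sorted(scores, reverse=True)[:k]
    return sorted(i for _, i in top)

doc = [
    "Cats like to sleep.",
    "The weather was sunny.",
    "Cats sleep while the weather is sunny.",
]
print(select_gap_sentences(doc))  # [2]: the most "central" sentence is selected
```

The selected sentences would then be replaced with [MASK1] and become the generation targets.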


Pre-training corpora: C4 (the Colossal and Cleaned version of Common Crawl, 750 Gb) and HugeNews, 3.8 Tb of news articles collected by the authors.


Fine-tuning was done on 12 summarization datasets (news, science, stories, reddit, etc.), with SOTA results reported (as of late 2019). Remarkably, strong results are obtained even with as few as 1000 labeled examples.


Appendix E shows example generated summaries; SOTA is reached on 10 of the datasets (out of the 12):





The summaries are genuinely abstractive: the model paraphrases rather than copies, e.g. referring to Chelsea as "Jose Mourinho's side" or "The Blues". Whether such rewording is always desirable is a separate question.


A curious detail: on some inputs the model performs simple aggregation over the text rather than extraction. Given the "big data" scale of the pre-training, perhaps this should not surprise us.


A closing caveat: evaluating abstractive summarization is hard. ROUGE, the standard metric, only counts n-gram overlap, so it penalizes legitimate paraphrases and rewordings. The authors complement it with human evaluation, but the SOTA claims should still be read with the metric's limitations in mind.


4. Lagrangian Neural Networks


Authors: Miles Cranmer, Sam Greydanus, Stephan Hoyer, Peter Battaglia, David Spergel, Shirley Ho (Princeton, Oregon, Google, Flatiron, 2020)
Paper :: GitHub project :: Blog
Review: (graviton)



The authors propose to learn the Lagrangian of a physical system with a neural network. A "black-box" network trained to predict a system's dynamics directly does not respect conservation laws: its predicted energy drifts. Parameterizing the Lagrangian instead bakes the physics in, and the dynamics are obtained from the Euler-Lagrange equations, a system of 2nd-order differential equations.






Related work:
This is not the first attempt to build physical priors into neural networks. Two close predecessors (Hamiltonian Neural Networks and Deep Lagrangian Networks) each have a limitation:


  • Hamiltonian Neural Networks require the "canonical" coordinates of the system, i.e. positions and momenta, which are often not directly observable;
  • Deep Lagrangian Networks assume a specific functional form for the Lagrangian, i.e. they restrict the class of systems they can model.


Lagrangian Neural Networks drop both restrictions: an arbitrary Lagrangian in arbitrary coordinates is parameterized by a network, and the accelerations are recovered from the Euler-Lagrange equations via automatic differentiation of that network.
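The Euler-Lagrange inversion at the heart of the method can be illustrated numerically. The sketch below uses finite differences instead of the paper's jax autodiff, and a hand-written harmonic-oscillator Lagrangian as a stand-in for the learned network; for L = ½·m·q̇² − ½·k·q² it recovers q̈ = −(k/m)·q.

```python
import numpy as np

def lagrangian(q, qdot, m=1.0, k=1.0):
    """Harmonic oscillator: L = kinetic - potential (toy stand-in for a learned network)."""
    return 0.5 * m * qdot**2 - 0.5 * k * q**2

def acceleration(L, q, qdot, eps=1e-4):
    """Solve the Euler-Lagrange equation for the acceleration (1-D case):
    qdd = (d2L/dqdot2)^-1 * (dL/dq - qdot * d2L/dq dqdot)."""
    dL_dq = (L(q + eps, qdot) - L(q - eps, qdot)) / (2 * eps)
    d2L_dqdot2 = (L(q, qdot + eps) - 2 * L(q, qdot) + L(q, qdot - eps)) / eps**2
    d2L_dq_dqdot = (
        L(q + eps, qdot + eps) - L(q + eps, qdot - eps)
        - L(q - eps, qdot + eps) + L(q - eps, qdot - eps)
    ) / (4 * eps**2)
    return (dL_dq - qdot * d2L_dq_dqdot) / d2L_dqdot2

# For m = k = 1 the acceleration at q = 0.5 should be -0.5, whatever qdot is.
print(acceleration(lagrangian, q=0.5, qdot=0.2))  # ≈ -0.5
```

In the paper the same inversion is done exactly with jax gradients and Hessians of the learned Lagrangian.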


Experiments:


  • a double pendulum, where the model both predicts the dynamics and conserves the total energy;
  • a relativistic particle, whose Lagrangian does not have the form assumed by Deep Lagrangian Networks (so that baseline cannot model it);
  • a comparison against a baseline network that predicts accelerations directly, without any Lagrangian structure.


The code is implemented in jax.



The model is a 4-layer MLP with 500 hidden units per layer.



Plain ReLU does not work here, because solving the Euler-Lagrange equations requires 2nd derivatives of the network, and ReLU's second derivative is zero almost everywhere. The authors compared ReLU^2, ReLU^3, tanh, sigmoid and softplus; softplus worked best.
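The ReLU problem is easy to see numerically: a finite-difference second derivative of ReLU vanishes away from the kink, while softplus has genuine curvature everywhere. A small illustrative sketch of my own:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softplus(x):
    return np.log1p(np.exp(x))

def second_derivative(f, x, eps=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + eps) - 2 * f(x) + f(x - eps)) / eps**2

x = 1.0
print(second_derivative(relu, x))      # ~0: piecewise-linear, so Hessian terms vanish
print(second_derivative(softplus, x))  # ≈ 0.197, i.e. sigmoid(x) * (1 - sigmoid(x))
```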



Results:

  • The learned model reproduces the true dynamics well.
  • Unlike the baseline, it conserves energy: the error in total energy is about 0.4% versus 8% (see figure).




5. Deformable Style Transfer


Authors: Sunnie S. Y. Kim, Nicholas Kolkin, Jason Salavon, Gregory Shakhnarovich (Chicago, USA, 2020)

Review: (digitman)


Style transfer normally carries over the color and texture of the style image while leaving the geometry of the content untouched. This paper transfers shape as well: the content image is deformed toward the geometry of the style image.





DST (deformable style transfer) takes a "content" image and a "style" image that depict objects of the same kind (e.g., two portraits). The method is one-shot: it optimizes a single pair of images and needs no training dataset or domain-specific supervision.


First, corresponding keypoints are found between the content and style images with NBB (neural best-buddies): it traverses the CNN activations of the 2 images, finds pairs of mutually nearest ("best buddy") activations, and keeps the top k points (k = 80 here), yielding k matched point pairs across the 2 images; the matches are then cleaned and clustered.


The matched points define where the content should move. From these sparse displacements a dense flow field is built, turning an image I into the warped W(I, θ) via thin-plate spline interpolation. Here θ are the keypoint displacements, and they are what gets optimized.





(Figure: method overview; f, g and h denote the mappings shown in the diagram.)


For the stylization itself, two classic methods are tried: Gatys (matching Gram matrices of vgg features, with a separate content term) and STROTSS (a more recent method with a 3-part style loss). Either one supplies a content loss and a style loss. DST applies the style loss twice:


L_style(I_s, X) + L_style(I_s, W(X, θ)).


That is, the style of I_s must be matched both by the stylized image X and by its warped version W(X, θ). Two more terms control the warp itself.


L_warp(P, P', θ) = (1/k) ∑_{i=1}^{k} ||(p_i + θ_i) − p_i'||_2


This deformation loss pulls each of the k keypoints toward its match: p_i and p_i' are the corresponding source and target points, and θ_i is the learned displacement. Because the matches are noisy, simply snapping every p_i onto p_i' would produce jagged, implausible warps, so the field is additionally regularized with the "total variation norm of the 2D warp field", which encourages smoothness.


R_TV(f) = (1/(W·H)) ∑_{i=1}^{W} ∑_{j=1}^{H} ( ||f_{i+1,j} − f_{i,j}||_1 + ||f_{i,j+1} − f_{i,j}||_1 )


The full objective:


L(X, θ, I_c, I_s, P) = α·L_content(I_c, X) + L_style(I_s, X) + L_style(I_s, W(X, θ)) + β·L_warp(P, P', θ) + γ·R_TV(f_θ),


where α and β trade off content preservation against deformation strength, and γ controls the smoothness of the warp. The final result (after optimization) is the warped stylized image W(X, θ).
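The two warp-related terms are straightforward to implement. Below is a numpy sketch of L_warp and R_TV as written above; shapes and names are my own assumptions, not the authors' code.

```python
import numpy as np

def warp_loss(P, P_target, theta):
    """L_warp: mean L2 distance between displaced source points and target points.
    P, P_target, theta: arrays of shape (k, 2)."""
    return np.mean(np.linalg.norm(P + theta - P_target, axis=1))

def tv_norm(flow):
    """R_TV: total variation of a 2D warp field of shape (W, H, 2)."""
    W, H = flow.shape[:2]
    dx = np.abs(flow[1:, :] - flow[:-1, :]).sum()
    dy = np.abs(flow[:, 1:] - flow[:, :-1]).sum()
    return (dx + dy) / (W * H)

k = 4
P = np.zeros((k, 2))
P_target = np.ones((k, 2))
theta = np.ones((k, 2))                # displacements that land exactly on the targets
print(warp_loss(P, P_target, theta))   # 0.0
print(tv_norm(np.zeros((8, 8, 2))))    # 0.0, a constant field has no variation
```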


The results look convincing on faces, animals and other categories. In the comparison figures the authors show the content and style images alongside plain STROTSS, STROTSS with DST, and likewise Gatys with and without DST.





A user study on AMT confirmed that the deformation noticeably improves perceived "style similarity" at only a small cost in "content preservation".


6. Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need?


Authors: Yonglong Tian, Yue Wang, Dilip Krishnan, Joshua B. Tenenbaum, Phillip Isola (MIT, Google, 2020)
Paper :: GitHub project
Review: (belskikh)


A paper on few-shot learning showing that a simple baseline, a good embedding plus a linear classifier, outperforms by about 3% far more sophisticated meta-learning approaches.


A quick recap of the setting: a model is trained on a set of base classes, then evaluated on episodes composed of novel classes it has never seen. Each episode provides a small support set (a handful of labeled images per class) and a query set to classify. Few-shot learning is exactly this regime of generalizing from a few examples.


The proposed baseline:


  1. Merge all the meta-training tasks into one ordinary dataset and train a regular classification network on it (no episodic training at all).
  2. Use the penultimate layer of that network as a frozen feature extractor.
  3. For each few-shot episode, fit a simple linear model (logistic regression) on the (1-5) support images per class and use it to classify the query set.
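Step 3 fits in a few lines. Below is a self-contained sketch with a tiny numpy logistic regression standing in for the linear classifier, run on synthetic "embeddings" from two well-separated clusters (the real pipeline would feed in frozen network features instead).

```python
import numpy as np

def fit_linear_classifier(X, y, n_classes, lr=0.5, steps=200):
    """Multinomial logistic regression on frozen embeddings, plain gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                          # one-hot targets
    for _ in range(steps):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - Y) / n                            # softmax cross-entropy gradient
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(W, b, X):
    return np.argmax(X @ W + b, axis=1)

# A toy 2-way 5-shot episode: support embeddings from two well-separated clusters.
rng = np.random.default_rng(0)
support = np.vstack([rng.normal(-2, 0.3, size=(5, 8)), rng.normal(2, 0.3, size=(5, 8))])
labels = np.array([0] * 5 + [1] * 5)
W, b = fit_linear_classifier(support, labels, n_classes=2)
query = np.vstack([rng.normal(-2, 0.3, size=(3, 8)), rng.normal(2, 0.3, size=(3, 8))])
print(predict(W, b, query))  # [0 0 0 1 1 1]
```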




A further boost comes from self-distillation in the Born-again networks style: a new generation of the same architecture is trained on the same data, with a loss combining cross-entropy against the GT labels and the KL divergence against the previous generation's predictions.


The takeaway matches the title: what matters for few-shot classification is the quality of the learned embedding, not the sophistication of the meta-learning algorithm on top of it.


Experiments cover ResNet-12 and SeResNet-12 backbones on miniImageNet, tieredImageNet, CIFAR-FS and FC100.


Besides logistic regression, kNN over the embeddings was also evaluated as the episode-level classifier.





7. Attentive CutMix: An Enhanced Data Augmentation Approach for Deep Learning Based Image Classification


Authors: Devesh Walawalkar, Zhiqiang Shen, Zechun Liu, Marios Savvides (Carnegie Mellon University, USA, 2020)

Review: (artgor)


An improvement over cutmix. Instead of pasting a random patch from one image onto another, the authors use attention maps from a pretrained network to select the most informative patches, so the pasted region actually contains the object. Experiments on CIFAR-10, CIFAR-100 and ImageNet (!) with ResNet, DenseNet and EfficientNet show gains of up to +1.5% accuracy.





The method works as follows:


  • the first image is passed through a pretrained network to obtain an attention (feature) map;
  • the top-N cells of that feature map mark the most informative patches of the image;
  • these patches are cut out and pasted onto the second image, and the target label is mixed in proportion to the pasted area.
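The steps above can be sketched in numpy. The attention map here is a random stand-in for the pretrained network's output, and the 7×7 grid with 6 patches follows the hyperparameters discussed below.

```python
import numpy as np

def attentive_cutmix(img_a, img_b, attn, label_a, label_b, n_patches=6, grid=7):
    """Paste the n_patches most attended grid cells of img_a onto img_b.
    img_a, img_b: (H, W, C) with H, W divisible by `grid`; attn: (grid, grid)."""
    H, W, _ = img_b.shape
    ph, pw = H // grid, W // grid
    mixed = img_b.copy()
    # Indices of the top-N cells of the attention map.
    top_cells = np.argsort(attn.ravel())[::-1][:n_patches]
    for idx in top_cells:
        gy, gx = divmod(idx, grid)
        mixed[gy*ph:(gy+1)*ph, gx*pw:(gx+1)*pw] = img_a[gy*ph:(gy+1)*ph, gx*pw:(gx+1)*pw]
    # The label is mixed in proportion to the pasted area.
    lam = n_patches / (grid * grid)
    mixed_label = lam * label_a + (1 - lam) * label_b
    return mixed, mixed_label

rng = np.random.default_rng(0)
img_a = np.ones((56, 56, 3))
img_b = np.zeros((56, 56, 3))
attn = rng.random((7, 7))     # stand-in for a pretrained network's attention map
mixed, lab = attentive_cutmix(img_a, img_b, attn, label_a=1.0, label_b=0.0)
print(mixed.mean(), lab)      # both equal 6/49 ≈ 0.1224
```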




Side note: a similar idea (CAM-guided CutMix) was used by Chris on Kaggle in the Bengali.AI Handwritten Grapheme Classification competition.



Training setup in pytorch:


  • CIFAR-10: 80 epochs, batch size 32, learning rate 1e-3, weight decay 1e-5;
  • CIFAR-100: 120 epochs, batch size 32, learning rate 1e-3, weight decay 1e-5;
  • ImageNet: 100 epochs for ResNet and DenseNet, 180 for EfficientNet, batch size 64, learning rate 1e-3.

Ablation study


The key hyperparameter is the number of patches taken from the attention map. Values from 1 to 15 were tried; 6 patches turned out to be optimal.


All Articles