How we use item2vec to recommend similar products

Hi, my name is Vasya Rubtsov, and I develop recommendation systems at Avito.


The main purpose of a classifieds site is to help sellers find buyers and to help buyers find the products they are looking for. Unlike in online stores, the sale itself happens outside our platform, so we cannot track it. Our key metric is therefore the “contact”: a click on the “show phone” button on the product card, or the start of a messenger dialogue with the seller. From this metric we derive “buyers”: the number of unique users per day who made at least one contact.
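To make the two metrics concrete, here is a minimal sketch of how "buyers" could be counted from an event log. The log layout and the event names (`show_phone`, `messenger_start`, `item_view`) are assumptions made for illustration, not Avito's actual schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event names that count as a "contact".
CONTACT_EVENTS = {"show_phone", "messenger_start"}

def daily_buyers(events):
    """Count unique users per day who made at least one contact."""
    users_by_day = defaultdict(set)
    for user_id, day, event_type in events:
        if event_type in CONTACT_EVENTS:
            users_by_day[day].add(user_id)
    return {day: len(users) for day, users in users_by_day.items()}

# Hypothetical log rows: (user_id, day, event_type).
events = [
    ("u1", date(2019, 1, 1), "show_phone"),
    ("u1", date(2019, 1, 1), "messenger_start"),  # same user, same day: counted once
    ("u2", date(2019, 1, 1), "item_view"),        # a view is not a contact
    ("u2", date(2019, 1, 2), "show_phone"),
]
buyers = daily_buyers(events)
```

A user with several contacts on one day still counts as a single buyer for that day.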


The two main products of Avito's recommendation team are personalized recommendations on the homepage (user2item) and the block of similar ads on the product card (item2item). A third of all ad views and a quarter of all contacts come from recommendations, so recommendation engines play an important role at Avito.


In this article I will describe how we improved our item2item recommendations with item2vec and how this affected user2item recommendations.



How it was before


Previously, to find similar ads, we used a linear model over features computed for a pair of ads: the number of matching words in the titles and descriptions, matching locations and parameters, and geographic proximity. The coefficients of this model were selected by multi-armed bandits. We described this in a separate article.
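A minimal sketch of such a pairwise linear model is shown below. The ad fields, the exact feature definitions and the weights are all assumptions for illustration. In the original system the coefficients were chosen by multi-armed bandits, whereas here they are fixed by hand.

```python
import math

def word_overlap(text_a, text_b):
    """Number of distinct words the two texts share."""
    return len(set(text_a.lower().split()) & set(text_b.lower().split()))

def pair_features(a, b):
    """Hand-crafted features for a pair of ads (dicts with hypothetical keys)."""
    shared_params = len(set(a["params"].items()) & set(b["params"].items()))
    # Crude geo proximity: inverse of (1 + planar distance in degrees).
    dist = math.hypot(a["lat"] - b["lat"], a["lon"] - b["lon"])
    return [
        word_overlap(a["title"], b["title"]),
        word_overlap(a["description"], b["description"]),
        1.0 if a["location"] == b["location"] else 0.0,
        shared_params,
        1.0 / (1.0 + dist),
    ]

def similarity(a, b, weights):
    """Linear model: weighted sum of the pair features."""
    return sum(w * f for w, f in zip(weights, pair_features(a, b)))

phone_a = {"title": "iPhone 8 64GB", "description": "good condition with charger",
           "location": "Moscow", "params": {"brand": "Apple"}, "lat": 55.75, "lon": 37.62}
phone_b = {"title": "iPhone 8 256GB", "description": "perfect condition with box",
           "location": "Moscow", "params": {"brand": "Apple"}, "lat": 55.76, "lon": 37.60}
sofa = {"title": "Leather sofa", "description": "almost new",
        "location": "Kazan", "params": {"type": "sofa"}, "lat": 55.79, "lon": 49.12}

weights = [1.0, 0.5, 1.0, 1.0, 1.0]  # made-up coefficients
score_phone = similarity(phone_a, phone_b, weights)
score_sofa = similarity(phone_a, sofa, weights)
```

With these weights the second phone scores far higher than the sofa, because the model rewards literal word and parameter matches.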


   , , . ,    « wifi » «SQ11» ,  , , . — «  », «   », «-» « ». ,   , . «»       ,  .


, ,   .   finn.no "Deep neural network marketplace recommenders in online experiments".


item2vec


item2vec  ,     ,   .


 ,   .   .



    ,  ,  . ,   — .   , ,   ,   ,   .  ?   ,  :



  , 0.6 ± 0.1.       .



 .  , ,   .   ?


, ,   — , ,   . , ,   .   . , . ,      ,  ,   -.


 


  . , , 8 .     3 .



, ,   . ,  desktop   . ,  app — .


  540  180  .



  — 14 . ,   ,   . ,   «» «»  . .   — .


    :



— 128- .


— ,   embedding : , , (  , ) (  ).  title embedding ,   — lstm. — one-hot .   — .


title , ,  .   , .   . , — lstm GRU .    , .


. ,  . —   . , . — , .


  — .  ,  -1  1. ,  128,    int8   .


  . ,  int8.
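The surviving details here (values between -1 and 1, 128 dimensions, int8) suggest that the embeddings are stored in quantized form. Below is a minimal sketch of one common way to do this, assuming L2-normalized vectors and a simple symmetric scale of 127. The actual scheme used in production is not recoverable from the text.

```python
import math
import random

DIM = 128  # embedding size mentioned in the text

def l2_normalize(vec):
    """Scale the vector to unit length, so every component lies in [-1, 1]."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def quantize_int8(vec):
    """Map floats in [-1, 1] to integers in [-127, 127] (assumed scheme)."""
    return [max(-127, min(127, round(x * 127))) for x in vec]

def dequantize(q):
    """Approximate inverse of quantize_int8."""
    return [x / 127 for x in q]

random.seed(0)
v = l2_normalize([random.gauss(0, 1) for _ in range(DIM)])
q = quantize_int8(v)
restored = dequantize(q)
max_error = max(abs(a - b) for a, b in zip(v, restored))
```

Each component then takes one byte instead of four, and the rounding error per component is at most half a quantization step (0.5 / 127).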


GPU, 20 . CPU GPU .


,   id  .  , .



, 128- .
   ,   .   :


1. .



2. 4000  ( — ,   GPU)   .   ,     ,  .



3. , .
— . 100    , 100    .



4. — cross entropy loss — .



  , forward  4001 , backward —  101,  ,  .
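The numbers that survive in the steps above (4000 sampled negatives, 100 selected, cross entropy loss, a forward pass over 4001 items and a backward pass over 101) are consistent with hard negative mining. The sketch below assumes exactly that reading: score all sampled negatives, keep the hardest ones, and compute the loss over the positive plus those. It uses plain Python and random vectors, and it is an illustration rather than the training code from the article.

```python
import math
import random

DIM = 16        # small dimension to keep the demo fast (the article uses 128)
N_NEG = 4000    # sampled negatives, as in the text
N_HARD = 100    # negatives kept for the loss, as in the text

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hard_negative_loss(anchor, positive, negatives, n_hard):
    """Softmax cross entropy over the positive and the n_hard highest-scoring negatives.

    Returns (loss, number of candidates that entered the loss).
    """
    neg_scores = [dot(anchor, n) for n in negatives]      # score every sampled negative
    hard = sorted(neg_scores, reverse=True)[:n_hard]      # hardest = highest score
    logits = [dot(anchor, positive)] + hard               # the positive is class 0
    m = max(logits)                                       # subtract max for stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0], len(logits)

random.seed(42)
anchor = [random.gauss(0, 1) for _ in range(DIM)]
positive = [random.gauss(0, 1) for _ in range(DIM)]
negatives = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_NEG)]

loss, n_used = hard_negative_loss(anchor, positive, negatives, N_HARD)
full_loss, n_full = hard_negative_loss(anchor, positive, negatives, N_NEG)
```

In a real framework the gradient would then flow only through the 101 selected candidates, which keeps the backward pass cheap even though every sampled negative is scored.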


    500 000 000 () × 4 000 ( ) × 5 () = 10 ^ 13 . 2  4 x Tesla P40.


  :




  : 7    , 6    ,    . ,   , .   prec@8.
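For reference, prec@8 is precision over the top 8 recommendations. The sketch below shows the generic definition. What counts as a "relevant" ad in the offline evaluation here is not recoverable from the text, so the relevant set and the ad ids are purely hypothetical.

```python
def precision_at_k(recommended, relevant, k=8):
    """Fraction of the top-k recommended ids that appear in the relevant set."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Hypothetical ad ids: nine ranked recommendations and a set of relevant ads.
recommended = [101, 102, 103, 104, 105, 106, 107, 108, 109]
relevant = {102, 105, 200}
p8 = precision_at_k(recommended, relevant, k=8)
```

Only the first eight recommendations are considered, so ad 109 does not affect the result.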




  item2vec   « ».    AvitoNet. AvitoNet — ,   .    .     -.


  3    6 .     — 62 .  , — 2048,   GPU,  CPU . :  ,   — , 3 , . , .   ,   , . Prec@8  0,4%, , .




  « »:



  « / /122—128 (6—8 )».   :



  , ,   «» «».


 item2vec :



« ». -, -   «» «».  , ,   «». , — !



    top-n .   Sphinx.    , .  200ms (p99) 200K rpm.



    .     -.    , , - — , .   .


  .



  ,  .   :


sim(i, j) = <v_i, v_j> * (log(t_j + 1) ^ a_c)


Here i and j are ads, v_i and v_j are their vectors, and t_j is the age of ad j (t_j = now - start_time_j).


, , ,  , .       . , c — a_c. ,   0,   . ,   — .   .  , ,   - . , .
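The formula sim(i, j) = <v_i, v_j> * (log(t_j + 1) ^ a_c) can be sketched as follows. Several things are assumptions here: that a_c is a coefficient chosen per category c, that t_j is measured in hours, and the example value a_c = -0.5. None of these are confirmed by the surviving text.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sim(v_i, v_j, age_j, a_c):
    """sim(i, j) = <v_i, v_j> * (log(t_j + 1) ^ a_c), where t_j is the age of ad j.

    With a_c == 0 the age factor equals 1, so only the dot product remains.
    With a_c < 0 and age_j == 0 the factor is undefined (0 to a negative power),
    so a real implementation would need to guard that case.
    """
    return dot(v_i, v_j) * math.log(age_j + 1) ** a_c

v_i = [0.6, 0.8]
v_j = [0.8, 0.6]

no_age_effect = sim(v_i, v_j, age_j=100, a_c=0.0)   # reduces to the plain dot product
fresh = sim(v_i, v_j, age_j=1, a_c=-0.5)            # ad placed one hour ago
old = sim(v_i, v_j, age_j=100, a_c=-0.5)            # ad placed 100 hours ago
```

With a negative a_c, an older ad gets a lower score than a fresh one with the same vector. With a positive a_c the effect would be reversed.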



, .
— ,    , , .


 item2vec  30%  ,  20%    .

