People meet recommender systems. Factorization

Machine learning has pretty much penetrated our everyday lives. Some are no longer surprised when they are told about neural networks in their smartphones. One of the big areas in this science is recommendation systems. They are everywhere: when you listen to music, read books, watch TV shows or videos. The development of this science takes place in giant companies such as YouTube , Spotify and Netfilx . Of course, all scientific achievements in this area are published both at the famous NeurIPS or ICML conferences , and at the slightly less well-known RecSyssharpened on this subject. And in this article we will talk about how this science developed, what methods are used in the recommendations then and now, and what mathematics is behind all this.



I was inspired to write this article by working in the StatML lab at Skoltech related to recommender systems.


Why and for whom this article


Why can this be important for each of us? Take a look at the list below:


  • Video recommendations: YouTube, Netlix, HBO, Amazon Prime, Disney +, Hulu, Okko
  • Audio recommendations: Spotify, Yandex.Music, Yandex.Radio, Apple Music
  • Product Recommendations: Amazon, Avito, LitRes, MyBook
  • Search recommendations: Google, Yandex, Bing, Yahoo, Mail
  • : Booking, Twitter, Instagram, ., , GitHub

, . , . , , YouTube.


( ), . , , . , ( ). -, . , -, , - . , . , , , , .


, , , , - . , (, , , ..). , .



, . : U I — . r u i u i. , , , , . :


D = { ( u , i ) |  if  r u i , u U , i I }


f:


f ( u , i ) = r u ir u i


, r u i 1 5 ( ) : 1 -1 ( / )



3 :


  • Content-based (CB)
  • Collaborative filtering (CF)
  • Hybrid recommendations


. — , . : — , — , . . , : , ..


, . - , .


. , , , .


, , . . , , . .


Matrix Factorization



, . : . .


  • :
    • Singular Value Decomposition (SVD)
    • Singular Value Decomposition with implicit feedback (SVD++)
    • Collaborative Filtering with Temporal Dynamics (TimeSVD++)
    • Weighted Matrix Factorization (WMF or ALS)
    • Sparse Linear Methods (SLIM)
    • Factorization Machines (FM)
  • :
    • Probabilistic Matrix Factorization (PMF)
    • Bayesian Probabilistic Matrix Factorization (BPMF)
    • Bayesian Factorization Machines (BFM)
    • Gaussian Process Factorization Machines (GPFM)

Singular Value Decomposition (SVD)


SVD. A n × m, n = | U |, m = | I |. D A u i = r u i, . SVD , A : U, Σ, V. k , A.


A=UΣVT,AˆA=ˆUˆΣˆVT.


Q P . A :


P=(ˆUˆΣ)T,Q=ˆVT,APTQ.


:


ruiˆrui=pTuqi.


, pu qiu i - k. . . :


Θ={pu,qi|uU,iI}.


c :


(u,i)D(ruiˆrui)2+λθΘθ2=(u,i)D(ruipTuqi)2+λuUpu2+λiIqi2.


, , , , . ˆrui. (GD) (ALS). Habr- , . , , .


( SVD, SVDbias). , , . . SVD . (bias):


ˆrui=μ+bu+bi+pTuqi,


bu — , bi — , μ — . :


Θ={μ,bu,bi,pu,qi|uU,iI}.


SVD++


Factorization Meets the Neighborhood SVD . (explicit and implicit user feedback). rui, . . : R(u) — ( ) N(u) — ( ).


SVD++ :


ˆrui=μ+bu+bi+qTi(pu+|N(u)|1/2jN(u)yj).


:


Θ={μ,bu,bi,pu,qi,yi|uU,iI}.


, N(u) R(u), .. R(u)N(u). (item-item recommendation).


Asymmetric-SVD


SVD++ . . :


ˆrui=bui+qTi(|R(u)|1/2jR(u)(rujbuj)xj+|N(u)|1/2jN(u)yj),


bui=μ+bu+bi


TimeSVD++


TimeSVD++. (MovieLens, Netflix) , . , . Collaborative Filtering with Temporal Dynamics SVD++ :


ˆrui(t)=μ+bu(t)+bi(t)+qTi(pu(t)+|R(u)|1/2jR(u)yj).


Let's figure out how exactly the time affects each term:
Item bias: If you divide the time interval when the ratings were put into segments (30 parts are proposed in the work) and add your own parametersbi,Bin(t) for each product, which are selected depending on the interval in which the variable t:

bi(t)=bi+bi,Bin(t)


User bias: Analyzing the data of Neflix, we noticed that for each user on average there are only 40 days in which he put the ratings. Therefore, we will act as with goods and add our own parametersbu,tfor each user. We add a linear dependence on time - we introduce an additional termαu with depreciation ratio:

bi(t)=bi+αudevu(t)+bu,tdevu(t)=sign(ttu)|ttu|β


.
There are other options on how to add a time dependency to users: It is described in more detail in the article
User embeddings: we will add a similar trick for each component of our latent representationpu(t)=(pu1(t),,puf(t))T:

puk(t)=puk+αukdevu(t)+puk,t.



Weighted Matrix Factorization (WMF) & Alternating Least Squares (ALS)


One of the main problems that SVD has is the use of only an explicit response from the user. This problem was partially solved in SVD ++ . But there is another way - Weighted Matrix Factorization ( WMF ). In this article, they suggested almost not changing the model (ˆrui=pTuqi), and change the learning process. Assign for ratingsruithat we don’t know (i.e. for couples (u,i)D) the value is 0. And then for each pair (u,i) enter the parameter cui, rui. , . - . YouTube , . , , , . , , , :


(u,i)cui(ruiˆrui)2+λθΘθ2.


: cui=1+αrui. : rui>0 rui=0. α α=40 .


, (ALS) . WMF, ALS, .


Fast Alternating Least Squares


ALS , eALS . , . , . .


, cui ALS :


cui=ci+αrui,ci=c0fβijIfβj,


c0 β, .


Sparse Linear Methods (SLIM)


Sparse Linear Methods (SLIM) . , SVD . . SLIM :


ˆaui=aTuwiˆA=AW


WRm×m. : W0 diag(W)=0. :


12AAW2F+β2W2F+λW1


W .


Factorization Machines (FM)


, Factorization Machines (FM). , ( 2- ). :


ˆr(x)=w0+ni=1wixi+ni=1ni=j+1vTivj xixj,w0R   wRn   VRn×k.


(SGD) ( ). , x (u,i). . , — . — ( ). ( ).



, SVD, SVD++FM. SVD , :


n=|UI|,xj=δ(j=u  j=i).


δ — . .. x u i. FM :


ˆr(x)=w0+wu+wi+vTuvi.


, x: , . , , , , , , . .


Probabilistic Matrix Factorization (PMF)


, , , .


(PMF), . , SVD: pu qi — . , :


p(r|P,Q,σ)=(u,i)DN(rui|g(pTuqi),σ2),


N — , a g(x)=11+ex — (). , , :


p(P|σp)=uUp(pu|0,σ2pI),p(Q|σ2q)=iIp(qi|0,σ2qI).


( ) , :


12(u,i)D(ruipTuqi)2+λp2uUpu2+λq2iIqi2,


λp=σpσ λq=σqσ — . , SVD , .


Constrained PMF


PMF Constrained PMF. , SVD SVD++. , , pu :


pu+iR(u)yi|R(u)|,


R(u) — , u.


Bayesian Probabilistic Matrix Factorization (BPMF)


PMF BPMF. PMF , , . :


p(P|μp,Λp)=uUp(pu|μp,Λp),p(Q|μq,Λq)=iIp(qi|μq,Λq).


Θp={μp,Λp} Θq={μq,Λq} - Θ0={μ0,ν0,W0}. , .


Bayesian Factorization Machines (BFM)


, Bayesian, . , Θ={w0,wi,vi}. , - . : ΘH={λθ,μθ|θΘ}. .


Gaussian Process Factorization Machines (GPFM)


GPFM . f θ. θu , , :


ˆrui=f(qi,θu)


, . , f , . , , , .


.


: Bayesian Personalized Ranking (BPR)


BPR , "". , BPR — , , Bayesian Personalized Ranking. . , , i j u. (u,i) rui (u,i,j) i j ((+) i , j (-) ). DS. . ( personalized ):


p(i<uj|Θ)=σ(ˆruij(Θ)),


σ — , a ˆruij — . (MLE), , :


minΘ(u,i,j)DSlnσ(ˆruij)λΘ2


(SGD):


ΘΘ+α(eˆruij1+eˆruijΘˆruij+λΘ)


, . , , . SVD:


ˆruij=ˆruiˆrujˆrui=pTuqi


:


Θˆruij={(qikqjk)   if θ=pukpuk               if θ=qikpuk            if θ=qjk


BPR ( ) , . , . . , (pairwise approach) (pointwise approach). . , 5 , . , , . — BPR - , . , .


Show must go on


This time we discussed many Factorization methods in recommendation systems, but the Graph and Neural Networks , which also have a lot of interesting things , remained untouched .

Source: https://habr.com/ru/post/undefined/


All Articles