Machine learning has pretty much penetrated our everyday lives. Some are no longer surprised when they are told about neural networks in their smartphones. One of the big areas in this science is recommendation systems. They are everywhere: when you listen to music, read books, watch TV shows or videos. The development of this science takes place in giant companies such as YouTube , Spotify and Netfilx . Of course, all scientific achievements in this area are published both at the famous NeurIPS or ICML conferences , and at the slightly less well-known RecSyssharpened on this subject. And in this article we will talk about how this science developed, what methods are used in the recommendations then and now, and what mathematics is behind all this.

I was inspired to write this article by working in the StatML lab at Skoltech related to recommender systems.
Why and for whom this article
Why can this be important for each of us? Take a look at the list below:
- Video recommendations: YouTube, Netlix, HBO, Amazon Prime, Disney +, Hulu, Okko
- Audio recommendations: Spotify, Yandex.Music, Yandex.Radio, Apple Music
- Product Recommendations: Amazon, Avito, LitRes, MyBook
- Search recommendations: Google, Yandex, Bing, Yahoo, Mail
- : Booking, Twitter, Instagram, ., , GitHub
, . , . , , YouTube.
( ), . , , . , ( ). -, . , -, , - . , . , , , , .
, , , , - . , (, , , ..). , .
, . : U I — . r u i u i. , , , , . :
D = { ( u , i ) | if ∃ r u i , u ∈ U , i ∈ I }
f:
f ( u , i ) = r u i ≈ r u i
, r u i 1 5 ( ) : 1 -1 ( / )
3 :
- Content-based (CB)
- Collaborative filtering (CF)
- Hybrid recommendations

. — , . : — , — , . . , : , ..
, . - , .
. , , , .
, , . . , , . .
Matrix Factorization

, . : . .
- :
- Singular Value Decomposition (SVD)
- Singular Value Decomposition with implicit feedback (SVD++)
- Collaborative Filtering with Temporal Dynamics (TimeSVD++)
- Weighted Matrix Factorization (WMF or ALS)
- Sparse Linear Methods (SLIM)
- Factorization Machines (FM)
- :
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization (BPMF)
- Bayesian Factorization Machines (BFM)
- Gaussian Process Factorization Machines (GPFM)
Singular Value Decomposition (SVD)
— SVD. A n × m, n = | U |, m = | I |. D A u i = r u i, . SVD , A : U, Σ, V. k , A.
A=UΣVT,A≈ˆA=ˆUˆΣˆVT.
Q P . A :
P=(ˆUˆΣ)T,Q=ˆVT,A≈PT⋅Q.
:
rui≈ˆrui=pTuqi.
, pu qi — u i - k. . . :
Θ={pu,qi|u∈U,i∈I}.
c :
∑(u,i)∈D(rui−ˆrui)2+λ∑θ∈Θ‖θ‖2=∑(u,i)∈D(rui−pTuqi)2+λ∑u∈U‖pu‖2+λ∑i∈I‖qi‖2.
, , , , . ˆrui. (GD) (ALS). Habr- , . , , .
( SVD, SVDbias). , , . . SVD . (bias):
ˆrui=μ+bu+bi+pTuqi,
bu — , bi — , μ — . :
Θ={μ,bu,bi,pu,qi|u∈U,i∈I}.
SVD++
Factorization Meets the Neighborhood SVD . (explicit and implicit user feedback). rui, . . : R(u) — ( ) N(u) — ( ).
SVD++ :
ˆrui=μ+bu+bi+qTi(pu+|N(u)|−1/2∑j∈N(u)yj).
:
Θ={μ,bu,bi,pu,qi,yi|u∈U,i∈I}.
, N(u) R(u), .. R(u)⊂N(u). (item-item recommendation).
Asymmetric-SVD
SVD++ . . :
ˆrui=bui+qTi(|R(u)|−1/2∑j∈R(u)(ruj−buj)xj+|N(u)|−1/2∑j∈N(u)yj),
bui=μ+bu+bi
TimeSVD++
TimeSVD++. (MovieLens, Netflix) , . , . Collaborative Filtering with Temporal Dynamics SVD++ :
ˆrui(t)=μ+bu(t)+bi(t)+qTi(pu(t)+|R(u)|−1/2∑j∈R(u)yj).
Let's figure out how exactly the time affects each term:
Item bias: If you divide the time interval when the ratings were put into segments (30 parts are proposed in the work) and add your own parametersbi,Bin(t) for each product, which are selected depending on the interval in which the variable t:
bi(t)=bi+bi,Bin(t)
User bias: Analyzing the data of Neflix, we noticed that for each user on average there are only 40 days in which he put the ratings. Therefore, we will act as with goods and add our own parametersbu,tfor each user. We add a linear dependence on time - we introduce an additional termαu with depreciation ratio:bi(t)=bi+αu⋅devu(t)+bu,tdevu(t)=sign(t−tu)⋅|t−tu|β
.There are other options on how to add a time dependency to users: It is described in more detail in the articleUser embeddings: we will add a similar trick for each component of our latent representationpu(t)=(pu1(t),…,puf(t))T:puk(t)=puk+αuk⋅devu(t)+puk,t.
Weighted Matrix Factorization (WMF) & Alternating Least Squares (ALS)
One of the main problems that SVD has is the use of only an explicit response from the user. This problem was partially solved in SVD ++ . But there is another way - Weighted Matrix Factorization ( WMF ). In this article, they suggested almost not changing the model (ˆrui=pTuqi), and change the learning process. Assign for ratingsruithat we don’t know (i.e. for couples (u,i)∉D) the value is 0. And then for each pair (u,i) enter the parameter cui, rui. , . - . YouTube , . , , , . , , , :
∑(u,i)cui(rui−ˆrui)2+λ∑θ∈Θ‖θ‖2.
: cui=1+αrui. : rui>0 rui=0. α α=40 .
, (ALS) . WMF, ALS, .
Fast Alternating Least Squares
ALS , eALS . , . , . .
, cui ALS :
cui=ci+αrui,ci=c0fβi∑j∈Ifβj,
c0 β, .
Sparse Linear Methods (SLIM)
Sparse Linear Methods (SLIM) . , SVD . . SLIM :
ˆaui=aTuwiˆA=AW
W∈Rm×m. : W≥0 diag(W)=0. :
12‖A−AW‖2F+β2‖W‖2F+λ‖W‖1
W .
Factorization Machines (FM)
, Factorization Machines (FM). , ( 2- ). :
ˆr(x)=w0+n∑i=1wixi+n∑i=1n∑i=j+1vTivj xixj,w0∈R w∈Rn V∈Rn×k.
(SGD) ( ). , x (u,i). . , — . — ( ). ( ).

, SVD, SVD++ — FM. SVD , :
n=|U∪I|,xj=δ(j=u ∨ j=i).
δ — . .. x u i. FM :
ˆr(x)=w0+wu+wi+vTuvi.
, x: , . , , , , , , . .
Probabilistic Matrix Factorization (PMF)
, , , .
(PMF), . , SVD: pu qi — . , :
p(r|P,Q,σ)=∏(u,i)∈DN(rui|g(pTuqi),σ2),
N — , a g(x)=11+e−x — (). , , :
p(P|σp)=∏u∈Up(pu|0,σ2pI),p(Q|σ2q)=∏i∈Ip(qi|0,σ2qI).
( ) , :
12∑(u,i)∈D(rui−pTuqi)2+λp2∑u∈U‖pu‖2+λq2∑i∈I‖qi‖2,
λp=σpσ λq=σqσ — . , SVD , .
Constrained PMF
PMF Constrained PMF. , SVD SVD++. , , pu :
pu+∑i∈R(u)yi|R(u)|,
R(u) — , u.
Bayesian Probabilistic Matrix Factorization (BPMF)
PMF BPMF. PMF , , . :
p(P|μp,Λp)=∏u∈Up(pu|μp,Λp),p(Q|μq,Λq)=∏i∈Ip(qi|μq,Λq).
Θp={μp,Λp} Θq={μq,Λq} - Θ0={μ0,ν0,W0}. , .
Bayesian Factorization Machines (BFM)
, Bayesian, . , Θ={w0,wi,vi}. , - . : ΘH={λθ,μθ|θ∈Θ}. .
Gaussian Process Factorization Machines (GPFM)
GPFM . f θ. θu , , :
ˆrui=f(qi,θu)
, . , f , . , , , .
.
: Bayesian Personalized Ranking (BPR)
BPR , "". , BPR — , , Bayesian Personalized Ranking. . , , i j u. (u,i) rui (u,i,j) i j ((+) i , j (-) ). DS. . ( personalized ):
p(i<uj|Θ)=σ(ˆruij(Θ)),
σ — , a ˆruij — . (MLE), , :
minΘ∑(u,i,j)∈DSlnσ(ˆruij)−λ‖Θ‖2
(SGD):
Θ←Θ+α(e−ˆruij1+e−ˆruij⋅∂∂Θˆruij+λΘ)
, . , , . SVD:
ˆruij=ˆrui−ˆrujˆrui=pTuqi
:
∂∂Θˆruij={(qik−qjk) if θ=pukpuk if θ=qik−puk if θ=qjk
BPR ( ) , . , . . , (pairwise approach) (pointwise approach). . , 5 , . , , . — BPR - , . , .
Show must go on
This time we discussed many Factorization methods in recommendation systems, but the Graph and Neural Networks , which also have a lot of interesting things , remained untouched .