Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: latent factor model with stochastic gradient method/alternating least squares
From: Stas Kolenikov <[email protected]>
To: [email protected]
Subject: Re: st: latent factor model with stochastic gradient method/alternating least squares
Date: Wed, 6 Oct 2010 09:05:50 -0500
Looks like you are attacking the Netflix challenge. Haven't they closed it? ;)
You can cast this as a -gllamm- problem, but with thousands of
raters/items it won't be very fast. Otherwise, you can program this
with -moptimize-, although you would have to figure out the
appropriate scaling of your latents.
On Tue, Oct 5, 2010 at 4:22 PM, Dimitriy V. Masterov <[email protected]> wrote:
> Suppose I observe lots of consumer ratings (explicit or implicit) of
> thousands of items, which may themselves be combinations of other
> items. The rating matrix is pretty sparse because most consumers only
> try/rate a few of the items. I would like to model how item ratings
> are related to each other for the purpose of making recommendations of
> new items to try. I am thinking of this as a missing rating problem.
> The characteristics of the items are many and are not easily modelled
> with fewer dimensions.
>
> My approach to this problem is to map consumers and items into a joint
> latent factor space of dimension f, so that consumer-item interactions
> are modeled as inner products in that space. Each item i is associated
> with a vector q_i in R^f, which measures the extent to which that item
> possesses the latent factors. The vector p_u in R^f measures the
> interest of the consumer in each of the latent factors.
>
> I would like to model the rating for item i by consumer u as:
>
> r_ui = mu + b_u + b_i + b_u*b_i + q_i'*p_u,
>
> where mu is a constant which is the same for all products, b_u is a
> user fixed effect, b_i is an item fixed effect, and the inner product
> of q and p captures the consumer's overall interest in the item's
> latent characteristics. The fixed effects are meant to capture the
> idea that some items may be more popular and the fact that some users
> may rate more harshly, and that these may interact. For example, a
> popular item may be judged to be especially poor by a harsh critic.
>
> For a given f, I would like to find b_i, b_u, mu, and the vectors q_i
> and p_u to minimize the sum of squared residuals:
>
> sum[(r_ui - mu - b_u - b_i - b_u*b_i - q_i'*p_u)^2] for all items and
> users that are observed.
>
> I would like to use these parameters to estimate the rankings of
> products that have not been sampled by some consumers.
>
> I believe it is possible to estimate these parameters with stochastic
> gradient descent optimization or with alternating least squares. Does
> anyone know if those methods are possible with Mata/Stata or if
> there's a way to recast this problem in another way?
>
> Dimitriy Masterov
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
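For concreteness, here is a minimal sketch of the stochastic gradient descent idea from the quoted question, fitting r_ui = mu + b_u + b_i + b_u*b_i + q_i'*p_u to the observed ratings only. It is written in plain Python rather than Mata, purely as an illustration. The function names (predict, fit_sgd, sse), the learning rate lr, the ridge penalty reg, and the starting values are all assumptions and are not part of the original post. The penalty in particular is an addition that the stated objective does not include; set reg=0 to target the plain sum of squared residuals.

```python
import random

def predict(mu, bu, bi, p, q, u, i):
    """Predicted rating: mu + b_u + b_i + b_u*b_i + q_i'p_u."""
    dot = sum(pk * qk for pk, qk in zip(p[u], q[i]))
    return mu + bu[u] + bi[i] + bu[u] * bi[i] + dot

def fit_sgd(ratings, n_users, n_items, f=2, lr=0.01, reg=0.02,
            epochs=200, seed=0):
    """ratings: list of (user, item, rating) triples for observed cells only.

    lr, reg, and the starting values are illustrative assumptions.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # start at the mean rating
    bu = [0.0] * n_users
    bi = [0.0] * n_items
    # Small random starting values for the latent vectors p_u and q_i.
    p = [[rng.gauss(0, 0.1) for _ in range(f)] for _ in range(n_users)]
    q = [[rng.gauss(0, 0.1) for _ in range(f)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit observed ratings in random order
        for u, i, r in data:
            e = r - predict(mu, bu, bi, p, q, u, i)  # residual
            mu += lr * e
            bu_old, bi_old = bu[u], bi[i]
            # The b_u*b_i interaction makes d(pred)/d(b_u) = 1 + b_i, and vice versa.
            bu[u] += lr * (e * (1 + bi_old) - reg * bu_old)
            bi[i] += lr * (e * (1 + bu_old) - reg * bi_old)
            for k in range(f):
                pk, qk = p[u][k], q[i][k]
                p[u][k] += lr * (e * qk - reg * pk)
                q[i][k] += lr * (e * pk - reg * qk)
    return mu, bu, bi, p, q

def sse(ratings, params):
    """Sum of squared residuals over the observed ratings."""
    mu, bu, bi, p, q = params
    return sum((r - predict(mu, bu, bi, p, q, u, i)) ** 2
               for u, i, r in ratings)
```

Once fitted, calling predict for the (user, item) pairs that were never observed and sorting the results would give the rankings the question asks about. With thousands of users and items, a vectorized implementation would probably be needed for reasonable speed.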
--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.