[GSoC] Proposal to implement Distributed SVD++ Recommender using Hadoop
-----------------------------------------------------------------------

                 Key: MAHOUT-371
                 URL: https://issues.apache.org/jira/browse/MAHOUT-371
             Project: Mahout
          Issue Type: New Feature
          Components: Collaborative Filtering
            Reporter: Richard Simon Just



*****Basic Proposal - just to let you know what I have in mind. Will add more 
detail as to actual implementation and some background information about myself 
later today*****


Title: Proposal to implement Distributed SVD++ Recommender using Hadoop 
[addresses MAHOUT-329]

Student: Richard Simon Just 

Basic Proposal: 

During the Netflix Prize Challenge, one of the most popular families of 
recommender algorithm was Matrix Factorisation, in particular Singular Value 
Decomposition (SVD). This proposal therefore looks to implement a distributed 
version of one of the most successful SVD-based recommender algorithms from the 
Netflix competition, namely the SVD++ algorithm. 

SVD++ improves upon other basic SVD algorithms by incorporating implicit 
feedback[1]. That is to say, it is able to take into account not just 
explicit user preferences, but also feedback such as, in the case of a company 
like Netflix, whether a movie has been rented. Implicit feedback assumes that 
the fact of there being some correlation between the user and the item is more 
important than whether the correlation is positive or negative. Implicit 
feedback would account for an item having been rated, but not what the rating 
was.
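
As a rough illustration of the prediction rule given in [1], the sketch below 
combines the explicit user factors with factors for every item the user has 
interacted with. The class name, method name and parameter names are 
hypothetical and chosen only for this example; they are not Mahout API, and the 
sketch is single-machine rather than distributed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the SVD++ prediction rule from [1].
// Class and method names are illustrative only; they are not Mahout API.
final class SvdppPredictionSketch {

  // r_hat(u,i) = mu + b_u + b_i + q_i . (p_u + |N(u)|^(-1/2) * sum_{j in N(u)} y_j)
  // implicitFactors holds one vector y_j per item in N(u), the items the user
  // has interacted with (rated, rented, ...), regardless of the rating value.
  static double predict(double mu, double bu, double bi,
                        double[] pu, double[] qi, List<double[]> implicitFactors) {
    double[] userVector = pu.clone(); // leave the caller's p_u untouched
    if (!implicitFactors.isEmpty()) {
      double norm = 1.0 / Math.sqrt(implicitFactors.size());
      for (double[] yj : implicitFactors) {
        for (int f = 0; f < userVector.length; f++) {
          userVector[f] += norm * yj[f];
        }
      }
    }
    double dot = 0.0;
    for (int f = 0; f < userVector.length; f++) {
      dot += qi[f] * userVector[f];
    }
    return mu + bu + bi + dot;
  }

  public static void main(String[] args) {
    double[] pu = {0.5, 0.0};
    double[] qi = {1.0, 2.0};
    double[] y = {0.5, 0.25};
    // Four implicit interactions, all with the same toy factor vector.
    List<double[]> implicit = Arrays.asList(y, y, y, y);
    System.out.println(predict(3.5, 0.1, -0.2, pu, qi, implicit));
  }
}
```

With an empty implicit set the rule reduces to a plain biased factor model, 
which is why the implicit term can be seen as an add-on to basic SVD.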

The implementation will include testing, in-depth documentation and a 
demo/tutorial. If there is time, I will also look to develop the algorithm 
into the timeSVD++ algorithm[2]. timeSVD++ further improves the results of 
the SVD algorithm by taking into account temporal dynamics. Temporal dynamics 
addresses the way users' preferences for items, and the way they rate items, 
can change over time. According to [2], the gains in accuracy from 
implementing timeSVD++ are significantly bigger than the gains from going from 
SVD to SVD++. 
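
To give a feel for what "temporal dynamics" means in practice, the sketch below 
shows one ingredient described in [2]: a user bias that drifts with the 
distance between the rating date and the user's mean rating date. It is a 
simplified fragment, not the full timeSVD++ model, and the class, method and 
parameter names are hypothetical.

```java
// Hypothetical sketch of the time-drifting user bias described in [2].
// Only one part of timeSVD++ is shown; names are illustrative, not Mahout API.
final class TemporalBiasSketch {

  // dev_u(t) = sign(t - t_u) * |t - t_u|^beta
  // t and meanT are day numbers; meanT is the user's mean rating date.
  static double dev(double t, double meanT, double beta) {
    double d = t - meanT;
    return Math.signum(d) * Math.pow(Math.abs(d), beta);
  }

  // b_u(t) = b_u + alpha_u * dev_u(t)
  // alphaU is a per-user drift rate that would be learned during training.
  static double userBias(double bu, double alphaU, double t, double meanT, double beta) {
    return bu + alphaU * dev(t, meanT, beta);
  }

  public static void main(String[] args) {
    // Toy values: a rating made 16 days after the user's mean rating date.
    System.out.println(userBias(0.2, 0.01, 116.0, 100.0, 0.5));
  }
}
```

The exponent beta is a tunable parameter; the value used in main is a toy 
value picked for the example only.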

The overall project will provide three deliverables:
     1. A basic framework for a distributed SVD-based recommender
     2. A distributed SVD++ implementation
     3. A distributed timeSVD++ implementation



Timeline:


The Warm Up/Bonding Period (<=May 23rd):
- familiarise myself further with Mahout and Hadoop's code base and 
documentation
- discuss the proposal, design and implementation with the community, as well 
as any related code tests, optimisations and documentation they would like to 
see incorporated into the project
- build a more detailed design of algorithm implementation and tweak timeline 
based on feedback
- familiarise myself more with unit testing
- finish building 3-4 node Hadoop cluster and play with all the examples

Week 1 (May 24th - 30th):
- start writing the backbone of the code in the form of comments and skeleton 
code
- implement SVDppRecommenderJob
- start to integrate DistributedLanczosSolver

Week 2 (May 31st - June 6th):
- complete DistributedLanczosSolver integration
- start implementing distributed training, prediction and regularisation

Week 3 - 5 (June 7th - 27th):
- complete implementation of distributed training, prediction and regularisation
- work on testing, cleaning up code, and tying up any loose documentation ends
- work on any documentation, tests and optimisation requested by the community
- Deliverable: basic framework for distributed SVD-based recommender
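
The proposal does not fix a training scheme, so the following is only an 
assumption for illustration: a single regularised stochastic gradient descent 
step on one observed rating for a plain factor model, which is one common way 
such models are trained. It runs on a single machine, shows nothing of how the 
work would be split across Hadoop, and all names are hypothetical.

```java
// Hypothetical sketch of one regularised SGD step for a basic factor model
// r_hat(u,i) = p_u . q_i. Names are illustrative only; they are not Mahout API.
final class RegularisedSgdSketch {

  static double dot(double[] a, double[] b) {
    double sum = 0.0;
    for (int f = 0; f < a.length; f++) {
      sum += a[f] * b[f];
    }
    return sum;
  }

  // Updates pu and qi in place and returns the prediction error measured
  // before the update. lambda is the regularisation strength: it pulls each
  // factor towards zero in proportion to its current value.
  static double step(double rating, double[] pu, double[] qi,
                     double learningRate, double lambda) {
    double err = rating - dot(pu, qi);
    for (int f = 0; f < pu.length; f++) {
      double puf = pu[f]; // keep the old values so both updates use them
      double qif = qi[f];
      pu[f] += learningRate * (err * qif - lambda * puf);
      qi[f] += learningRate * (err * puf - lambda * qif);
    }
    return err;
  }

  public static void main(String[] args) {
    double[] pu = {1.0, 0.0};
    double[] qi = {1.0, 1.0};
    double before = step(3.0, pu, qi, 0.1, 0.5);
    double after = 3.0 - dot(pu, qi);
    System.out.println("error before: " + before + ", error after: " + after);
  }
}
```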

Week 6 - 7 (June 28th - July 11th):
- start implementation of SVD++ (keeping documentation and tests up-to-date)
- prepare demo

Week 8 (July 12th - 18th): Mid-Term Report by the 16th
- complete SVD++ and iron out bugs
- implement and document demo
- write wiki articles and a tutorial for what has been implemented, including 
the demo

Week 9 (July 19th - 25th):
- work on any documentation, tests and optimisation requested by the community 
during the project
- work on testing, cleaning up code, and tying up any loose documentation ends
- Deliverable: Distributed SVD++ Recommender (including Demo)

Week 10 - 11 (July 26th - Aug 8th):
- incorporate temporal dynamics
- write temporal dynamics documentation, including wiki article

Week 12 (Aug 9th - 15th): Suggested Pencils Down
- last optimisation and tidy up of code, documentation, tests and demo
- discuss with the community what comes next, and consider which JIRA issues 
to contribute to 
- Deliverable: Distributed SVD++ Recommender with temporal dynamics

Final Evaluations Hand-in: Aug 16th-20th. 


References:

[1] - Y. Koren, "Factorization Meets the Neighborhood: a Multifaceted 
Collaborative Filtering Model", ACM Press, 2008, 
http://public.research.att.com/~volinsky/netflix/kdd08koren.pdf
[2] - Y. Koren, "Collaborative Filtering with Temporal Dynamics", ACM Press, 
2009, http://research.yahoo.com/files/kdd-fp074-koren.pdf

