Re: Item Based Collaborative Filtering Properties Question

2013-09-12 Thread Darius Miliauskas
Hi, Brian, this question is also relevant to me. Perhaps somebody can give more details, because I am still learning myself. But I guess you could try changing the parameters, check the performance, and report back here so that everybody can learn from it! In general, if these values

Re: ALS-WR Predictions

2013-09-12 Thread Stuart Horsman
So I'm using predictFromFactorization in Mahout 0.5, but this code was removed in 0.6. Is there any particular reason for this? Thanks, Stuart On 12 September 2013 08:28, Stuart Horsman stuart.hors...@gmail.com wrote: Hi All, I'm new to Mahout, so thanks up front for the help. I'm running

Re: Item Based Collaborative Filtering Properties Question

2013-09-12 Thread 林伟
Hi Brian, Miliauskas, I am a data mining engineer from the Taobao recommendation team. Over the past month I have read all of the Mahout itemCF code, so maybe I can answer this question. We consider the input of itemCF for one user to be an item vector, like this (the notation is from JSON object

Adding new data points to existing clusters

2013-09-12 Thread Michael Wechner
Hi, I have found the following thread about adding new data points/documents to existing clusters without having to run the clustering again: http://lucene.472066.n3.nabble.com/Updating-clusters-td972794.html Grant describes that one possibility is to check which cluster the new data
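
The approach the snippet starts to describe, checking which existing cluster a new point belongs to, can be sketched as nearest-centroid assignment. This is a minimal sketch assuming centroid-based clusters (such as k-means output) and squared Euclidean distance; centroids are plain double[] arrays rather than Mahout vectors, and the class name NearestCentroid is hypothetical:

```java
import java.util.List;

/**
 * Hypothetical helper: assigns a new point to the closest existing centroid.
 * Assumes centroid-based clusters and squared Euclidean distance.
 */
final class NearestCentroid {

    /** Squared Euclidean distance; the square root is unnecessary for comparisons. */
    static double squaredDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the index of the centroid closest to the given point. */
    static int assign(List<double[]> centroids, double[] point) {
        if (centroids.isEmpty()) {
            throw new IllegalArgumentException("no centroids");
        }
        int best = 0;
        double bestDist = squaredDistance(centroids.get(0), point);
        for (int i = 1; i < centroids.size(); i++) {
            double d = squaredDistance(centroids.get(i), point);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> centroids = List.of(new double[] {0, 0}, new double[] {10, 10});
        System.out.println(assign(centroids, new double[] {9, 8})); // 1
    }
}
```

The centroids are not updated here; whether and when to move them after adding points is a separate decision.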

Using SparseVectorsFromSequenceFiles () in Java

2013-09-12 Thread Darius Miliauskas
Dear All, I am trying to use SparseVectorsFromSequenceFiles() from Java code (NetBeans 7, Windows 7). Here is my code (API):
//inputPath is the path of my SequenceFile
Path inputPath = new Path("C:\\Users\\DARIUS\\forTest1.txt");
//outputPath where I expect some results
Path outputPath = new

Re: Item Based Collaborative Filtering Properties Question

2013-09-12 Thread Brian Arnold
Hi, Thank you for the response! What you said makes sense. Here is a link to the other property: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.6/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java#RecommenderJob.0DEFAULT_MAX_SIMILARITIES_PER_ITEM
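
The linked constant, DEFAULT_MAX_SIMILARITIES_PER_ITEM, is the default for the option that caps how many similar items are kept per item. One way such a cap can work is to keep only the k highest similarities with a min-heap. This is an illustration under that assumption, not Mahout's internal code, and TopKSimilarities is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical helper: keeps the k most similar items for one item. */
final class TopKSimilarities {

    /** Returns at most k entries (itemID -> similarity), sorted by descending similarity. */
    static List<Map.Entry<Long, Double>> topK(Map<Long, Double> similarities, int k) {
        // Min-heap ordered by similarity: the weakest retained entry is at the head.
        PriorityQueue<Map.Entry<Long, Double>> heap = new PriorityQueue<Map.Entry<Long, Double>>(
                (x, y) -> Double.compare(x.getValue(), y.getValue()));
        for (Map.Entry<Long, Double> e : similarities.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // evict the current weakest entry
            }
        }
        List<Map.Entry<Long, Double>> result = new ArrayList<>(heap);
        result.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Double> sims = new HashMap<>();
        sims.put(1L, 0.9);
        sims.put(2L, 0.1);
        sims.put(3L, 0.5);
        System.out.println(topK(sims, 2)); // [1=0.9, 3=0.5]
    }
}
```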

Re: Item Based Collaborative Filtering Properties Question

2013-09-12 Thread 林伟
Hi Brian, The parameter maxPrefsPerUserInItemSimilarity is in RecommenderJob; judging from its comment, it is the same as the parameter maxPrefsPerUser in ItemSimilarityJob. The second question is not easy to answer. It depends on your recommendation scenario and the features of your input data. The
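
Going by the comment cited above, maxPrefsPerUser limits how many of a user's preferences enter the similarity computation. A minimal sketch, assuming users above the cap are sampled down uniformly at random (reservoir sampling is used here); Mahout's exact sampling strategy may differ, and PreferenceSampler is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: caps one user's preferences at maxPrefsPerUser. */
final class PreferenceSampler {

    /** Returns all prefs if under the cap, otherwise a uniform random sample of size maxPrefsPerUser. */
    static <T> List<T> sampleDown(List<T> prefs, int maxPrefsPerUser, Random random) {
        if (maxPrefsPerUser < 0) {
            throw new IllegalArgumentException("cap must be non-negative");
        }
        if (prefs.size() <= maxPrefsPerUser) {
            return new ArrayList<>(prefs); // under the cap: keep everything
        }
        // Reservoir sampling (Algorithm R): fill with the first items, then replace at random.
        List<T> reservoir = new ArrayList<>(prefs.subList(0, maxPrefsPerUser));
        for (int i = maxPrefsPerUser; i < prefs.size(); i++) {
            int j = random.nextInt(i + 1); // uniform in [0, i]
            if (j < maxPrefsPerUser) {
                reservoir.set(j, prefs.get(i));
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> prefs = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            prefs.add(i);
        }
        System.out.println(sampleDown(prefs, 10, new Random(42)).size()); // 10
    }
}
```

Lower caps make the similarity computation cheaper for heavy users at the cost of ignoring some of their history.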

Re: Item Based Collaborative Filtering Properties Question

2013-09-12 Thread Sebastian Schelter
Hi Brian, happy to give you some details. From a matrix A (user x item) that holds user-item interactions, this algorithm first computes a matrix S (item x item) of item similarities and then uses these item similarities to compute recommendations for users. The parameters refer to the
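
The two steps described above (A to S, then S to recommendations) can be sketched on a tiny dense matrix. This assumes cosine similarity between item columns and scores each unseen item by a plain weighted sum of the user's interactions times the item similarities; Mahout supports several similarity measures and its actual scoring may normalize differently, and ItemBasedSketch is a hypothetical name:

```java
import java.util.Arrays;

/** Hypothetical in-memory illustration of item-based CF on a dense user x item matrix A. */
final class ItemBasedSketch {

    /** Cosine similarity between item columns i and j of A (0 if either column is all zeros). */
    static double cosine(double[][] a, int i, int j) {
        double dot = 0, ni = 0, nj = 0;
        for (double[] row : a) {
            dot += row[i] * row[j];
            ni += row[i] * row[i];
            nj += row[j] * row[j];
        }
        return (ni == 0 || nj == 0) ? 0.0 : dot / Math.sqrt(ni * nj);
    }

    /** Step 1: S (item x item) of pairwise similarities; the diagonal is left at 0. */
    static double[][] similarityMatrix(double[][] a) {
        int items = a[0].length;
        double[][] s = new double[items][items];
        for (int i = 0; i < items; i++) {
            for (int j = i + 1; j < items; j++) {
                double sim = cosine(a, i, j);
                s[i][j] = sim;
                s[j][i] = sim;
            }
        }
        return s;
    }

    /** Step 2: scores for one user; items the user already interacted with get -Infinity. */
    static double[] score(double[][] a, double[][] s, int user) {
        int items = s.length;
        double[] scores = new double[items];
        for (int j = 0; j < items; j++) {
            if (a[user][j] != 0) {
                scores[j] = Double.NEGATIVE_INFINITY; // already seen, not recommended
                continue;
            }
            double sum = 0;
            for (int i = 0; i < items; i++) {
                sum += a[user][i] * s[i][j];
            }
            scores[j] = sum;
        }
        return scores;
    }

    public static void main(String[] args) {
        double[][] a = {
            {1, 1, 0},
            {1, 1, 1},
            {0, 0, 1},
        };
        double[][] s = similarityMatrix(a);
        System.out.println(Arrays.toString(score(a, s, 2))); // [0.5, 0.5, -Infinity]
    }
}
```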

processing compressed files

2013-09-12 Thread Eric
Hi, I'd like to use Mahout for clustering and classification on tens of terabytes of data stored in Amazon's S3 storage service. Each file in my data yields one data point, and I need to decompress and process each file before applying machine learning. Is it necessary to have
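
The message does not say which compression format the files use. Assuming gzip, the per-file step (decompress, then turn the file into one data point) can be sketched with java.util.zip. The two features computed here (byte count and line count) are hypothetical placeholders for real feature extraction, and reading from S3 is not shown:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: one gzip-compressed file -> one data point. Requires Java 11+. */
final class CompressedFileToPoint {

    /** Decompresses a gzip stream fully into memory. */
    static byte[] gunzip(InputStream compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(compressed)) {
            return in.readAllBytes();
        }
    }

    /** Placeholder feature extraction: [decompressed byte count, line count]. */
    static double[] toDataPoint(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return new double[] {raw.length, text.lines().count()};
    }

    /** Demo helper: gzip a string in memory. */
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("a b\nc d\n");
        double[] point = toDataPoint(gunzip(new ByteArrayInputStream(compressed)));
        System.out.println(point[0] + " " + point[1]); // 8.0 2.0
    }
}
```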

Re: Using SparseVectorsFromSequenceFiles () in Java

2013-09-12 Thread Gokhan Capan
Although Windows is not officially supported, your svsf.run(new String[]{inputPath.toString(), outputPath.toString()}) should be svsf.run(new String[]{"-i", inputPath.toString(), "-o", outputPath.toString()}) anyway. Best, Gokhan On Thu, Sep 12, 2013 at 4:14 PM, Darius Miliauskas
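
The point of the fix is that each path must be preceded by its option flag. A small sketch that builds the argument array in that form; the output path is a hypothetical placeholder (the original is truncated), and the Mahout call itself appears only as a comment because it needs Mahout and Hadoop on the classpath. SvsfArgs is a hypothetical name:

```java
import java.util.Arrays;

/** Hypothetical helper: builds arguments in the "-i <input> -o <output>" form. */
final class SvsfArgs {

    /** Each path is preceded by its option flag: -i for input, -o for output. */
    static String[] build(String inputPath, String outputPath) {
        return new String[] {"-i", inputPath, "-o", outputPath};
    }

    public static void main(String[] args) {
        // "output" is a hypothetical placeholder for the truncated outputPath.
        String[] svsfArgs = build("C:\\Users\\DARIUS\\forTest1.txt", "output");
        System.out.println(Arrays.toString(svsfArgs));
        // With Mahout and Hadoop on the classpath (not assumed here), and svsf being
        // the SparseVectorsFromSequenceFiles instance from the original code:
        //   svsf.run(svsfArgs);
    }
}
```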

Re: mahout detailed output for LDA

2013-09-12 Thread Gokhan Capan
Hi Parnab, When running LDA with the command-line cvb utility, you can pass the -o option to set the output path for the topic-term distributions and the -dt option to set the output path for the doc-topic distributions. Hope that helps. Best, Gokhan On Wed, Sep 11, 2013 at 11:38 PM, parnab kumar
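
A sketch that assembles such a command line from Java. The paths are hypothetical, and any other options the cvb job requires (for example the number of topics) are passed through extraOptions rather than guessed here. Launching it assumes the mahout script is on the PATH; CvbCommand is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds a "mahout cvb" command with both output paths set. */
final class CvbCommand {

    /**
     * -o sets the topic-term output path and -dt the doc-topic output path;
     * remaining options are appended unchanged from extraOptions.
     */
    static List<String> build(String input, String topicTermOut, String docTopicOut,
                              List<String> extraOptions) {
        List<String> cmd = new ArrayList<>(List.of(
                "mahout", "cvb",
                "-i", input,
                "-o", topicTermOut,
                "-dt", docTopicOut));
        cmd.addAll(extraOptions);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("vectors", "topic-term", "doc-topic", List.of());
        System.out.println(String.join(" ", cmd)); // mahout cvb -i vectors -o topic-term -dt doc-topic
        // To actually launch it (assumes mahout is on the PATH):
        //   new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```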

Re: ALS-WR Predictions

2013-09-12 Thread Stuart Horsman
Hi Stevo, So the method predictFromFactorization, which was in PredictorJob, does not seem to have been carried over. RecommenderJob gives me the top N recommendations (plus predicted preferences). predictFromFactorization is handy because I can pass in (userid, itemid) pairs and get a preference
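
ALS-WR factorizes the rating matrix into user and item feature matrices, so the predicted preference for a (user, item) pair is the dot product of the corresponding feature vectors. A minimal sketch of that computation, assuming the factors have already been loaded into in-memory maps (loading Mahout's output is not shown); FactorizationPredictor and the IDs below are hypothetical, and this is not the removed Mahout 0.5 code:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical predictor over ALS user/item feature vectors keyed by ID. */
final class FactorizationPredictor {

    private final Map<Long, double[]> userFeatures;
    private final Map<Long, double[]> itemFeatures;

    FactorizationPredictor(Map<Long, double[]> userFeatures, Map<Long, double[]> itemFeatures) {
        this.userFeatures = userFeatures;
        this.itemFeatures = itemFeatures;
    }

    /** Predicted preference: dot product of the user's and the item's feature vectors. */
    double predict(long userID, long itemID) {
        double[] u = userFeatures.get(userID);
        double[] m = itemFeatures.get(itemID);
        if (u == null || m == null) {
            throw new IllegalArgumentException("unknown user or item");
        }
        if (u.length != m.length) {
            throw new IllegalArgumentException("feature dimension mismatch");
        }
        double dot = 0.0;
        for (int k = 0; k < u.length; k++) {
            dot += u[k] * m[k];
        }
        return dot;
    }

    public static void main(String[] args) {
        Map<Long, double[]> users = new HashMap<>();
        users.put(7L, new double[] {1.0, 2.0});
        Map<Long, double[]> items = new HashMap<>();
        items.put(42L, new double[] {3.0, 4.0});
        System.out.println(new FactorizationPredictor(users, items).predict(7L, 42L)); // 11.0
    }
}
```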