Re: Collaborative filtering help needed

2011-11-09 Thread Akshay Jain
@Sean, I am just testing with a small dataset. I have some large datasets which I am planning to use on Hadoop. Thanks. Akshay On Wed, Nov 9, 2011 at 12:49 PM, Sean Owen sro...@gmail.com wrote: @Steven this is in the distributed part. There is no such method. But Akshay if your data is not

NewsKMeansClustering - the result most people want seems to be missing

2011-11-09 Thread Rob Podolski
Hi, I managed to get the Manning Chap 09 example NewsKMeansClustering working with my own documents. However, I thought the main point of this was to cluster the news articles together to get groups of similar content. The example allows you to get the cluster membership in terms of

meanshift clustering

2011-11-09 Thread gaurav redkar
Hi, I am unable to identify where the clusterPoints() function in the MeanShiftCanopyClusterer.java file is called during the execution of the Meanshift job. What I need to know is where the files in the clusteredPoints and clusters-* directories are written when we run the job on Hadoop.

Re: Comparing results of Mahout SVD and Scilab

2011-11-09 Thread Alfredo Motta
Thank you for your clarifications, now it is clear 2011/11/8 Jake Mannix jake.man...@gmail.com The output from the LanczosSolver is not the final set of results. The fact that you passed --cleansvd true to the system means that you want it to do some cleanup and remove any spurious singular

new posting about (machine learning) mapreduce algorithms

2011-11-09 Thread Amund Tveit
Perhaps of interest: http://atbrox.com/2011/11/09/mapreduce-hadoop-algorithms-in-academic-papers-5th-update-%E2%80%93-nov-2011/ Best regards, Amund

AdaptiveLogisticRegression

2011-11-09 Thread Koert Kuipers
To train the AdaptiveLogisticRegression, do I need to feed in new training data only once? Or is iterating over the training data helpful here as well? Thanks! Koert

Issues with running Mahout LDA over the Reuters data set (Mahout in Action)

2011-11-09 Thread Varnit Khanna
Hi, I am trying to run Mahout LDA over the Reuters data set as described in Mahout in Action; however, I always get only 1 topic returned. I am running Mahout 0.5 and here are my steps: $ mvn -e -q exec:java -Dexec.mainClass=org.apache.lucene.benchmark.utils.ExtractReuters

Re: SGD TrainNewsGroups interim output

2011-11-09 Thread Grant Ingersoll
Cool, how about adding it to the Wiki? On Nov 9, 2011, at 8:15 AM, Suneel Marthi wrote: I can put together a doc if we don't already have one, know the SGD code pretty well. Regards, Suneel From: Grant Ingersoll grant.ingers...@gmail.com To:

Re: SGD TrainNewsGroups interim output

2011-11-09 Thread Suneel Marthi
Will do. From: Grant Ingersoll gsing...@apache.org To: user@mahout.apache.org; Suneel Marthi suneel_mar...@yahoo.com Sent: Wednesday, November 9, 2011 10:02 AM Subject: Re: SGD TrainNewsGroups interim output Cool, how about adding it to the Wiki? On Nov 9,

Re: Running Mahout SVD on Amazon Elastic Map Reduce

2011-11-09 Thread Ted Dunning
This looks like a hard-coded hdfs prefix in a path name construction somewhere. On Wed, Nov 9, 2011 at 8:27 AM, motta motta@gmail.com wrote: Hi everybody, I have tried to run my first Mahout SVD Job (DistributedLanczosSolver) in Elastic Map Reduce. Before going to Amazon I've tried to

Re: NewsKMeansClustering - the result most people want seems to be missing

2011-11-09 Thread Grant Ingersoll
On Nov 9, 2011, at 3:17 AM, Rob Podolski wrote: Hi, I managed to get the Manning Chap 09 example NewsKMeansClustering working with my own documents. However, I thought the main point of this was to cluster the news articles together to get groups of similar content. The example

User based CF

2011-11-09 Thread WangRamon
Hi All, Does Mahout provide a user-based CF implementation on Hadoop? Currently I only see item-based Hadoop implementations. Thanks. Cheers, Ramon

Re: User based CF

2011-11-09 Thread Sebastian Schelter
There is no such implementation. Literature suggests that an item-based approach is usually both faster and more accurate. --sebastian On 10.11.2011 08:34, WangRamon wrote: Hi All, Does Mahout provide a user-based CF implementation on Hadoop? Currently I only see an item-based Hadoop
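
For readers new to the distinction discussed in this thread: an item-based approach compares items to each other rather than users to each other. The sketch below is only an illustration of that idea using simple co-occurrence counts. It is not Mahout's implementation, and the class name, method name, and sample data are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ItemCooccurrenceSketch {

    // For every ordered pair of distinct items, count how many users have both.
    // Hypothetical helper for illustration; not part of Mahout.
    static Map<String, Map<String, Integer>> cooccurrence(Map<String, Set<String>> itemsByUser) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (Set<String> items : itemsByUser.values()) {
            for (String a : items) {
                for (String b : items) {
                    if (!a.equals(b)) {
                        counts.computeIfAbsent(a, k -> new TreeMap<>())
                              .merge(b, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    // Made-up preference data: three users, three items.
    static Map<String, Set<String>> sampleData() {
        Map<String, Set<String>> data = new LinkedHashMap<>();
        data.put("u1", Set.of("A", "B"));
        data.put("u2", Set.of("A", "B", "C"));
        data.put("u3", Set.of("B", "C"));
        return data;
    }

    public static void main(String[] args) {
        // Items that co-occur often with an item a user already has
        // are candidates for recommendation.
        System.out.println(cooccurrence(sampleData()));
    }
}
```

The work in this sketch grows with the number of item pairs each user has touched, which is one reason the relative counts of users and items matter for performance.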

Re: NewsKMeansClustering - the result most people want seems to be missing

2011-11-09 Thread Rob Podolski
Many thanks. Actually I delved into the source code and found out that if you set the (undocumented) namedVector boolean to true in... DictionaryVectorizer.createTermFrequencyVectors( tokenizedPath, new Path(OUTPUT_HFS_FOLDER), conf,

RE: User based CF

2011-11-09 Thread WangRamon
Thanks Sebastian. Can I assume that item-based CF will be slow if there are more items than users? Date: Thu, 10 Nov 2011 08:43:53 +0100 From: s...@apache.org To: user@mahout.apache.org Subject: Re: User based CF There is no such implementation. Literature suggests that an item-based

Re: Running Mahout SVD on Amazon Elastic Map Reduce

2011-11-09 Thread Alfredo Motta
I didn't hard-code any hdfs prefix; I've just used mahout-examples-0.5-job.jar (downloaded from the Mahout website) to run DistributedLanczosSolver. The output suggests that the jar invoked FileSystem.get(conf) instead of FileSystem.get(uri, conf) to get my input matrix. Is that possible? 2011/11/10
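
The difference between the two calls mentioned above can be sketched without Hadoop on the classpath. In Hadoop, FileSystem.get(conf) returns the configured default filesystem, while FileSystem.get(uri, conf) selects the filesystem from the URI's scheme and falls back to the default when the URI has none. The stand-in below only mimics that selection logic with java.net.URI; the class, the method names, the default address, and the bucket name are all hypothetical.

```java
import java.net.URI;

public class FileSystemSchemeSketch {

    // Hypothetical stand-in for the cluster's configured default filesystem.
    static final URI DEFAULT_FS = URI.create("hdfs://namenode:9000/");

    // Mimics FileSystem.get(conf): the path is ignored and the
    // default filesystem is always chosen.
    static String schemeFromDefault() {
        return DEFAULT_FS.getScheme();
    }

    // Mimics FileSystem.get(uri, conf): the path's own scheme decides,
    // with the default used only when the path has no scheme.
    static String schemeFromUri(URI path) {
        String scheme = path.getScheme();
        return scheme != null ? scheme : DEFAULT_FS.getScheme();
    }

    public static void main(String[] args) {
        // Made-up input location on a non-default filesystem.
        URI input = URI.create("s3n://example-bucket/input/matrix");
        System.out.println("default lookup: " + schemeFromDefault());
        System.out.println("uri lookup:     " + schemeFromUri(input));
    }
}
```

If a job resolves an s3n:// input through the default lookup, it ends up asking HDFS for a path that lives elsewhere, which would be consistent with the symptom described in this thread.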