Re: Question about Pearson Correlation in non-Taste mode

2013-11-29 Ted Dunning
On Fri, Nov 29, 2013 at 10:16 PM, Amit Nithian wrote:

> Hi Ted,
>
> Thanks for your response. I thought that the mean of a sparse vector is
> simply the mean of the "defined" elements? Why would the vectors become
> dense unless you mean that all the undefined elements (0?) now will
> be (0

Re: Seq2sparse numReducers is not passed to some jobs

2013-11-29 Abbas Gadhia
I have the exact same problem. Some of the longer-running sub-tasks use only 1 reducer, and each reduce task runs for 2-3 hours!!!

Re: Question about Pearson Correlation in non-Taste mode

2013-11-29 Amit Nithian
Hi Ted, Thanks for your response. I thought that the mean of a sparse vector is simply the mean of the "defined" elements? Why would the vectors become dense unless you mean that all the undefined elements (0?) now will be (0-m_x)? Looking at the following example: X = [5 - 4] and Y = [4 5 2
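A minimal sketch of the interpretation described here, in which each mean is taken only over the entries defined in both vectors and undefined entries are skipped rather than read as 0. The class name `CommonEntryPearson`, the `Map<Integer, Double>` representation of a sparse vector, and the sample data are hypothetical; this is not Mahout's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Sketch (assumption): Pearson correlation restricted to indices defined in
 * BOTH sparse vectors, with each mean computed over those common entries only.
 * Missing entries are ignored, not treated as 0.
 */
public class CommonEntryPearson {

    public static double pearsonOverCommon(Map<Integer, Double> x, Map<Integer, Double> y) {
        // Collect paired values for indices present in both vectors.
        List<double[]> pairs = new ArrayList<>();
        for (Map.Entry<Integer, Double> e : x.entrySet()) {
            Double yv = y.get(e.getKey());
            if (yv != null) {
                pairs.add(new double[] {e.getValue(), yv});
            }
        }
        int n = pairs.size();
        if (n < 2) {
            return Double.NaN; // undefined with fewer than 2 common entries
        }
        // Means over the common entries only.
        double mx = 0, my = 0;
        for (double[] p : pairs) {
            mx += p[0];
            my += p[1];
        }
        mx /= n;
        my /= n;
        // Centered sums; fine to densify here because only common entries are kept.
        double sxy = 0, sxx = 0, syy = 0;
        for (double[] p : pairs) {
            double dx = p[0] - mx;
            double dy = p[1] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy); // NaN if either side has zero variance
    }

    public static void main(String[] args) {
        // Hypothetical data: undefined ("-") entries are simply absent from the map.
        Map<Integer, Double> x = Map.of(0, 5.0, 2, 4.0, 3, 1.0);
        Map<Integer, Double> y = Map.of(0, 4.0, 1, 5.0, 2, 2.0, 3, 1.0);
        System.out.println(pearsonOverCommon(x, y));
    }
}
```

Under this convention, index 1 of `y` plays no part in the result because `x` has no value there.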

Re: Question about Pearson Correlation in non-Taste mode

2013-11-29 Ted Dunning
Well, the best way to compute correlation using sparse vectors is to make sure you keep them sparse. To do that, you must avoid subtracting the mean by expanding whatever formulae you are using. For instance, if you are computing (x - m_x) . (y - m_y) (here . means dot product) If you do t
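The expansion can be sketched in Java as follows, assuming absent entries are 0 and both means are taken over all n dimensions. Expanding the products gives (x - m_x) . (y - m_y) = x . y - n m_x m_y and (x - m_x) . (x - m_x) = x . x - n m_x^2, so only dot products and sums over the non-zero entries are needed. `SparsePearson` and the `Map<Integer, Double>` vector representation are hypothetical stand-ins, not Mahout classes.

```java
import java.util.Map;

/**
 * Sketch: Pearson correlation of two sparse vectors of dimension n, treating
 * absent entries as 0 and taking means over all n entries. (x - m_x) is never
 * materialized, so only non-zero entries are touched.
 *   (x - m_x).(y - m_y) = x.y - n * m_x * m_y
 *   (x - m_x).(x - m_x) = x.x - n * m_x^2
 */
public class SparsePearson {

    // Dot product over the non-zero entries; iterate the smaller map.
    public static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
        if (a.size() > b.size()) {
            Map<Integer, Double> t = a;
            a = b;
            b = t;
        }
        double s = 0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double v = b.get(e.getKey());
            if (v != null) {
                s += e.getValue() * v;
            }
        }
        return s;
    }

    // Sum of the non-zero entries (the zeros contribute nothing).
    public static double sum(Map<Integer, Double> a) {
        double s = 0;
        for (double v : a.values()) {
            s += v;
        }
        return s;
    }

    public static double pearson(Map<Integer, Double> x, Map<Integer, Double> y, int n) {
        double mx = sum(x) / n;
        double my = sum(y) / n;
        double cov = dot(x, y) - n * mx * my;
        double varX = dot(x, x) - n * mx * mx;
        double varY = dot(y, y) - n * my * my;
        return cov / Math.sqrt(varX * varY); // NaN if either vector is constant
    }

    public static void main(String[] args) {
        // Hypothetical 4-dimensional vectors; missing indices are zeros.
        Map<Integer, Double> x = Map.of(1, 3.0);
        Map<Integer, Double> y = Map.of(1, 1.0, 3, 2.0);
        System.out.println(pearson(x, y, 4));
    }
}
```

One trade-off of the expanded form: when the means are large relative to the spread, the subtractions can lose floating-point precision compared with centering first.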

Re: Question about Pearson Correlation in non-Taste mode

2013-11-29 Amit Nithian
Okay, so I rethought my question and realized that the paper never really talked about collaborative filtering, only about how to calculate item-item similarity in a scalable fashion. Perhaps that is why the common ratings aren't used, since they aren't a prerequisite for this calculation?

Re: RandomAccessSparseVector setting 1.0 in 2 dims for 1 feature value?

2013-11-29 Ted Dunning
If you always insert 1's for each element, then you can detect collisions by inserting all your elements (or all elements in each document separately) and looking for the max value in the vector. If you see something >1, you have a collision. But collisions are actually good. The only way to com
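A sketch of the collision check described above, under stated assumptions: each feature in the list is distinct, one probe per feature, and `String.hashCode()` stands in for whatever hash the real encoder uses. `CollisionCheck` and its methods are hypothetical.

```java
import java.util.List;

/**
 * Sketch: hash each (distinct) feature into a fixed-size vector, adding 1.0
 * per feature, then look at the maximum cell. Any cell > 1 means two features
 * landed in the same slot. Assumption: String.hashCode() stands in for the
 * encoder's real hash function.
 */
public class CollisionCheck {

    public static double[] encode(List<String> features, int size) {
        double[] v = new double[size];
        for (String f : features) {
            int slot = Math.floorMod(f.hashCode(), size); // non-negative slot index
            v[slot] += 1.0;
        }
        return v;
    }

    public static boolean hasCollision(double[] v) {
        double max = 0;
        for (double x : v) {
            max = Math.max(max, x);
        }
        return max > 1.0;
    }

    public static void main(String[] args) {
        List<String> features = List.of("a", "b", "c", "d", "e");
        // 5 features in 4 slots must collide (pigeonhole).
        System.out.println(hasCollision(encode(features, 4)));    // prints true
        // "a".."e" hash to 97..101, all distinct modulo 1000.
        System.out.println(hasCollision(encode(features, 1000))); // prints false
    }
}
```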

Re: Information

2013-11-29 Angelo Immediata
Hi there, I'm reopening this old topic because I now have more information: I was able to talk with my customer. Basically, my customer wants the following: using some historical data, we have to cluster the data using cluster analysis and some environment variables; for each cluster we ha

Re: RandomAccessSparseVector setting 1.0 in 2 dims for 1 feature value?

2013-11-29 Paul van Hoven
Hi, thanks for your quick reply. So multiple probes are a protection against collisions? After playing a little with the default length of a RandomAccessSparseVector object I noticed that (of course) collisions occur when the length is too short. Therefore, I'm asking myself if there is a possibili

Re: RandomAccessSparseVector setting 1.0 in 2 dims for 1 feature value?

2013-11-29 Ted Dunning
The default with the Mahout encoders is two probes. This is unnecessary with the intercept term, of course, if you protect the intercept term from other updates, possibly by encoding other data using a view of the original feature vector. For each probe, a different hash is used so each value is
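A sketch of the multi-probe idea, assuming a hypothetical per-probe hash that salts the feature name with the probe number (Mahout's encoders use their own hash functions). With two probes, a single feature normally sets 1.0 in two cells. Slot 0 is reserved for the intercept and features are hashed only into the remaining slots, which imitates encoding through a view of the vector. `MultiProbeEncoder` is hypothetical.

```java
import java.util.Arrays;

/**
 * Sketch of multi-probe feature hashing (assumption: illustrates the idea,
 * not Mahout's actual hashing). Each probe uses a different hash, so one
 * feature value sets up to `probes` cells. Slot 0 holds the intercept;
 * features only touch slots 1..size-1, so the intercept is never updated.
 */
public class MultiProbeEncoder {

    // Hypothetical per-probe hash: salt the feature name with the probe number.
    public static int slot(String feature, int probe, int width) {
        return Math.floorMod((feature + "#" + probe).hashCode(), width);
    }

    // size must be at least 2 (one intercept slot plus one feature slot).
    public static double[] encode(String[] features, int size, int probes) {
        double[] v = new double[size];
        v[0] = 1.0; // intercept term, protected from feature updates
        int width = size - 1; // features use only slots 1..size-1
        for (String f : features) {
            for (int p = 0; p < probes; p++) {
                v[1 + slot(f, p, width)] += 1.0;
            }
        }
        return v;
    }

    public static void main(String[] args) {
        double[] v = encode(new String[] {"color=red"}, 16, 2);
        System.out.println(Arrays.toString(v));
    }
}
```

If both probes of a feature happen to hash to the same cell, that cell gets 2.0 instead, so a cell above 1 does not always mean two different features collided.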

Decision Tree in Apache Mahout

2013-11-29 unmesha sreeveni
I am new to Mahout. I am trying to run the decision tree in https://github.com/apache/mahout/tree/mahout-0.6/examples/src/main/java/org/apache/mahout/classifier/df. I have gone through https://cwiki.apache.org/confluence/display/MAHOUT/Partial+Implementation. I need to run "df" as an Eclipse project. So

Re: Mahout fpg

2013-11-29 Isabel Drost-Fromm
On Fri, 22 Nov 2013 17:55:13 +0800 Jason Lee wrote:

> I noticed that many algorithm implementations were deprecated in Mahout
> 0.8 and removed in 0.9, but no reasons or comments were given. Can
> I ask why?

As Suneel mentioned earlier: Before removing these algorithms we asked on the user list

RandomAccessSparseVector setting 1.0 in 2 dims for 1 feature value?

2013-11-29 Paul van Hoven
For an example program using Mahout, I use the donut.csv sample data from the project ( https://svn.apache.org/repos/asf/mahout/trunk/examples/src/main/resources/donut.csv ). My code looks like this:

import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vecto

Re: java.lang.NoClassDefFoundError: com/google/common/base/Preconditions

2013-11-29 Isabel Drost-Fromm
On Thu, 28 Nov 2013 13:24:26 +0530 Tharindu Rusira wrote:

> Yes, that's the exact issue, Suneel. It was a careless mistake while
> adding projects to Eclipse that I missed those .jars.

When changing Mahout code, make sure to either run mvn eclipse:eclipse before importing the project into your wor