date:20131129

Re: Question about Pearson Correlation in non-Taste mode

2013-11-29 Thread Ted Dunning

On Fri, Nov 29, 2013 at 10:16 PM, Amit Nithian wrote:
> Hi Ted,
>
> Thanks for your response. I thought that the mean of a sparse vector is
> simply the mean of the "defined" elements? Why would the vectors become
> dense unless you're meaning that all the undefined elements (0?) now will
> be (0

Re: Seq2sparse numReducers is not passed to some jobs

2013-11-29 Thread Abbas Gadhia

I have the exact same problem. Some of the longer-running sub-tasks use only 1 reducer, and each reduce task runs for 2-3 hours!

Re: Question about Pearson Correlation in non-Taste mode

2013-11-29 Thread Amit Nithian

Hi Ted, Thanks for your response. I thought that the mean of a sparse vector is simply the mean of the "defined" elements? Why would the vectors become dense unless you're meaning that all the undefined elements (0?) now will be (0-m_x)? Looking at the following example: X = [5 - 4] and Y= [4 5 2

Re: Question about Pearson Correlation in non-Taste mode

2013-11-29 Thread Ted Dunning

Well, the best way to compute correlation using sparse vectors is to make sure you keep them sparse. To do that, you must avoid subtracting the mean by expanding whatever formulae you are using. For instance, if you are computing

(x - m_x) . (y - m_y)

(here . means dot product)

If you do t
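The message is cut off here, so what follows is only a sketch of one way the expansion can go, not a reconstruction of the rest of the reply. It assumes the undefined entries count as zeros and that the means m_x and m_y are taken over all n dimensions. Under that assumption, (x - m_x) . (y - m_y) = x . y - n * m_x * m_y, so the centered vectors never have to be built. The class name SparsePearson and the Map-based sparse vectors are illustrative stand-ins and are not Mahout classes.

```java
import java.util.HashMap;
import java.util.Map;

public class SparsePearson {

    // Sum of the stored (non-zero) entries only.
    static double sum(Map<Integer, Double> v) {
        double s = 0.0;
        for (double value : v.values()) {
            s += value;
        }
        return s;
    }

    // Dot product that visits only indices stored in the smaller vector.
    static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = small == a ? b : a;
        double s = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                s += e.getValue() * other;
            }
        }
        return s;
    }

    // Pearson correlation over n dimensions, with missing entries read as 0.
    // Assumption: the means are taken over all n dimensions, not just the
    // defined elements. No dense (x - m_x) vector is ever materialized.
    static double correlation(Map<Integer, Double> x, Map<Integer, Double> y, int n) {
        double mx = sum(x) / n;
        double my = sum(y) / n;
        double cov = dot(x, y) - n * mx * my;   // (x - m_x) . (y - m_y)
        double varX = dot(x, x) - n * mx * mx;  // (x - m_x) . (x - m_x)
        double varY = dot(y, y) - n * my * my;  // (y - m_y) . (y - m_y)
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        Map<Integer, Double> x = new HashMap<>();
        x.put(0, 1.0);
        x.put(3, 2.0);
        Map<Integer, Double> y = new HashMap<>();
        y.put(0, 2.0);
        y.put(3, 4.0);
        System.out.println(correlation(x, y, 5));
    }
}
```

Every quantity above is computed from sums and dot products over the stored entries, so the cost depends on the number of non-zeros and not on n.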

Re: Question about Pearson Correlation in non-Taste mode

2013-11-29 Thread Amit Nithian

Okay, so I rethought my question and realized that the paper never really talked about collaborative filtering, only about how to calculate item-item similarity in a scalable fashion. Perhaps this is why the common ratings aren't used, because that isn't a prerequisite for this calculation?

Re: RandomAccessSparseVector setting 1.0 in 2 dims for 1 feature value?

2013-11-29 Thread Ted Dunning

If you always insert 1's for each element, then you can detect collisions by inserting all your elements (or all elements in each document separately) and looking for the max value in the vector. If you see something >1, you have a collision. But collisions are actually good. The only way to com
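The collision check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the slot function is a hypothetical stand-in based on String.hashCode and is not the hash Mahout's encoders use, a plain double array stands in for the sparse vector, and the input features are assumed to be distinct (a repeated feature would also push a cell above 1).

```java
import java.util.Arrays;
import java.util.List;

public class CollisionCheck {

    // Hypothetical hasher: maps a feature name to a slot in [0, size).
    static int slot(String feature, int size) {
        return Math.floorMod(feature.hashCode(), size);
    }

    // Adds 1.0 for every feature and returns the largest cell value seen.
    static double maxAfterInsert(List<String> features, int size) {
        double[] vector = new double[size];
        double max = 0.0;
        for (String f : features) {
            int i = slot(f, size);
            vector[i] += 1.0;
            max = Math.max(max, vector[i]);
        }
        return max;
    }

    // With distinct features, a maximum above 1 means two of them shared a slot.
    static boolean hasCollision(List<String> distinctFeatures, int size) {
        return maxAfterInsert(distinctFeatures, size) > 1.0;
    }

    public static void main(String[] args) {
        List<String> features = Arrays.asList("a", "b", "c");
        System.out.println(hasCollision(features, 100));
        System.out.println(hasCollision(features, 1));
    }
}
```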

Re: Information

2013-11-29 Thread Angelo Immediata

Hi there, I am reopening this old topic because I now have more information after talking with my customer. Basically, my customer wants the following: using some historical data, we have to cluster the data using cluster analysis and some environment variables; for each cluster we ha

Re: RandomAccessSparseVector setting 1.0 in 2 dims for 1 feature value?

2013-11-29 Thread Paul van Hoven

Hi, thanks for your quick reply. So multiple probes are a protection against collisions? After playing a little with the default length of a RandomAccessSparseVector object I noticed that (of course) collisions occur when the length is too short. Therefore, I'm asking myself if there is a possibili

Re: RandomAccessSparseVector setting 1.0 in 2 dims for 1 feature value?

2013-11-29 Thread Ted Dunning

The default with the Mahout encoders is two probes. This is unnecessary with the intercept term, of course, if you protect the intercept term from other updates, possibly by encoding other data using a view of the original feature vector. For each probe, a different hash is used so each value is
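To make the idea of probes concrete, here is a rough sketch of multi-probe hashed encoding. It is not Mahout's implementation: the class MultiProbeEncoder, the way the probe number is mixed into the hash, and the use of a plain array are all assumptions made for illustration. It shows why one feature value can add 1.0 in two places when two probes are used.

```java
import java.util.Arrays;
import java.util.List;

public class MultiProbeEncoder {

    // Hypothetical per-probe hash: appending the probe number to the name
    // gives each probe a different slot sequence. Illustrative only.
    static int slot(String feature, int probe, int size) {
        return Math.floorMod((feature + "#" + probe).hashCode(), size);
    }

    // Adds 1.0 once per probe for every feature.
    static double[] encode(List<String> features, int probes, int size) {
        double[] vector = new double[size];
        for (String f : features) {
            for (int p = 0; p < probes; p++) {
                vector[slot(f, p, size)] += 1.0;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        double[] v = encode(Arrays.asList("color=red"), 2, 20);
        System.out.println(Arrays.toString(v));
    }
}
```

The total weight written equals the number of probes times the number of features, wherever the individual probes land.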

Decision Tree in Apache Mahout

2013-11-29 Thread unmesha sreeveni

I am new to Mahout. I am trying to run the decision tree in https://github.com/apache/mahout/tree/mahout-0.6/examples/src/main/java/org/apache/mahout/classifier/df I have gone through https://cwiki.apache.org/confluence/display/MAHOUT/Partial+Implementation I need to run "df" as an Eclipse project. So

Re: Mahout fpg

2013-11-29 Thread Isabel Drost-Fromm

On Fri, 22 Nov 2013 17:55:13 +0800 Jason Lee wrote:
> I noticed lots of algorithms implementations has deprecated in Mahout
> 0.8 and removed in 0.9, but no reasons or comments been marked. Can
> i ask why?

As Suneel mentioned earlier: Before removing these algorithms we asked on the user list

RandomAccessSparseVector setting 1.0 in 2 dims for 1 feature value?

2013-11-29 Thread Paul van Hoven

For an example program using Mahout I use the donut.csv sample data from the project ( https://svn.apache.org/repos/asf/mahout/trunk/examples/src/main/resources/donut.csv ). My code looks like this:

import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vecto

Re: java.lang.NoClassDefFoundError: com/google/common/base/Preconditions

2013-11-29 Thread Isabel Drost-Fromm

On Thu, 28 Nov 2013 13:24:26 +0530 Tharindu Rusira wrote:
> Yes that's the exact issue Suneel, it was a careless mistake while
> adding projects to Eclipse that I missed those .jars.

When changing Mahout code make sure to either run mvn eclipse:eclipse before importing the project into your wor
