Yep. That is a problem.
Emit a constant key and a pair containing an integer and a vector. Add them
up separately. Divide at the end. The initial value for the integer from
the mapper should be 1.
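
A minimal sketch of that scheme in plain Python, not Hadoop or Mahout code. The function names (mapper, combine, finish), the key "mean", and the sample rows are made up for illustration. The mapper emits a constant key with a (count, vector) pair, the counts and vectors are summed separately, and the division happens once at the end.

```python
from functools import reduce

def mapper(row):
    # One record per input row: constant key, payload is (count=1, vector).
    return ("mean", (1, list(row)))

def combine(a, b):
    # Add the counts and the vectors separately. Because this is associative,
    # the same function can serve as a combiner and as the reducer's sum.
    count_a, vec_a = a
    count_b, vec_b = b
    return (count_a + count_b, [x + y for x, y in zip(vec_a, vec_b)])

def finish(pair):
    # Divide only at the very end, once all partial sums are merged.
    count, total = pair
    return [x / count for x in total]

# Hypothetical sample data: three rows of two measurements each.
rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
pairs = [mapper(r)[1] for r in rows]

# Simulate two combiners that saw different numbers of rows.
partial_1 = reduce(combine, pairs[:2])
partial_2 = reduce(combine, pairs[2:])

mean = finish(combine(partial_1, partial_2))
```

Carrying the count alongside the sum is what keeps the result correct when combiners merge unequal numbers of rows; averaging partial averages would not.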
On Thu, Apr 28, 2011 at 8:26 PM, Vckay wrote:
> However, the problem that I can see is that t
The payload for the K/V pair includes a counter of how many raw items
the combiner merged. This is how wordcount works: the combiners
send in the word as the key and the count as the payload.
Lance
On Thu, Apr 28, 2011 at 8:26 PM, Vckay wrote:
> On Wed, Apr 27, 2011 at 8:41 PM, Ted Dunning wrote:
>
On Wed, Apr 27, 2011 at 8:41 PM, Ted Dunning wrote:
> On Wed, Apr 27, 2011 at 5:28 PM, Vckay wrote:
>
> > Assuming the data is available as a text file with rows representing
> > measurements,
> >
>
>
> An org.apache.mahout.math.hadoop.DistributedRowMatrix is the traditional
> approach to this.
No. This is much better than crazy. It is exactly what LinearOperators are
good for.
On Wed, Apr 27, 2011 at 8:21 PM, Jake Mannix wrote:
> Thinking on it a little bit further, this is not so bad: Let's say we had a
> finished patch to the idea discussed in MAHOUT-672 - virtual distributed ma
On Wed, Apr 27, 2011 at 8:21 PM, Jake Mannix wrote:
>
> I would love to know the answer to this question.
>
> Thinking on it a little bit further, this is not so bad: Let's say we had a
> finished patch to the idea discussed in MAHOUT-672 - virtual distributed
> matrices, where in this case,
On Wed, Apr 27, 2011 at 6:41 PM, Ted Dunning wrote:
>
> > 3. Now that I have the centered data, computing the covariance matrix
> > shouldn't be too hard if I have represented my matrix as a distributed
> > row matrix. I can then use "times" to produce the covariance matrix.
> >
>
> Actually, t
On Wed, Apr 27, 2011 at 5:28 PM, Vckay wrote:
> Assuming the data is available as a text file with rows representing
> measurements,
>
An org.apache.mahout.math.hadoop.DistributedRowMatrix is the traditional
approach to this.
> 1. Have a dataCenteringDriver that calls an empiricalMeanGenerator d
Hello all,
I am trying to implement PCA using some of the libraries from Mahout. I am
following the TODO list posted here:
https://issues.apache.org/jira/browse/MAHOUT-512. I understand the idea
behind PCA conceptually, but I am rather new to both Hadoop and Mahout. Here
is what I think the