Ok, it's just FYI as you build out your pipelines.
FYI there's a bit of inconsistency between DRM-based methods in Mahout.
Some methods require Int row keys, some don't. Yet then some also rely on
the names of a NamedVector, and some don't.
PCA/SSVD propagates BOTH keys from sequence file AND nam
Yes, but rowId transforms my dataset into an index, which associates keys
like 0, 1, 2... with my actual keys, and a sequence file indexed by these
new integer keys.
Then PCA/SSVD comes in and outputs a reduced matrix (as a sequence file using
the same keys it found in the input file, which are the
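A minimal sketch of the re-keying described above, in plain Python rather than Mahout's own API. The names `rowid_like` and `restore_keys` are hypothetical, and the "reduction" is a stand-in that just keeps the first component of each vector. It only shows how an int-to-key index lets you join reduced rows back to their original keys.

```python
def rowid_like(rows):
    """Re-key (text_key, vector) pairs with consecutive integers.

    Returns (doc_index, matrix):
      doc_index maps int key -> original key,
      matrix    maps int key -> vector.
    """
    doc_index = {}
    matrix = {}
    for i, (key, vector) in enumerate(rows):
        doc_index[i] = key
        matrix[i] = vector
    return doc_index, matrix


def restore_keys(doc_index, reduced):
    """Join int-keyed rows back to their original keys via the index."""
    return {doc_index[i]: vector for i, vector in reduced.items()}


# Illustrative data only (not taken from the thread).
rows = [("user-a", [1.0, 2.0, 3.0]), ("user-b", [4.0, 5.0, 6.0])]
doc_index, matrix = rowid_like(rows)

# Stand-in for a dimensionality reduction: keep the first component.
reduced = {i: vector[:1] for i, vector in matrix.items()}

restored = restore_keys(doc_index, reduced)
```

The join is only needed when the input was re-keyed first; if the original keys are fed in directly, there is nothing to restore.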
PCA and SSVD propagate the exact row keys given in the input. If you give them
text keys, U and Usigma will have text keys. They don't change that.
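To illustrate the point: a projection only transforms the vector attached to each row, so whatever key travels with the row comes out unchanged. The sketch below is plain Python, not Mahout's implementation. `project_rows` and the `basis` values are made up for the example.

```python
def project_rows(rows, basis):
    """Project each (key, vector) row onto the given basis columns.

    rows:  iterable of (key, vector) pairs, each vector of length n
    basis: list of k column vectors, each of length n
    Yields (key, projected_vector) with the key passed through untouched.
    """
    for key, vector in rows:
        projected = [sum(x * b for x, b in zip(vector, column))
                     for column in basis]
        yield key, projected


# Illustrative data only: text keys in, text keys out.
rows = [("doc-a", [1.0, 2.0, 3.0]), ("doc-b", [0.0, 1.0, 0.0])]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]

projected = dict(project_rows(rows, basis))
```

The same holds for integer keys: the key type of the output simply mirrors the key type of the input.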
On Mar 10, 2014 3:39 AM, "Kevin Moulart" wrote:
> Hi and thanks, I'll try that, but I'd like to do so using a MapReduce job
> to improve performance.
Hi and thanks, I'll try that, but I'd like to do so using a MapReduce job
to improve performance.
I'm using PCA as a way to reduce the dimension of the dataset, both to
improve its relevance (with 1600+ variables, many of them are correlated)
and to improve the performance of the classification a
On Monday, March 10, 2014 4:21 AM, Kevin Moulart wrote:
It's not clear to me from your description what exact sequence of steps you are
running through, but an SSVD job requires a matrix as input (not a sequencefile of
.
>When you try running seqdumper on your SSVD output, do you see anything?
>
> It's not clear to me from your description what exact sequence of steps
> you are running through, but an SSVD job requires a matrix as input (not a
> sequencefile of .
> When you try running seqdumper on your SSVD output, do you see anything?
>
I see a SequenceFile Text/VectorWritable with my original ke
It's not clear to me from your description what exact sequence of steps you are
running through, but an SSVD job requires a matrix as input (not a sequencefile of
.
When you try running seqdumper on your SSVD output, do you see anything?
The next step after you create your sequencefiles of Vectors would be
Hi again,
I'm now using Mahout 0.9, and I'm trying to use PCA (via SSVD) to
reduce the dimension of a dataset from 1600+ features to ~100 and then to
use the reduced dataset to train a naive Bayes model and test it.
So here is my workflow:
- Transform my CSV into a SequenceFile with
key