If you're supplying a dictionary file (as you are), I'd suggest not
specifying the "-nt 90000" option - your numTerms is apparently smaller
than the actual number of terms in some of your vectors (the exception
mentions index 90011).  If you supply the -dict option, cvb will infer the
number of terms from reading the dictionary, so you don't need to specify
it.
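
As a rough sketch of what I mean (untested against your cluster), this is
the same launch command quoted below with only "-nt 90000" dropped.  The
DRY_RUN switch and the CMD variable are hypothetical helpers added for
illustration; they are not Mahout options.

```shell
#!/bin/sh
# Sketch only: same paths and options as the quoted cvb command, minus "-nt 90000".
# DRY_RUN is a hypothetical helper variable: 1 (the default) prints the command,
# anything else actually launches the job.
DRY_RUN="${DRY_RUN:-1}"

# Build the argument list; the number of terms is left to the -dict file.
set -- mahout cvb \
  -i jojoba/matrix/matrix \
  -dict jojoba/vectors/dictionary.file-0 \
  -o jojoba/to-output -dt jojoba/do-output \
  -k 190 -mt jojoba/mt \
  --maxIter 2 -mipd 1 -a 0.01 -e 0.01 -seed 37 -block 1

# Keep a printable copy of the command line for inspection.
CMD="$*"

if [ "$DRY_RUN" = "1" ]; then
  echo "$CMD"
else
  "$@"
fi
```

Run it once as-is to check the printed command, then set DRY_RUN=0 to
launch the job.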


On Wed, Jul 31, 2013 at 7:02 AM, Marco <zentrop...@yahoo.co.uk> wrote:

> oops! that did the trick.
>
> nonetheless i think the fact that you have to do "rowid" and generate the
> matrix should be added to the wiki.
>
> after waiting for more than an hour i got an error on
> Writing final document/topic inference from lda/matrix/matrix to
> jojoba/do-output
>
> the error is : org.apache.mahout.math.IndexException: Index 90011 is
> outside allowable range of [0,90000)
>
> Here is how I launched it:
> mahout cvb -i jojoba/matrix/matrix -dict jojoba/vectors/dictionary.file-0
> -o jojoba/to-output -dt jojoba/do-output -k 190 -nt 90000 -mt jojoba/mt
> --maxIter 2 -mipd 1 -a 0.01 -e 0.01 -seed 37 -block 1
>
> weird thing is also that the job described as "Writing final topic/term
> distributions from jojoba/mt/model-2 to jojoba/to-output" ran successfully
> but if i now do a vectordump i always get a Java heap space error
>
>
>
> ________________________________
> From: Suneel Marthi <suneel_mar...@yahoo.com>
> To: "user@mahout.apache.org" <user@mahout.apache.org>; Marco <
> zentrop...@yahoo.co.uk>
> Sent: Wednesday, July 31, 2013 11:01
> Subject: Re: Latent Dirichlet Allocatio (cvb)
>
>
> RowId job creates a matrix (IntWritable, VectorWritable) and a docIndex
> (IntWritable, Text).
>
> So you should be seeing 2 files generated -  jojoba/matrix/matrix and
> jojoba/matrix/docIndex.
>
> Seems like you have been feeding docIndex as input to cvb, which would
> cause this exception; it's the matrix that needs to be fed as input to cvb.
>
> So the input to cvb needs to be "jojoba/matrix/matrix".
>
> Give that a try and let us know.
>
>
>
>
> ________________________________
> From: Marco <zentrop...@yahoo.co.uk>
> To: "user@mahout.apache.org" <user@mahout.apache.org>
> Sent: Wednesday, July 31, 2013 4:20 AM
> Subject: Latent Dirichlet Allocatio (cvb)
>
>
> Hi, I'm new here so forgive my little experience with Mahout.
>
> We're trying to use Mahout (on our hadoop cluster) for calculating topics
> on almost 14000 documents.
>
> I've been following this wiki page (http://goo.gl/DcPVjB) but still
> getting errors.
>
> Here's what I'm doing:
>
> 1) creating sequence file from text files (mahout seqdirectory -i
> jojoba/text-files -o jojoba/seqfiles)
> 2) creating vectors from sequence files (mahout seq2sparse -i
> jojoba/seqfiles -o jojoba/vectors -wt tf -nv)
> 3) launching CVB like this:
> mahout cvb -i jojoba/vectors/tf-vectors/ -dict
> jojoba/vectors/dictionary.file-0 -o jojoba/to-output -dt jojoba/do-output
> -k 190 -nt 90000 -mt jojoba/mt --maxIter 2 -mipd 1 -a 0.01 -e 0.01 -seed 37
> -block 1
>
> and I get Exception in thread "main" java.lang.InterruptedException:
> Failed to complete iteration 1 stage 1
>
> I later learned here (
> http://stackoverflow.com/questions/14757162/run-cvb-in-mahout-0-8/) that
> I should actually feed cvb a matrix and not the vectors (shouldn't it be
> clearly stated in the wiki?).
> So then I ran:
> mahout rowid -i jojoba/vectors/tf-vectors/ -o jojoba/matrix
>
> 3bis) I rerun CVB giving jojoba/matrix as input and I now get
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
> org.apache.mahout.math.VectorWritable
>
> What am I missing?
>
> Thanks a lot for your help
>



-- 

  -jake
