Suneel:

Thanks for the reply (sorry, Gmail somehow archived it, so it didn't show up
in my inbox).


The dictionary seems OK: it is not empty, and seqdumper reports 9248 entries.

-sh-3.2$ ls -l  sparse/
total 464
drwxr-xr-x 2 yyang15 gid-yyang15  32768 Jan  8 15:17 df-count
-rw-r--r-- 1 yyang15 gid-yyang15 203369 Jan  8 15:17 dictionary.file-0
-rw-r--r-- 1 yyang15 gid-yyang15 186893 Jan  8 15:17 frequency.file-0
drwxr-xr-x 2 yyang15 gid-yyang15   4096 Jan  8 15:17 tf-vectors
drwxr-xr-x 2 yyang15 gid-yyang15   4096 Jan  8 15:17 tokenized-documents
drwxr-xr-x 2 yyang15 gid-yyang15  32768 Jan  8 15:18 wordcount


-sh-3.2$ bin/mahout seqdumper -i MAHOUT/sparse/dictionary.file-0

Key: containing: Value: 9229
Key: craft: Value: 9230
Key: e33494add68d3d0138c45300f0aa361a: Value: 9231
Key: elizabeth: Value: 9232
Key: extra: Value: 9233
Key: joe: Value: 9234
Key: juice: Value: 9235
Key: mario's: Value: 9236
Key: musical: Value: 9237
Key: nicest: Value: 9238
Key: petit_ermitage.html: Value: 9239
Key: rebeccabarker: Value: 9240
Key: spa's: Value: 9241
Key: steam: Value: 9242
Key: stylesheet: Value: 9243
Key: tim46679: Value: 9244
Key: topnav.search_where: Value: 9245
Key: www.expedia.com: Value: 9246
Key: xv: Value: 9247
Count: 9248
14/01/13 17:35:39 INFO driver.MahoutDriver: Program took 54565 ms (Minutes: 0.9094166666666667)
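
In case it helps, here is a minimal Python sketch for checking that dump more thoroughly than by eye. It assumes the `Key: <term>: Value: <index>` and `Count: <n>` line format shown above. The names `parse_seqdumper` and `check_dictionary` are made up for illustration and are not part of Mahout, and the expectation that indices run from 0 to Count-1 without gaps is itself an assumption about how the dictionary is laid out.

```python
def parse_seqdumper(text):
    """Parse 'Key: <term>: Value: <index>' lines into ({term: index}, count)."""
    entries = {}
    count = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Key: ") and ": Value: " in line:
            # Split on the LAST ': Value: ' so terms containing ':' survive.
            term, _, index = line[len("Key: "):].rpartition(": Value: ")
            entries[term] = int(index)
        elif line.startswith("Count: "):
            count = int(line[len("Count: "):])
    return entries, count


def check_dictionary(entries, count):
    """Return a list of problems; an empty list means nothing suspicious."""
    if not entries:
        return ["dictionary is empty"]
    problems = []
    if count is not None and len(entries) != count:
        problems.append(
            "parsed %d terms but Count says %d" % (len(entries), count))
    indices = sorted(entries.values())
    # Assumption: a usable dictionary has indices 0..n-1 with no gaps.
    if indices != list(range(len(indices))):
        problems.append("indices are not contiguous from 0")
    return problems
```

To use it, capture the full seqdumper output (not just the tail pasted here) in a string, pass it to `parse_seqdumper`, and hand the result to `check_dictionary`.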



On Thu, Jan 9, 2014 at 4:12 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:

> The issue seems to be with your dictionary. What is the length of the
> dictionary?
>
>
>
>
>
> On Thursday, January 9, 2014 6:49 PM, Yang <teddyyyy...@gmail.com> wrote:
>
> I am trying to run LDA (now called cvb), and I followed the steps listed in
> many online sources. The final step, after getting the LDA result, is to run
> vectordump to show the result in human-readable form, but it gave me the
> following exception:
>
> I have also included the first few bytes of my cvb output file below; it is
> at least not empty.
>
> Thanks!
> yang
>
> sh-3.2$   bin/mahout vectordump -i MAHOUT/cvb/part-m-00000 --dictionary sparse/dictionary.file-0 --dictionaryType sequencefile --vectorSize 10 -o cvbout
> Running on hadoop, using /apache/hadoop/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB: /home/yyang15/mahout/mahout-distribution-0.8/mahout-examples-0.8-job.jar
> 14/01/08 16:37:03 INFO common.AbstractJob: Command line arguments: {--dictionary=[sparse/dictionary.file-0], --dictionaryType=[sequencefile], --endPhase=[2147483647], --input=[MAHOUT/cvb/part-m-00000], --output=[cvbout], --startPhase=[0], --tempDir=[temp], --vectorSize=[10]}
> 14/01/08 16:37:04 INFO vectors.VectorDumper: Sort? false
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
>         at org.apache.mahout.utils.vectors.VectorHelper$2.apply(VectorHelper.java:132)
>         at org.apache.mahout.utils.vectors.VectorHelper$2.apply(VectorHelper.java:129)
>         at com.google.common.collect.Iterators$8.next(Iterators.java:812)
>         at java.util.AbstractCollection.toArray(AbstractCollection.java:124)
>         at java.util.ArrayList.<init>(ArrayList.java:131)
>         at com.google.common.collect.Lists.newArrayList(Lists.java:119)
>         at org.apache.mahout.utils.vectors.VectorHelper.toWeightedTerms(VectorHelper.java:128)
>         at org.apache.mahout.utils.vectors.VectorHelper.vectorToJson(VectorHelper.java:147)
>         at org.apache.mahout.utils.vectors.VectorDumper.run(VectorDumper.java:240)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.mahout.utils.vectors.VectorDumper.main(VectorDumper.java:260)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:194)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>
>
>
>
> SEQ^F
>
> org.apache.hadoop.io.IntWritable%org.apache.mahout.math.VectorWritable^@^@^@^@^@^@%<D3>NX<97><A9><FD><BB>a;H<98>KȪ<82>^@^A!^G^@^@^@^D^@^@^@^@^C<A0>H=<B6>g<9C>O
>
> <EF>^?<D8>=ˍ<8A><F1>-<AC>8=ɪA+<E0><F1>^R=<AC>-^Ck<BE>^Cm=<F4>p-<E0>ul<D3>=<BA><FE>H7T<F6>^B=<D7>E<EC><95>RH<A7>=<BB>U<DE>^B^Y"<F1>=<D9>WV^F"^P^Q=հ`^?8^N<F1>=<D6>b^YJ
>
> <91><A0>$=<BB><94><F1><C6>^S?c=<B1><BA><88>^G<EB>i^P=<9B>^N>R<92><D2>q=<BA>^H,<9E>^_<B3><91>=<CE><ED>
> i<C1>^FA=<F4>6<9F><A6><BF>^V[=<9F><E8>IN<A4>L<D5>=<B4><E5><F4>j
>
> <83><A0>I=<F4>p<AB>֣%<80>=<A3>'^A<AB><8B>=<A9>=<A4>^V<DB>3<80>^M<B7>=<B5>A^SV^Eͺ?4
>        ^K0^\<9D><BA>=<AA><86>l<8B><F4><E8>m^@^@^@^@^@^@^@^@=<C4>w^NjK<BF>
>    =<AB>"^O;!<E0><F7>=<AF><BC>R<DC>-
>
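
A note on reading the quoted trace: in Java the message of an ArrayIndexOutOfBoundsException is the offending index, so ": 0" means an array of length zero was read at position 0. Which array that is cannot be told from the trace alone. One guess, and it is only an assumption since the 0.8 source was not checked here, is that it is the term array vectordump built from the --dictionary argument. A minimal Python sketch of that kind of lookup, with hypothetical names that are not Mahout's API:

```python
def to_weighted_terms(vector, dictionary):
    """Map (index, weight) pairs to (term, weight) pairs.

    `vector` is a list of (index, weight) tuples and `dictionary` is a list
    where position i holds the term with index i. Both shapes are
    assumptions made for this illustration.
    """
    out_of_range = [i for i, _ in vector if not 0 <= i < len(dictionary)]
    if out_of_range:
        # Fail with a message that names the dictionary size, which the
        # bare Java exception above does not.
        raise IndexError(
            "indices %s not in dictionary of size %d"
            % (out_of_range, len(dictionary)))
    return [(dictionary[i], weight) for i, weight in vector]
```

With a three-term dictionary the lookup succeeds; with an empty one, index 0 already fails, which matches the shape of the error in the trace.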
