I wonder if the code isn't handling the option processing correctly. I
seem to recall some funkiness recently with "--" being prefixed to option
names when they go into the argMap.

A patch to use getOption("maxRed"), etc. would likely fix the problem.
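To make the suspected failure mode concrete, here is a minimal, hypothetical Java sketch. It assumes an argMap that stores each option under its "--"-prefixed name while some code reads it back by the bare name. The class, the methods and the fallback default of 1 are all invented for illustration and are not Mahout's actual AbstractJob API. It only shows how such a mismatch would silently fall back to a default, and how a getOption-style helper that normalizes the name avoids that. It does not establish that this is what CollocDriver actually does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: none of these names are Mahout's real API.
public class OptionLookupSketch {

    // Simulated parsing: each "--name value" pair is stored with the "--" prefix kept.
    static Map<String, String> parse(String[] args) {
        Map<String, String> argMap = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            argMap.put(args[i], args[i + 1]); // key is e.g. "--maxRed"
        }
        return argMap;
    }

    // Buggy read: looks up the bare name, misses, and silently uses a default.
    static int buggyMaxRed(Map<String, String> argMap) {
        String value = argMap.get("maxRed");
        return value == null ? 1 : Integer.parseInt(value);
    }

    // getOption-style read: normalizes the name to the stored "--" form first.
    static String getOption(Map<String, String> argMap, String name) {
        String key = name.startsWith("--") ? name : "--" + name;
        return argMap.get(key);
    }

    static int fixedMaxRed(Map<String, String> argMap) {
        String value = getOption(argMap, "maxRed");
        return value == null ? 1 : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Map<String, String> argMap = parse(new String[] {
            "--maxNGramSize", "3", "--minSupport", "100", "--maxRed", "400"});
        System.out.println("buggy maxRed = " + buggyMaxRed(argMap)); // prints 1
        System.out.println("fixed maxRed = " + fixedMaxRed(argMap)); // prints 400
    }
}
```

The point of routing every read through one helper is that the key format is decided in a single place, so a lookup can't quietly miss because one caller forgot the prefix.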


On Dec 23, 2011, at 1:11 AM, Mat Kelcey wrote:

> Hello!
> 
> When I run ...
> 
> mahout org.apache.mahout.vectorizer.collocations.llr.CollocDriver \
> -i /user/hadoop/url_tokenised_text.test.seqdir.sparse/tokenized-documents \
> -o /user/hadoop/url_tokenised_text.test.llr__without_preprocess \
> -a org.apache.mahout.vectorizer.DefaultAnalyzer \
> --maxNGramSize 3 --minSupport 100 --maxRed 400
> 
> ...things work for me but I notice in the first
> CollocDriver.generateCollocations pass ( the generation of subgrams )
> _everything_ is going to the one reducer.
> 
> The end result of the run is
> 
> drwxr-xr-x   - hadoop supergroup          0 2011-12-23 05:58
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/ngrams
> -rw-r--r--   3 hadoop supergroup          0 2011-12-23 05:58
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/ngrams/_SUCCESS
> -rw-r--r--   3 hadoop supergroup    9509097 2011-12-23 05:56
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/ngrams/part-r-00000
> -rw-r--r--   3 hadoop supergroup    9523286 2011-12-23 05:56
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/ngrams/part-r-00001
> -rw-r--r--   3 hadoop supergroup    9517959 2011-12-23 05:56
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/ngrams/part-r-00002
> -- SNIP --
> -rw-r--r--   3 hadoop supergroup    9557408 2011-12-23 05:57
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/ngrams/part-r-00397
> -rw-r--r--   3 hadoop supergroup    9530757 2011-12-23 05:57
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/ngrams/part-r-00398
> -rw-r--r--   3 hadoop supergroup    9502667 2011-12-23 05:58
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/ngrams/part-r-00399
> drwxr-xr-x   - hadoop supergroup          0 2011-12-23 05:55
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams
> -rw-r--r--   3 hadoop supergroup          0 2011-12-23 05:55
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams/_SUCCESS
> -rw-r--r--   3 hadoop supergroup        128 2011-12-23 04:40
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams/part-r-00000
> -rw-r--r--   3 hadoop supergroup 9998117111 2011-12-23 04:45
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams/part-r-00001
> -rw-r--r--   3 hadoop supergroup        128 2011-12-23 04:40
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams/part-r-00002
> -rw-r--r--   3 hadoop supergroup        128 2011-12-23 04:40
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams/part-r-00003
> -- SNIP --
> -rw-r--r--   3 hadoop supergroup        128 2011-12-23 04:40
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams/part-r-00397
> -rw-r--r--   3 hadoop supergroup        128 2011-12-23 04:40
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams/part-r-00398
> -rw-r--r--   3 hadoop supergroup        128 2011-12-23 04:40
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams/part-r-00399
> 
> Before I start to poke around does anyone agree this looks wrong?
> 
> I'm running a 0.6-SNAPSHOT I cloned today from github. Was considering
> trying 0.5 but a quick look at recent changes doesn't seem to suggest this
> code has changed in a while...
> 
> Cheers,
> Mat
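For comparison with the subgrams listing above (400 part files, with part-r-00001 holding roughly 10 GB and every other file only 128 bytes), the following sketch shows one general way that pattern can arise: all reducers run, but the partitioner sends every record to the same partition. It assumes a hash-style partitioner using the same formula as Hadoop's HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. CollocDriver may well use its own partitioner, and the class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of hash partitioning and the skew it produces when
// every record carries the same partition key.
public class PartitionSkewSketch {

    // Same formula as Hadoop's HashPartitioner; masking keeps the result non-negative.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many keys land in each partition.
    static int[] histogram(Iterable<?> keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (Object key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    // Number of partitions that received at least one key.
    static int nonEmpty(int[] counts) {
        int n = 0;
        for (int c : counts) {
            if (c > 0) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int reducers = 400;
        List<String> distinct = new ArrayList<>();
        List<String> collapsed = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            distinct.add("gram-" + i);  // varied keys spread across partitions
            collapsed.add("same-key");  // a partition key that never varies
        }
        System.out.println("distinct keys -> non-empty partitions: "
            + nonEmpty(histogram(distinct, reducers)));
        System.out.println("constant key  -> non-empty partitions: "
            + nonEmpty(histogram(collapsed, reducers))); // prints 1
    }
}
```

If the subgram pass behaves like the constant-key case, checking which part of the key the partitioner actually hashes would be a reasonable place to start poking.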

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com


