I wonder if the code isn't handling the option processing correctly. I seem to recall some funkiness recently where "--" was prefixed to the option names as they went into the argMap.
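To illustrate the kind of mismatch I have in mind, here is a minimal sketch. It is not the actual CollocDriver or AbstractJob code: the class, the method names, and the fallback default of 2 are all made up for the example. It only shows how a value stored under "--maxRed" would be missed by a lookup on the bare name, so the default quietly wins.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Mahout code: shows how a "--" prefix on stored
// keys can make a lookup by bare option name silently fall back to a default.
public class OptionKeySketch {

    // Assumed parser behaviour: each option is stored under its key
    // exactly as typed on the command line, including the leading "--".
    public static Map<String, String> parseArgs(String[] args) {
        Map<String, String> argMap = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            argMap.put(args[i], args[i + 1]);
        }
        return argMap;
    }

    // Looks up the bare name; misses when keys were stored with the prefix.
    public static int rawLookup(Map<String, String> argMap, String name, int defaultValue) {
        String value = argMap.get(name);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    // Accessor that adds the prefix itself, so callers can pass the bare name.
    public static int lookupWithPrefix(Map<String, String> argMap, String name, int defaultValue) {
        String value = argMap.get("--" + name);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Map<String, String> argMap = parseArgs(new String[] {
            "--maxNGramSize", "3", "--minSupport", "100", "--maxRed", "400"
        });
        // The default of 2 is arbitrary, chosen only for this illustration.
        System.out.println(rawLookup(argMap, "maxRed", 2));        // prints 2
        System.out.println(lookupWithPrefix(argMap, "maxRed", 2)); // prints 400
    }
}
```

If something like this is happening, going through a single accessor that knows about the prefix keeps the callers from having to care how the keys are stored.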
A patch to use getOption("maxRed"), etc. would likely fix the problem.

On Dec 23, 2011, at 1:11 AM, Mat Kelcey wrote:

> Hello!
>
> When I run ...
>
> mahout org.apache.mahout.vectorizer.collocations.llr.CollocDriver \
> -i /user/hadoop/url_tokenised_text.test.seqdir.sparse/tokenized-documents \
> -o /user/hadoop/url_tokenised_text.test.llr__without_preprocess \
> -a org.apache.mahout.vectorizer.DefaultAnalyzer \
> --maxNGramSize 3 --minSupport 100 --maxRed 400
>
> ...things work for me but I notice in the first
> CollocDriver.generateCollocations pass ( the generation of subgrams )
> _everything_ is going to the one reducer.
>
> The end result of the run is
>
> drwxr-xr-x - hadoop supergroup 0 2011-12-23 05:58
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/ngrams
> -rw-r--r-- 3 hadoop supergroup 0 2011-12-23 05:58
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/ngrams/_SUCCESS
> -rw-r--r-- 3 hadoop supergroup 9509097 2011-12-23 05:56
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/ngrams/part-r-00000
> -rw-r--r-- 3 hadoop supergroup 9523286 2011-12-23 05:56
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/ngrams/part-r-00001
> -rw-r--r-- 3 hadoop supergroup 9517959 2011-12-23 05:56
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/ngrams/part-r-00002
> -- SNIP --
> -rw-r--r-- 3 hadoop supergroup 9557408 2011-12-23 05:57
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/ngrams/part-r-00397
> -rw-r--r-- 3 hadoop supergroup 9530757 2011-12-23 05:57
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/ngrams/part-r-00398
> -rw-r--r-- 3 hadoop supergroup 9502667 2011-12-23 05:58
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/ngrams/part-r-00399
> drwxr-xr-x - hadoop supergroup 0 2011-12-23 05:55
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams
> -rw-r--r-- 3 hadoop supergroup 0 2011-12-23 05:55
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams/_SUCCESS
> -rw-r--r-- 3 hadoop supergroup 128 2011-12-23 04:40
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams/part-r-00000
> -rw-r--r-- 3 hadoop supergroup 9998117111 2011-12-23 04:45
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams/part-r-00001
> -rw-r--r-- 3 hadoop supergroup 128 2011-12-23 04:40
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams/part-r-00002
> -rw-r--r-- 3 hadoop supergroup 128 2011-12-23 04:40
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams/part-r-00003
> -- SNIP --
> -rw-r--r-- 3 hadoop supergroup 128 2011-12-23 04:40
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams/part-r-00397
> -rw-r--r-- 3 hadoop supergroup 128 2011-12-23 04:40
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams/part-r-00398
> -rw-r--r-- 3 hadoop supergroup 128 2011-12-23 04:40
> /user/hadoop/url_tokenised_text.test.llr__without_preprocess/subgrams/part-r-00399
>
> Before I start to poke around does anyone agree this looks wrong?
>
> I'm running a 0.6-SNAPSHOT I cloned today from github. Was considering
> trying 0.5 but a quick look at recent changes doesn't seem to suggest this
> code has changed in awhile...
>
> Cheers,
> Mat

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com