Yup...will do

On Tue, Jan 24, 2012 at 3:46 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> Can you open a JIRA issue, if you haven't already, and mark it for 0.6?
>
> On Jan 23, 2012, at 10:49 AM, John Conwell wrote:
>
> > Any time you pass in that you want term frequency vs tfidf used as
> > weighting (-wt tf), combined with using maxDFSigma vs maxDFPercent
> > (--maxDFSigma 3), the term vectors will not be created (as shown in
> > the code below).
> >
> > For example, the following cmd line will reproduce this situation:
> >
> > bin/mahout seq2sparse -i /Users/me/Documents/workspace/mahoutStuff/seq -o
> > /Users/me/Documents/workspace/mahoutStuff/termvecs -wt tf --minSupport 2
> > --minDF 2 --maxDFSigma 3 -seq
> >
> > Thanks,
> > John
> >
> > On Sun, Jan 22, 2012 at 3:00 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> >
> >> What were the command/options you were passing in?
> >>
> >> On Jan 18, 2012, at 4:26 PM, John Conwell wrote:
> >>
> >>> I got latest from Trunk and built it, and when
> >>> running SparseVectorsFromSequenceFiles I noticed what I think is a bug.
> >>> SparseVectorsFromSequenceFiles throws an exception when you want term
> >>> frequency vectors output with the maxDFSigma filtering option.
> >>>
> >>> Basically, the if / else if section shown below will skip
> >>> calling DictionaryVectorizer.createTermFrequencyVectors when you have that
> >>> combination. The condition will create vectors when you want tf vectors
> >>> without maxDFSigma filtering, or tfidf vectors with maxDFSigma filtering,
> >>> but if you want tf vectors with maxDFSigma filtering, it totally skips over
> >>> the call to createTermFrequencyVectors, and later on throws an exception
> >>> because the vector input path doesn't exist.
> >>>
> >>> Is this a known issue? I'm assuming that's not the way it's supposed to
> >>> work, correct? If so, I think some sort of validation should break the
> >>> user out before they start processing anything.
> >>>
> >>> //at line ~267 in trunk
> >>>
> >>> if (!processIdf && !shouldPrune) {
> >>>   DictionaryVectorizer.createTermFrequencyVectors(tokenizedPath,
> >>>       outputDir, tfDirName, conf, minSupport, maxNGramSize,
> >>>       minLLRValue, norm, logNormalize, reduceTasks, chunkSize,
> >>>       sequentialAccessOutput, namedVectors);
> >>> } else if (processIdf) {
> >>>   DictionaryVectorizer.createTermFrequencyVectors(tokenizedPath,
> >>>       outputDir, tfDirName, conf, minSupport, maxNGramSize,
> >>>       minLLRValue, -1.0f, false, reduceTasks, chunkSize,
> >>>       sequentialAccessOutput, namedVectors);
> >>> }
> >>>
> >>> --
> >>> Thanks,
> >>> John C
> >>
> >> --------------------------------------------
> >> Grant Ingersoll
> >> http://www.lucidimagination.com
> >
> > --
> > Thanks,
> > John C
>
> --------------------------------------------
> Grant Ingersoll
> http://www.lucidimagination.com

--
Thanks,
John C
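
For anyone following along, here is a minimal, standalone sketch of the gap described in the thread. It does not call Mahout at all: it only models which branch of the quoted if / else if is reached for each combination of processIdf and shouldPrune. The class name TfBranchCheck, the Branch enum, and both methods are hypothetical names made up for illustration. It assumes, as the thread implies, that -wt tf leaves processIdf false and that --maxDFSigma makes shouldPrune true. The covering() variant is only one possible way to close the gap, not the project's actual fix, and it further assumes that normalization can be deferred whenever pruning is requested.

```java
public class TfBranchCheck {

    /** Which createTermFrequencyVectors call, if any, a flag combination reaches. */
    public enum Branch { NORMALIZED_TF, UNNORMALIZED_TF, NONE }

    /** Mirrors the if / else if quoted in the thread. */
    public static Branch reported(boolean processIdf, boolean shouldPrune) {
        if (!processIdf && !shouldPrune) {
            return Branch.NORMALIZED_TF;   // the call that passes norm, logNormalize
        } else if (processIdf) {
            return Branch.UNNORMALIZED_TF; // the call that passes -1.0f, false
        }
        return Branch.NONE;                // tf + pruning: no call is made
    }

    /**
     * Hypothetical alternative (an assumption, not the actual fix):
     * every combination reaches one of the two calls.
     */
    public static Branch covering(boolean processIdf, boolean shouldPrune) {
        if (processIdf || shouldPrune) {
            return Branch.UNNORMALIZED_TF;
        }
        return Branch.NORMALIZED_TF;
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        // Print the branch reached for all four flag combinations.
        for (boolean processIdf : flags) {
            for (boolean shouldPrune : flags) {
                System.out.printf("processIdf=%b shouldPrune=%b reported=%s covering=%s%n",
                        processIdf, shouldPrune,
                        reported(processIdf, shouldPrune),
                        covering(processIdf, shouldPrune));
            }
        }
    }
}
```

Running main prints one row per combination; the row for processIdf=false, shouldPrune=true is the only one where reported() yields NONE, which corresponds to the -wt tf --maxDFSigma 3 case from the command line above.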