What command and options were you passing in?
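
For reference while reproducing this, here is a minimal standalone sketch of the branch coverage described in the quoted message below. It is not Mahout code: the class BranchCoverage and the helper createsTfVectors are hypothetical names, and the helper only mirrors the quoted if / else if condition, returning whether either branch would reach createTermFrequencyVectors.

```java
// Hypothetical stand-in for the quoted if / else-if; this is not Mahout code.
final class BranchCoverage {

    // Mirrors the quoted condition: returns true when one of the two
    // branches would call createTermFrequencyVectors.
    static boolean createsTfVectors(boolean processIdf, boolean shouldPrune) {
        if (!processIdf && !shouldPrune) {
            return true;   // tf output, no maxDFSigma pruning
        } else if (processIdf) {
            return true;   // tfidf output, with or without pruning
        }
        return false;      // tf output with pruning: neither branch runs
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        // Enumerate all four flag combinations and report which ones
        // reach the term frequency vector creation step.
        for (boolean processIdf : flags) {
            for (boolean shouldPrune : flags) {
                System.out.printf(
                    "processIdf=%b shouldPrune=%b -> tf vectors created: %b%n",
                    processIdf, shouldPrune,
                    createsTfVectors(processIdf, shouldPrune));
            }
        }
    }
}
```

If the quoted condition is accurate, the only combination that falls through is processIdf=false with shouldPrune=true, which matches the tf plus maxDFSigma case being reported.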

On Jan 18, 2012, at 4:26 PM, John Conwell wrote:

> I got the latest from trunk and built it, and when running
> SparseVectorsFromSequenceFiles I noticed what I think is a bug.
> SparseVectorsFromSequenceFiles throws an exception when you want term
> frequency vectors output with the maxDFSigma filtering option.
>
> Basically, the if / else if section shown below will skip calling
> DictionaryVectorizer.createTermFrequencyVectors when you have that
> combination. The condition will create vectors when you want tf vectors
> without maxDFSigma filtering, or tfidf vectors with maxDFSigma filtering,
> but if you want tf vectors with maxDFSigma filtering, it totally skips over
> the call to createTermFrequencyVectors, and later on throws an exception
> because the vector input path doesn't exist.
>
> Is this a known issue? I'm assuming that's not the way it's supposed to
> work, correct? If so, I think some sort of validation should break the
> user out before they start processing anything.
>
> //at line ~267 in trunk
> if (!processIdf && !shouldPrune) {
>   DictionaryVectorizer.createTermFrequencyVectors(tokenizedPath,
>       outputDir, tfDirName, conf, minSupport, maxNGramSize,
>       minLLRValue, norm, logNormalize, reduceTasks, chunkSize,
>       sequentialAccessOutput, namedVectors);
> } else if (processIdf) {
>   DictionaryVectorizer.createTermFrequencyVectors(tokenizedPath,
>       outputDir, tfDirName, conf, minSupport, maxNGramSize,
>       minLLRValue, -1.0f, false, reduceTasks, chunkSize,
>       sequentialAccessOutput, namedVectors);
> }
>
> --
> Thanks,
> John C

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com