Yup...will do

On Tue, Jan 24, 2012 at 3:46 PM, Grant Ingersoll <gsing...@apache.org> wrote:

> Can you open a JIRA issue, if you haven't already, and mark it for 0.6?
>
> On Jan 23, 2012, at 10:49 AM, John Conwell wrote:
>
> > Any time you request term-frequency weighting instead of TF-IDF (-wt tf)
> > combined with maxDFSigma instead of maxDFPercent (--maxDFSigma 3), the
> > term vectors are not created (as shown in the code below).
> >
> > For example, the following cmd line will reproduce this situation:
> >
> > bin/mahout seq2sparse -i /Users/me/Documents/workspace/mahoutStuff/seq -o
> > /Users/me/Documents/workspace/mahoutStuff/termvecs -wt tf --minSupport 2
> > --minDF 2 --maxDFSigma 3 -seq
> >
> > Thanks,
> > John
> >
> > On Sun, Jan 22, 2012 at 3:00 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> >
> >> What were the command/options you were passing in?
> >>
> >>
> >> On Jan 18, 2012, at 4:26 PM, John Conwell wrote:
> >>
> >>> I got the latest from trunk and built it, and when running
> >>> SparseVectorsFromSequenceFiles I noticed what I think is a bug.
> >>> SparseVectorsFromSequenceFiles throws an exception when you request
> >>> term-frequency vector output together with the maxDFSigma filtering option.
> >>>
> >>> Basically, the if / else if section shown below skips the call to
> >>> DictionaryVectorizer.createTermFrequencyVectors for that combination.
> >>> The condition creates vectors when you want tf vectors without
> >>> maxDFSigma filtering, or tfidf vectors with or without maxDFSigma
> >>> filtering, but if you want tf vectors with maxDFSigma filtering, it skips
> >>> createTermFrequencyVectors entirely, and a later step throws an
> >>> exception because the vector input path doesn't exist.
> >>>
> >>> Is this a known issue? I'm assuming that's not the way it's supposed to
> >>> work, correct? If so, I think some sort of validation should stop the
> >>> user before any processing starts.
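A minimal sketch of the kind of fail-fast check suggested above, assuming (as the thread implies) that -wt tfidf sets processIdf and that --maxDFSigma turns on shouldPrune, with a negative value meaning "not set". The class Seq2SparseOptionCheck and its validate method are hypothetical and not part of Mahout's API; only the flag names come from this thread.

```java
// Hypothetical fail-fast option check; not Mahout's API.
// Assumptions: -wt "tfidf" means processIdf, and a negative maxDFSigma
// means the option was not given (so no pruning).
public class Seq2SparseOptionCheck {

    /**
     * Throws before any job runs if tf weighting is combined with
     * maxDFSigma pruning, the combination the quoted if / else if skips.
     *
     * @param weighting  value of -wt, e.g. "tf" or "tfidf"
     * @param maxDFSigma value of --maxDFSigma; negative = not set (assumption)
     */
    static void validate(String weighting, double maxDFSigma) {
        boolean processIdf = "tfidf".equalsIgnoreCase(weighting);
        boolean shouldPrune = maxDFSigma >= 0.0;
        if (!processIdf && shouldPrune) {
            throw new IllegalArgumentException(
                "-wt tf combined with --maxDFSigma is not handled; "
                + "use -wt tfidf or drop --maxDFSigma");
        }
    }

    public static void main(String[] args) {
        validate("tfidf", 3.0);   // accepted: tfidf with pruning
        validate("tf", -1.0);     // accepted: tf without pruning
        try {
            validate("tf", 3.0);  // the combination from the reported command line
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whether such a check is preferable to adding the missing branch depends on whether the combination is meant to be supported, which is the open question in the message above.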
> >>>
> >>> // at line ~267 in trunk
> >>>
> >>> if (!processIdf && !shouldPrune) {
> >>>   // tf weighting, no maxDF pruning: TF vectors with the user's norm settings
> >>>   DictionaryVectorizer.createTermFrequencyVectors(tokenizedPath,
> >>>       outputDir, tfDirName, conf, minSupport, maxNGramSize,
> >>>       minLLRValue, norm, logNormalize, reduceTasks, chunkSize,
> >>>       sequentialAccessOutput, namedVectors);
> >>> } else if (processIdf) {
> >>>   // tfidf weighting (pruned or not): raw TF vectors (norm -1.0f, no log normalization)
> >>>   DictionaryVectorizer.createTermFrequencyVectors(tokenizedPath,
> >>>       outputDir, tfDirName, conf, minSupport, maxNGramSize,
> >>>       minLLRValue, -1.0f, false, reduceTasks, chunkSize,
> >>>       sequentialAccessOutput, namedVectors);
> >>> }
> >>> // no branch covers tf weighting with pruning (!processIdf && shouldPrune)
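To make the gap in the quoted if / else if easier to see, here is a small self-contained model of that branch over all four combinations of the two flags. The class TfBranchModel and the enum TfCall are hypothetical; the Mahout call is replaced by a value naming which createTermFrequencyVectors call would run, and it is assumed, as the thread describes, that shouldPrune is true when --maxDFSigma is given.

```java
// Hypothetical model of the quoted branch; the real code calls
// DictionaryVectorizer.createTermFrequencyVectors instead of returning a value.
public class TfBranchModel {

    /** Which createTermFrequencyVectors call (if any) the branch would make. */
    enum TfCall { TF_USER_NORM, RAW_TF, NONE }

    static TfCall branchTaken(boolean processIdf, boolean shouldPrune) {
        if (!processIdf && !shouldPrune) {
            return TfCall.TF_USER_NORM;  // tf weighting, no maxDF pruning
        } else if (processIdf) {
            return TfCall.RAW_TF;        // tfidf weighting, pruned or not
        }
        return TfCall.NONE;              // tf weighting + pruning: nothing is written
    }

    public static void main(String[] args) {
        // Enumerate every flag combination and print the branch taken.
        for (boolean processIdf : new boolean[] {false, true}) {
            for (boolean shouldPrune : new boolean[] {false, true}) {
                System.out.printf("processIdf=%b shouldPrune=%b -> %s%n",
                        processIdf, shouldPrune, branchTaken(processIdf, shouldPrune));
            }
        }
    }
}
```

Only the combination processIdf=false, shouldPrune=true maps to NONE, which matches the missing TF vector directory reported for `-wt tf --maxDFSigma 3`.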
> >>>
> >>> --
> >>>
> >>> Thanks,
> >>> John C
> >>
> >> --------------------------------------------
> >> Grant Ingersoll
> >> http://www.lucidimagination.com
> >
>
