On Sep 20, 2012, at 1:55 PM, Dave Byrne wrote:

> In TFIDFPartialVectorReducer.java:
> 
> If docFreq > maxDocFreq then the vector at that index is not set (ignored)
> If docFreq < minDocFreq then the vector at that index is set to the TfIdf 
> calculation using minDocFreq instead of the actual document frequency.
> 
> Should minDocFreq not be treated the same as maxDocFreq by skipping setting 
> the vector at that index?

I think the idea is that the document frequency is being rounded up to 
minDocFreq so that the term still provides some minimum level of input.  It's 
always a bit of a hedge with these rare terms.  Sometimes they are just 
garbage, other times they are valuable.  My leaning would be towards keeping 
it as is.
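
To make the two cases concrete, here is a minimal Java sketch of the behavior 
described in the question. It is not the Mahout source: the class name, the 
method names, the plain-map representation of a vector, and the simplified 
tf * ln(numDocs / df) weight are all assumptions made for illustration, and 
maxDocFreq is compared as a raw count, as the question phrases it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not TFIDFPartialVectorReducer itself.
class DocFreqClampSketch {

    // Simplified weight (an assumption): tf * ln(numDocs / df).
    static double weight(int tf, long df, long numDocs) {
        return tf * Math.log((double) numDocs / df);
    }

    // termFreqs: term index -> term frequency in one document.
    // docFreqs:  term index -> number of documents containing the term.
    static Map<Integer, Double> weigh(Map<Integer, Integer> termFreqs,
                                      Map<Integer, Long> docFreqs,
                                      long minDocFreq,
                                      long maxDocFreq,
                                      long numDocs) {
        Map<Integer, Double> out = new LinkedHashMap<>();
        for (Map.Entry<Integer, Integer> e : termFreqs.entrySet()) {
            Long df = docFreqs.get(e.getKey());
            if (df == null) {
                continue; // term has no recorded document frequency
            }
            if (df > maxDocFreq) {
                continue; // too common: the index is simply not set
            }
            // Too rare: round the document frequency up to minDocFreq
            // instead of skipping the term.
            long effectiveDf = Math.max(df, minDocFreq);
            out.put(e.getKey(), weight(e.getValue(), effectiveDf, numDocs));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> tf = new LinkedHashMap<>();
        tf.put(0, 2); // rare term
        tf.put(1, 3); // ordinary term
        tf.put(2, 1); // very common term
        Map<Integer, Long> df = new LinkedHashMap<>();
        df.put(0, 1L);
        df.put(1, 50L);
        df.put(2, 95L);
        System.out.println(weigh(tf, df, 5, 90, 100));
    }
}
```

Under these assumptions, with minDocFreq = 5 a term seen in only one document 
is weighted as if it had appeared in five, so it still contributes, but with a 
smaller weight than its true rarity would give it.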

> 
> In both cases, the vector length remains the same and these settings have no 
> effect on pruning the vector length / term reduction?

--------------------------------------------
Grant Ingersoll
http://www.lucidworks.com



