Re: how idf is calculated

2014-10-31 Thread Andrejs Abele
I found my problem. Based on the TF-IDF article on Wikipedia, I assumed that log base 10 is used, but as I found in this discussion https://groups.google.com/forum/#!topic/scala-language/K5tbYSYqQc8, log in Scala is actually ln (the natural logarithm). Regards, Andrejs On Thu, Oct 30, 2014 at 10:49 PM, Ashic

Re: how idf is calculated

2014-10-31 Thread Sean Owen
Yes, here the base doesn't matter, as changing it just multiplies all results by a constant factor. Math libraries tend to have ln, not log10 or log2. ln is often the more, er, natural base for several computations. So I would assume that log = ln in the context of ML. On Fri, Oct 31, 2014 at 11:31 AM,
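
To make the constant-factor point concrete, here is a minimal Python sketch. It assumes the plain formula log(N / n_i) discussed in this thread; the corpus size, the terms, and their document frequencies are made up for illustration, and the helper name idf is hypothetical rather than taken from any library.

```python
import math

def idf(num_docs, doc_freq, log=math.log):
    """Plain IDF, log(N / n_i); the logarithm function is passed in."""
    return log(num_docs / doc_freq)

# Hypothetical corpus: 1000 documents and made-up document frequencies.
num_docs = 1000
doc_freqs = {"spark": 10, "scala": 100, "the": 1000}

idf_ln = {t: idf(num_docs, n) for t, n in doc_freqs.items()}
idf_log10 = {t: idf(num_docs, n, math.log10) for t, n in doc_freqs.items()}

# For every term with a nonzero IDF, ln-based / log10-based equals ln(10).
ratios = [idf_ln[t] / idf_log10[t] for t in doc_freqs if idf_log10[t] != 0]

# Scaling by a positive constant leaves the ordering of terms unchanged.
rank_ln = sorted(doc_freqs, key=idf_ln.get, reverse=True)
rank_log10 = sorted(doc_freqs, key=idf_log10.get, reverse=True)

print(ratios, math.log(10))
print(rank_ln == rank_log10)
```

The absolute values differ between bases, which is why a hand calculation with log10 will not match output computed with ln, but any comparison between terms comes out the same.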

RE: how idf is calculated

2014-10-30 Thread Ashic Mahtab
Hi Andrejs, The calculations are a bit different to what I've come across in Mining of Massive Datasets (2nd Ed., Ullman et al., Cambridge Press), available here: http://www.mmds.org/ Their calculation of IDF is as follows: IDF_i = log2(N / n_i) where N is the number of documents and n_i is the number