I found my problem. I assumed, based on the TF-IDF article on Wikipedia, that log base
10 is used, but as I found in this discussion
https://groups.google.com/forum/#!topic/scala-language/K5tbYSYqQc8, in
Scala log is actually ln (the natural logarithm).
Regards,
Andrejs
On Thu, Oct 30, 2014 at 10:49 PM, Ashic
Yes, here the base doesn't matter as it just multiplies all results by
a constant factor. Math libraries tend to have ln, not log10 or log2.
ln is often the more, er, natural base for several computations. So I
would assume that log = ln in the context of ML.
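
To illustrate the constant-factor point, here is a minimal sketch in Python. It assumes the plain IDF form log(N / n_i) discussed in this thread, which is not necessarily what any particular library computes. The corpus size, the document frequencies, and the helper name `idf` are made up for illustration.

```python
import math

def idf(num_docs, doc_freq, log=math.log):
    """Plain IDF, log(N / n_i). `log` defaults to math.log, the natural logarithm."""
    return log(num_docs / doc_freq)

# Hypothetical corpus: 1000 documents, three terms with these document frequencies.
num_docs = 1000
doc_freqs = [1, 10, 100]

idf_ln = [idf(num_docs, n) for n in doc_freqs]
idf_log10 = [idf(num_docs, n, math.log10) for n in doc_freqs]
idf_log2 = [idf(num_docs, n, math.log2) for n in doc_freqs]

# Changing the base rescales every score by the same constant:
# ln(x) / log10(x) = ln(10) and ln(x) / log2(x) = ln(2).
ratio_ln_over_log10 = [a / b for a, b in zip(idf_ln, idf_log10)]
ratio_ln_over_log2 = [a / b for a, b in zip(idf_ln, idf_log2)]

print(ratio_ln_over_log10)
print(ratio_ln_over_log2)
```

Because every score is scaled by the same factor, the relative ordering of terms is identical in all three bases. Only the absolute values differ, which is what matters when comparing against hand-computed numbers.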
On Fri, Oct 31, 2014 at 11:31 AM,
Hi Andrejs,
The calculations are a bit different to what I've come across in
Mining Massive Datasets (2nd Ed., Ullman et al., Cambridge University Press), available
here: http://www.mmds.org/
Their calculation of IDF is as follows:
IDF_i = log2(N / n_i)
where N is the number of documents and n_i is the number