Re: [scikit-learn] Adding BM25 to scikit-learn.feature_extraction.text

2016-07-01 Thread Vlad Niculae

Hi Basil, If B were just a constant, you could do the whole thing as a vectorized operation on X.data. Since I understand B is an n_samples vector, I think the cleanest way to compute the denominator is to use sklearn.utils.sparsefuncs.inplace_row_scale. Hope this helps, Vlad On July 1, 2016
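One way the row-scaling suggestion could be realized is sketched below. It relies on the identity f*(k+1)/(k*B + f) = (f/B)*(k+1)/(k + f/B): dividing each row by its B value first leaves a denominator with a constant k, which can then be handled as a single vectorized operation on X.data. The function name bm25_via_row_scale and the default k=1.2 are assumptions made for illustration, not part of the thread; X is assumed to be a CSR term-count matrix of shape (n_samples, n_features) and B a strictly positive vector of length n_samples.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.utils.sparsefuncs import inplace_row_scale


def bm25_via_row_scale(X, B, k=1.2):
    """Hypothetical helper: BM25 term weights from a sparse count matrix.

    X : sparse matrix, shape (n_samples, n_features), raw term counts f(w, D)
    B : array, shape (n_samples,), per-document factor (assumed > 0)
    k : saturation constant (the default 1.2 is an assumption)
    """
    # Work on a float copy so the caller's matrix is left untouched.
    X = csr_matrix(X, dtype=np.float64, copy=True)

    # Divide every stored entry of row i by B[i], giving f / B.
    inplace_row_scale(X, 1.0 / np.asarray(B, dtype=np.float64))

    # The denominator is now k + f/B, with k constant, so the rest is a
    # plain vectorized operation on the stored values only.
    X.data = X.data * (k + 1) / (k + X.data)
    return X


counts = csr_matrix(np.array([[2, 0, 1],
                              [0, 3, 0]]))
B = np.array([0.5, 2.0])
weights = bm25_via_row_scale(counts, B, k=1.0)
```

Only the stored nonzero entries are touched, so the result keeps the sparsity pattern of the input.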

[scikit-learn] Adding BM25 to scikit-learn.feature_extraction.text

2016-07-01 Thread Basil Beirouti

Hi everyone, To put it succinctly, here's the BM25 equation: f(w,D) * (k+1) / (k*B + f(w,D)) where w is the word, and D is the document (corresponding to columns and rows, respectively, in the usual n_samples x n_features layout). f is a sparse matrix because only a fraction of the whole vocabulary of words appears in any given single doc
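For concreteness, the equation as quoted can be evaluated directly on the stored entries of a sparse matrix. The sketch below is only an illustration under stated assumptions: documents are rows, B is supplied as one strictly positive value per document, and the name bm25_term_weights and the default k=1.2 are hypothetical choices, not taken from the thread.

```python
import numpy as np
from scipy.sparse import csr_matrix


def bm25_term_weights(X, B, k=1.2):
    """Hypothetical helper: f * (k + 1) / (k * B + f) on nonzero entries.

    X : sparse matrix, shape (n_documents, n_words), raw counts f(w, D)
    B : array, shape (n_documents,), one value per document (assumed > 0)
    k : saturation constant (the default 1.2 is an assumption)
    """
    # Float copy in CSR format, so stored entries are grouped by row.
    X = csr_matrix(X, dtype=np.float64, copy=True)

    # Repeat each document's B once per stored entry in that row, so the
    # result lines up element by element with X.data.
    nnz_per_row = np.diff(X.indptr)
    B_per_entry = np.repeat(np.asarray(B, dtype=np.float64), nnz_per_row)

    # Apply the quoted formula to the nonzero counts; zeros stay zero
    # because f = 0 gives a weight of 0.
    X.data = X.data * (k + 1) / (k * B_per_entry + X.data)
    return X


term_counts = csr_matrix(np.array([[2, 0, 1],
                                   [0, 3, 0]]))
doc_factor = np.array([0.5, 2.0])
bm25 = bm25_term_weights(term_counts, doc_factor, k=1.0)
```

Because a count of zero maps to a weight of zero, the computation never needs to densify f.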