Hi Basil,
If B were just a constant, you could do the whole thing as a vectorized
operation on X.data (the array holding only the stored nonzero values).
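A minimal sketch of the constant-B case, assuming X is a scipy.sparse CSR
matrix of raw term counts. The function name bm25_constant_b and the values
of k and B are made up for illustration, not taken from your code:

```python
import numpy as np
from scipy import sparse


def bm25_constant_b(X, k=1.2, B=1.0):
    """Apply f * (k + 1) / (k * B + f) to the stored entries of X only."""
    # Work on a float copy so the caller's count matrix is left untouched.
    X = sparse.csr_matrix(X, dtype=np.float64, copy=True)
    # Entries that are not stored have f = 0, which the formula maps to 0
    # (as long as k * B != 0), so the sparsity pattern is preserved.
    X.data = X.data * (k + 1) / (k * B + X.data)
    return X


# Toy count matrix (hypothetical data).
counts = sparse.csr_matrix(np.array([[2, 0, 1], [0, 3, 0]]))
weighted = bm25_constant_b(counts, k=1.2, B=1.0)
```

Only the stored values are touched, so the cost scales with the number of
nonzeros rather than with the full matrix shape.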
Since I understand B is an n_samples vector, I think the cleanest way to compute
the denominator is to use sklearn.utils.sparsefuncs.inplace_row_scale.
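Roughly what I have in mind, as a sketch rather than tested production code.
It assumes X is a CSR matrix of counts whose rows are the documents
(n_samples x n_features) and that B holds one value per row; the name
bm25_row_b and the toy values are hypothetical:

```python
import numpy as np
from scipy import sparse
from sklearn.utils.sparsefuncs import inplace_row_scale


def bm25_row_b(X, B, k=1.2):
    """Apply f * (k + 1) / (k * B[d] + f), with one B value per row d."""
    X = sparse.csr_matrix(X, dtype=np.float64, copy=True)
    B = np.asarray(B, dtype=np.float64)

    # Build the denominator on the same sparsity pattern as X.
    denom = X.copy()
    denom.data[:] = k            # k at every stored position
    inplace_row_scale(denom, B)  # row d multiplied by B[d]: k * B[d]
    denom.data += X.data         # k * B[d] + f

    # denom is a copy of X, so both data arrays are in the same order.
    X.data = X.data * (k + 1) / denom.data
    return X


# Toy example (hypothetical data): two documents, three words.
counts = sparse.csr_matrix(np.array([[2, 0, 1], [0, 3, 0]]))
B = np.array([0.5, 2.0])
weighted = bm25_row_b(counts, B, k=1.2)
```

Because denom starts as a copy of X, the two data arrays line up entry for
entry and the final division never has to densify anything.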
Hope this helps,
Vlad
On July 1, 2016
Hi everyone,
To put it succinctly, here's the BM25 equation:
f(w,D) * (k+1) / (k*B + f(w,D))
where w is the word, and D is the document (corresponding to rows and
columns, respectively). f is a sparse matrix because only a fraction of the
whole vocabulary of words appears in any given single doc