Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Lars Buitinck
2014-08-23 20:41 GMT+02:00 Gael Varoquaux : > Interesting discussion. Of course, the danger here is that it might be > borderline for the scope of scikit-learn. In case somebody is going to > do a PR on these topics, I would advise to work on the docstring > and narrative documentation to

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Gael Varoquaux
Hey there, Interesting discussion. Of course, the danger here is that it might be borderline for the scope of scikit-learn. In case somebody is going to do a PR on these topics, I would advise to work on the docstring and narrative documentation to explain well why this can be useful not

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Joel Nothman
I agree with Vlad that delta-IDF is interesting; but it is not well supported by the community, and I'm not sure it is worth including ... yet. As Lars points out (and as you suggest), there are other ways to supervise feature weighting. I agree this has to be a separate transformer (SupervisedTFID

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Lars Buitinck
2014-08-23 15:44 GMT+02:00 Pavel Soriano : > I don't know if this would be helpful to anybody or if this was already > discussed. That is why I am asking if it is worthy to be pull requested. > Gist URL : > https://gist.github.com/psorianom/0b9d8a742fe0efe0fe82 Yes! BM25 is high on my wishlist. I

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Vlad Niculae
Hi Pavel, First of all, this is an interesting subject, thanks for bringing it up! I fear that it's too domain-specific to go very deep in this direction. That being said, and trying to interpret your benchmarks, it seems that Delta-idf might actually be interesting. Or, more generally, the idea o

[Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Pavel Soriano
Greetings scikit, Last year I used the delta idf and bm25 text weighting schemes with scikit classifiers for an opinion classification task. Today I decided to clean them up and recheck them in order to propose them for the scikit-learn text feature extractors. I only implemented delta idf and bm25 tf and delt

[Scikit-learn-general] partial-fit in gradient boosting

2014-08-23 Thread Mahendra Kariya
Hello All, I have a 12G dataset on which I want to run GradientBoostingRegressor. But loading such a large dataset in memory is practically impossible. I can load it in chunks and train the model in batch mode, but I don't see any partial_fit method in gradient boosting. Is there any other way