Re: [Scikit-learn-general] Example for learning from a text stream with sklearn

2013-05-27 Thread Eustache DIEMERT
Here it is: https://github.com/scikit-learn/scikit-learn/pull/2004

E/

2013/5/24 Eustache DIEMERT
> I've come up with a first version in a binary setting.
>
> Here is the main file:
> https://github.com/oddskool/scikit-learn/blob/out-of-core-examples/examples/out_of_core_classification.py

2013-05-24 Thread Eustache DIEMERT
I've come up with a first version in a binary setting.

Here is the main file:
https://github.com/oddskool/scikit-learn/blob/out-of-core-examples/examples/out_of_core_classification.py

Could one of the regular committers take a look at it before I submit a PR?

Thanks!

Eustache

PS: parser code
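
For readers following the thread, the general idea behind learning from a text stream in a binary setting can be sketched without any dependencies. This is not the code from the linked file; it is a minimal illustration that assumes a hashing-trick featurizer and a perceptron updated one mini-batch at a time. The names `hash_features`, `OnlinePerceptron`, `minibatches` and the toy documents are all made up for this sketch. In scikit-learn, `HashingVectorizer` and estimators exposing `partial_fit` (such as `SGDClassifier`) fill these roles.

```python
import zlib
from collections import defaultdict

N_FEATURES = 2 ** 16  # size of the hashed feature space (arbitrary choice)


def hash_features(text):
    """Map a document to sparse {index: count} features via the hashing trick."""
    feats = defaultdict(int)
    for token in text.lower().split():
        # crc32 is stable across runs, unlike the built-in hash() on strings.
        feats[zlib.crc32(token.encode("utf-8")) % N_FEATURES] += 1
    return feats


class OnlinePerceptron:
    """Binary linear classifier that is updated one mini-batch at a time."""

    def __init__(self):
        self.w = defaultdict(float)

    def predict(self, feats):
        score = sum(self.w[i] * v for i, v in feats.items())
        return 1 if score > 0 else 0

    def partial_fit(self, batch):
        for text, label in batch:
            feats = hash_features(text)
            error = label - self.predict(feats)  # -1, 0 or +1
            if error:
                for i, v in feats.items():
                    self.w[i] += error * v


def minibatches(stream, size):
    """Group an iterator of (text, label) pairs into lists of at most `size`."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Toy stream of made-up documents; 1 = "grain", 0 = "other".
stream = [
    ("wheat harvest exports rise", 1),
    ("bank raises interest rates", 0),
    ("corn and wheat prices fall", 1),
    ("central bank cuts rates", 0),
] * 5

clf = OnlinePerceptron()
for batch in minibatches(iter(stream), size=4):
    clf.partial_fit(batch)  # only one mini-batch is held in memory at a time
```

Because the feature space has a fixed size and the model is updated batch by batch, neither the vocabulary nor the full corpus ever has to fit in memory.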

2013-05-23 Thread Lars Buitinck
2013/5/23 Eustache DIEMERT:
> Mmm well yes, the NLTK distribution is an option too. In the meantime I've
> written the SGML parser which is not so complicated.
>
> My current issue with Reuters is that it's a multilabel classification task.
>
> I'm not sure we have currently an online (minibatch)

2013-05-23 Thread Eustache DIEMERT
Mmm well yes, the NLTK distribution is an option too. In the meantime I've written the SGML parser, which is not so complicated.

My current issue with Reuters is that it's a multilabel classification task.

I'm not sure we currently have an online (minibatch) + multilabel classifier?

E/

2013/5/
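
To illustrate the two points in this message (a small SGML parser, and documents carrying several labels), here is a hedged sketch using only the standard library. It is not the parser mentioned in the thread. It assumes a Reuters-21578-like layout in which each `<REUTERS>` element holds a `<TOPICS>` list of `<D>` entries and a `<BODY>`; the class name `ReutersSketchParser` and the sample text are invented for illustration.

```python
from html.parser import HTMLParser


class ReutersSketchParser(HTMLParser):
    """Collect (topics, body) per <REUTERS> element; tag names arrive lowercased."""

    def __init__(self):
        super().__init__()
        self.docs = []
        self._doc = None   # document currently being built
        self._path = []    # stack of open tags

    def handle_starttag(self, tag, attrs):
        self._path.append(tag)
        if tag == "reuters":
            self._doc = {"topics": [], "body": ""}

    def handle_endtag(self, tag):
        if self._path and self._path[-1] == tag:
            self._path.pop()
        if tag == "reuters" and self._doc is not None:
            self.docs.append(self._doc)
            self._doc = None

    def handle_data(self, data):
        if self._doc is None or not self._path:
            return
        if self._path[-2:] == ["topics", "d"]:
            self._doc["topics"].append(data.strip())
        elif self._path[-1] == "body":
            self._doc["body"] += data


# Made-up sample in the assumed layout.
sample = """
<REUTERS NEWID="1">
<TOPICS><D>grain</D><D>wheat</D></TOPICS>
<TEXT><BODY>Wheat exports rose.</BODY></TEXT>
</REUTERS>
<REUTERS NEWID="2">
<TOPICS></TOPICS>
<TEXT><BODY>Bank cuts rates.</BODY></TEXT>
</REUTERS>
"""

parser = ReutersSketchParser()
parser.feed(sample)
parser.close()

for doc in parser.docs:
    print(doc["topics"], doc["body"])
```

The first sample document carries two topics and the second none, which is what makes the task multilabel: a document maps to a set of labels rather than to exactly one class.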

2013-05-23 Thread Joel Nothman
Or you can use plain text from NLTK...? http://nltk.github.com/nltk_data/packages/corpora/reuters.zip

On Thu, May 23, 2013 at 10:35 PM, Eustache DIEMERT wrote:
> Right, there are indeed Reuters versions for download at the UCI [1].
>
> I'm building a thorough example that can be self-contained i

2013-05-23 Thread Eustache DIEMERT
Right, there are indeed Reuters versions for download at the UCI [1].

I'm building a thorough example that can be self-contained, incl. download etc. It will be a bit long and I'm not sure it would be suitable for learning purposes (extracting data from SGML seems out of scope).

Anyway I'll give i

2013-05-23 Thread Lars Buitinck
2013/5/23 Eustache DIEMERT:
> Lars, I love the idea, that would be much closer to the use case in the real
> world.
>
> Could you suggest a relevant dataset ?

I was thinking more along the lines of a utility script than a canned example, but I think some versions of the Reuters corpus are availab

2013-05-23 Thread Eustache DIEMERT
Lars, I love the idea; that would be much closer to the real-world use case.

Could you suggest a relevant dataset?

Eustache

2013/5/23 Lars Buitinck
> 2013/5/23 Gael Varoquaux:
> > On Thu, May 23, 2013 at 09:28:14AM +0200, Eustache DIEMERT wrote:
> >> 2) of interest
> >
> > If you are

2013-05-23 Thread Lars Buitinck
2013/5/23 Gael Varoquaux:
> On Thu, May 23, 2013 at 09:28:14AM +0200, Eustache DIEMERT wrote:
>> 2) of interest
>
> If you are willing to put some effort to make it easy to follow for
> people, and to add a plot at the end (for instance showing some test-set
> error rate decreasing) I think that i

2013-05-23 Thread Gael Varoquaux
On Thu, May 23, 2013 at 09:28:14AM +0200, Eustache DIEMERT wrote:
> 2) of interest
> I might refine it and send a PR to the examples section.

If you are willing to put some effort to make it easy to follow for people, and to add a plot at the end (for instance showing some test-set error rate dec