Here it is: https://github.com/scikit-learn/scikit-learn/pull/2004
E/
2013/5/24 Eustache DIEMERT
> I've come up with a first version in a binary setting.
>
> Here is the main file :
> https://github.com/oddskool/scikit-learn/blob/out-of-core-examples/examples/out_of_core_classification.py
>
>
I've come up with a first version in a binary setting.
Here is the main file:
https://github.com/oddskool/scikit-learn/blob/out-of-core-examples/examples/out_of_core_classification.py
Could one of the regular committers take a look at it before I submit a PR?
Thanks!
Eustache
PS: parser code
2013/5/23 Eustache DIEMERT :
> Mmm well yes, the NLTK distribution is an option too. In the meantime I've
> written the SGML parser which is not so complicated.
>
> My current issue with Reuters is that it's a multilabel classification task.
>
> I'm not sure we have currently an online (minibatch)
Mmm well yes, the NLTK distribution is an option too. In the meantime I've
written the SGML parser, which is not so complicated.
My current issue with Reuters is that it's a multilabel classification task.
I'm not sure we currently have an online (minibatch) multilabel
classifier?
E/
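
For readers following along, here is a minimal sketch of the two steps discussed above: pulling documents out of Reuters-style SGML, then collapsing the multilabel topics into a single binary target. It is not the parser from the linked branch (that code is not shown in this thread). It assumes the usual Reuters-21578 layout with REUTERS, TOPICS/D, TITLE and BODY tags, and the names ReutersSketchParser, parse_reuters, binary_labels and SAMPLE are hypothetical. The sample text is made up for illustration.

```python
from html.parser import HTMLParser

# Made-up sample in the assumed Reuters-21578 tag layout (not real corpus data).
SAMPLE = """<REUTERS NEWID="1">
<TOPICS><D>grain</D><D>wheat</D></TOPICS>
<TEXT><TITLE>WHEAT EXPORTS RISE</TITLE>
<BODY>Sample body text one.</BODY></TEXT>
</REUTERS>
<REUTERS NEWID="2">
<TOPICS><D>acq</D></TOPICS>
<TEXT><TITLE>COMPANY BUYS RIVAL</TITLE>
<BODY>Sample body text two.</BODY></TEXT>
</REUTERS>"""


class ReutersSketchParser(HTMLParser):
    """Collect topics, title and body for each <REUTERS> element."""

    def __init__(self):
        super().__init__()
        self.docs = []           # finished documents
        self._doc = None         # document currently being built
        self._field = None       # "topic", "title", "body" or None
        self._in_topics = False  # True while inside <TOPICS>
        self._topic = ""         # text of the topic being read

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names before calling this method.
        if tag == "reuters":
            self._doc = {"topics": [], "title": "", "body": ""}
        elif tag == "topics":
            self._in_topics = True
        elif tag == "d" and self._in_topics:
            self._field, self._topic = "topic", ""
        elif tag in ("title", "body"):
            self._field = tag

    def handle_endtag(self, tag):
        if tag == "reuters" and self._doc is not None:
            self.docs.append(self._doc)
            self._doc = None
        elif tag == "topics":
            self._in_topics = False
        elif tag == "d" and self._field == "topic" and self._doc is not None:
            self._doc["topics"].append(self._topic.strip())
            self._field = None
        elif tag in ("title", "body"):
            self._field = None

    def handle_data(self, data):
        if self._doc is None or self._field is None:
            return
        if self._field == "topic":
            self._topic += data
        else:
            self._doc[self._field] += data


def parse_reuters(text):
    """Return a list of dicts with 'topics', 'title' and 'body' keys."""
    parser = ReutersSketchParser()
    parser.feed(text)
    parser.close()
    return parser.docs


def binary_labels(docs, positive_topic):
    """Reduce multilabel topics to 1 (has the topic) or 0 (does not)."""
    return [int(positive_topic in doc["topics"]) for doc in docs]


docs = parse_reuters(SAMPLE)
labels = binary_labels(docs, "acq")
```

Picking one topic as the positive class is only one way to get a binary setting. It sidesteps the multilabel question rather than answering it.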
2013/5/
Or you can use plain text from NLTK...?
http://nltk.github.com/nltk_data/packages/corpora/reuters.zip
On Thu, May 23, 2013 at 10:35 PM, Eustache DIEMERT wrote:
> Right, there are indeed Reuters versions for download at the UCI [1].
>
> I'm building a thorough example that can be self-contained i
Right, there are indeed Reuters versions for download at the UCI [1].
I'm building a thorough example that can be self-contained, including the
download step.
It will be a bit long, and I'm not sure it would be suitable for learning
purposes (extracting data from SGML seems out of scope).
Anyway I'll give i
2013/5/23 Eustache DIEMERT :
> Lars, I love the idea, that would be much closer to the use case in the real
> world.
>
> Could you suggest a relevant dataset ?
I was thinking more along the lines of a utility script than a canned
example, but I think some versions of the Reuters corpus are availab
Lars, I love the idea, that would be much closer to real-world use
cases.
Could you suggest a relevant dataset?
Eustache
2013/5/23 Lars Buitinck
> 2013/5/23 Gael Varoquaux :
> > On Thu, May 23, 2013 at 09:28:14AM +0200, Eustache DIEMERT wrote:
> >> 2) of interest
> >
> > If you are
2013/5/23 Gael Varoquaux :
> On Thu, May 23, 2013 at 09:28:14AM +0200, Eustache DIEMERT wrote:
>> 2) of interest
>
> If you are willing to put some effort to make it easy to follow for
> people, and to add a plot at the end (for instance showing some test-set
> error rate decreasing) I think that i
On Thu, May 23, 2013 at 09:28:14AM +0200, Eustache DIEMERT wrote:
> 2) of interest
> I might refine it and send a PR to the examples section.
If you are willing to put some effort into making it easy for people to
follow, and to add a plot at the end (for instance showing some test-set
error rate dec