2013/5/23 Eustache DIEMERT :
> Mmm well yes, the NLTK distribution is an option too. In the meantime I've
> written the SGML parser which is not so complicated.
>
> My current issue with Reuters is that it's a multilabel classification task.
>
> I'm not sure we currently have an online (minibatch)
Mmm well yes, the NLTK distribution is an option too. In the meantime I've
written the SGML parser which is not so complicated.
My current issue with Reuters is that it's a multilabel classification task.
I'm not sure we currently have an online (minibatch) + multilabel
classifier?
E/
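
For readers following the thread, here is a minimal sketch of what such an
SGML parser might look like. It is not the parser mentioned above (that code
is not shown in this thread); it assumes the usual Reuters-21578 markup
(<REUTERS>, <TOPICS> with <D> entries, <BODY>) and uses only the standard
library's html.parser. The class name, the helper function and the sample
text are made up for illustration. Each document comes out with a list of
topics, which is what makes the task multilabel.

```python
from html.parser import HTMLParser


class ReutersParser(HTMLParser):
    """Collect {"topics": [...], "body": "..."} dicts from Reuters-style SGML.

    Illustrative sketch only: assumes <REUTERS>, <TOPICS><D>..</D></TOPICS>
    and <BODY> elements, and ignores every other field.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.docs = []           # finished documents
        self._doc = None         # document currently being read
        self._field = None       # "topic" or "body" while inside <D> / <BODY>
        self._in_topics = False  # True only between <TOPICS> and </TOPICS>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling the handlers.
        if tag == "reuters":
            self._doc = {"topics": [], "body": ""}
        elif self._doc is None:
            return
        elif tag == "topics":
            self._in_topics = True
        elif tag == "d" and self._in_topics:
            self._field = "topic"
            self._doc["topics"].append("")
        elif tag == "body":
            self._field = "body"

    def handle_endtag(self, tag):
        if tag == "reuters" and self._doc is not None:
            self._doc["body"] = self._doc["body"].strip()
            self.docs.append(self._doc)
            self._doc = None
        elif tag == "topics":
            self._in_topics = False
        elif tag in ("d", "body"):
            self._field = None

    def handle_data(self, data):
        # Text may arrive in several chunks, so always append.
        if self._doc is None or self._field is None:
            return
        if self._field == "topic":
            self._doc["topics"][-1] += data
        else:
            self._doc["body"] += data


def parse_reuters(sgml_text):
    """Parse a string of Reuters-style SGML and return the list of documents."""
    parser = ReutersParser()
    parser.feed(sgml_text)
    parser.close()
    return parser.docs


# Made-up sample in the assumed format (not real corpus content).
SAMPLE = """<REUTERS TOPICS="YES" NEWID="1">
<TOPICS><D>grain</D><D>wheat</D></TOPICS>
<PLACES><D>usa</D></PLACES>
<TEXT><BODY>Wheat exports rose 5 pct this week.</BODY></TEXT>
</REUTERS>
<REUTERS TOPICS="NO" NEWID="2">
<TOPICS></TOPICS>
<TEXT><BODY>No topic assigned here.</BODY></TEXT>
</REUTERS>"""

docs = parse_reuters(SAMPLE)
```

The <D> entries under <PLACES> are skipped on purpose, since only the ones
inside <TOPICS> are treated as labels here.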
2013/5/
Or you can use plain text from NLTK...?
http://nltk.github.com/nltk_data/packages/corpora/reuters.zip
On Thu, May 23, 2013 at 10:35 PM, Eustache DIEMERT wrote:
> Right, there are indeed Reuters versions for download at the UCI [1].
>
> I'm building a thorough example that can be self-contained i
Right, there are indeed Reuters versions for download at the UCI [1].
I'm building a thorough example that can be self-contained inc. download
etc.
It will be a bit long and I'm not sure it would be suitable for learning
purposes (extracting data from SGML seems out of scope).
Anyway I'll give i
Hi Ken,
I share and understand your concerns about the rigidity of the current
implementation.
> I like using Extremely Randomized Trees, but I'm looking for more flexibility
> in generating them. In particular, I'd like to be able to specify my own
> criterion and split finding algorithm. I'm
2013/5/23 Eustache DIEMERT :
> Lars, I love the idea, that would be much closer to the use case in the real
> world.
>
> Could you suggest a relevant dataset ?
I was thinking more along the lines of a utility script than a canned
example, but I think some versions of the Reuters corpus are availab
Lars, I love the idea, that would be much closer to the use case in the
real world.
Could you suggest a relevant dataset ?
Eustache
2013/5/23 Lars Buitinck
> 2013/5/23 Gael Varoquaux :
> > On Thu, May 23, 2013 at 09:28:14AM +0200, Eustache DIEMERT wrote:
> >> 2) of interest
> >
> > If you are
2013/5/23 Gael Varoquaux :
> On Thu, May 23, 2013 at 09:28:14AM +0200, Eustache DIEMERT wrote:
>> 2) of interest
>
> If you are willing to put some effort to make it easy to follow for
> people, and to add a plot at the end (for instance showing some test-set
> error rate decreasing) I think that i
Hi Alexandre,
Sorry for the late reply. There was a tutorial here for the last two weeks,
so I did not spend any time trying the new multi-core feature. After this
week I will be back at work and will let you know soon.
By the way, thank you for sharing :)
On Thu, May 23, 2013 at 10:15 AM, Alexand
On Thu, May 23, 2013 at 09:28:14AM +0200, Eustache DIEMERT wrote:
> 2) of interest
> I might refine it and send a PR to the examples section.
If you are willing to put some effort to make it easy to follow for
people, and to add a plot at the end (for instance showing some test-set
error rate dec
Hi Bao,
I haven't heard from you so I guess that it is working. FYI, I opened a PR
for this feature here :
https://github.com/scikit-learn/scikit-learn/pull/1976
Alexandre.
On Fri, May 10, 2013 at 6:26 PM, Bao Thien wrote:
> Hi Alexandre,
>
> It sounds very great. I will try it and let you kn
Hi list,
I was looking for an example of out-of-core learning from a text stream
with sklearn. I think it's a pretty common need nowadays and we should have
a ready-made example of some sort.
I found a sketch in Olivier's post on SO [1], but it doesn't work out of
the box.
So I worked a standalo
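
To make the idea of out-of-core learning from a stream concrete, here is a
minimal sketch of the minibatching step only. It is not the standalone
example mentioned above (the message is cut off in this archive), and it
uses only the standard library. The names iter_minibatches and fake_stream
are hypothetical, and the stream contents are made up. In a real setup each
batch would be vectorized and passed to an incremental learner, for instance
one exposing a partial_fit method.

```python
from itertools import islice


def iter_minibatches(stream, batch_size):
    """Lazily yield lists of at most batch_size items from any iterable."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fake_stream(n_docs):
    # Hypothetical stand-in for a real text stream (file, socket, parser output).
    for i in range(n_docs):
        yield ("document number %d" % i, i % 2)


batch_sizes = []
for batch in iter_minibatches(fake_stream(10), batch_size=4):
    texts, labels = zip(*batch)
    # An incremental learner would be updated here, one batch at a time,
    # so the full dataset never has to fit in memory.
    batch_sizes.append(len(texts))
```

Because the generator pulls items only when a batch is requested, memory use
is bounded by batch_size rather than by the size of the stream.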
12 matches