Re: [Scikit-learn-general] Example for learning from a text stream with sklearn

2013-05-23 Thread Lars Buitinck
2013/5/23 Eustache DIEMERT:
> Mmm well yes, the NLTK distribution is an option too. In the meantime I've
> written the SGML parser which is not so complicated.
>
> My current issue with Reuters is that it's a multilabel classification task.
>
> I'm not sure we have currently an online (minibatch)

Re: [Scikit-learn-general] Example for learning from a text stream with sklearn

2013-05-23 Thread Eustache DIEMERT
Mmm well yes, the NLTK distribution is an option too. In the meantime I've written the SGML parser, which is not so complicated. My current issue with Reuters is that it's a multilabel classification task. I'm not sure we currently have an online (minibatch) + multilabel classifier?
E/
2013/5/
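One possible way to combine online (minibatch) updates with multilabel targets is binary relevance: keep one binary online learner per label and update each of them on every minibatch. The sketch below is a pure-Python illustration of that idea, not an answer taken from the thread and not scikit-learn's API. The class name OnlineBinaryRelevance, the perceptron update rule, the whitespace tokenizer and the toy documents are all assumptions made for the example.

```python
class OnlineBinaryRelevance:
    """Hypothetical helper: one perceptron per label, trained by minibatch."""

    def __init__(self, labels):
        # One sparse weight vector (token -> weight) per label.
        self.weights = {label: {} for label in labels}

    @staticmethod
    def _tokens(text):
        # Assumed tokenizer: lowercase and split on whitespace.
        return text.lower().split()

    def partial_fit(self, docs, label_sets):
        """Update every per-label perceptron on one minibatch."""
        for text, labels in zip(docs, label_sets):
            tokens = self._tokens(text)
            for label, w in self.weights.items():
                target = 1 if label in labels else -1
                score = sum(w.get(t, 0.0) for t in tokens)
                predicted = 1 if score > 0 else -1
                if predicted != target:
                    # Perceptron rule: move weights toward the target.
                    for t in tokens:
                        w[t] = w.get(t, 0.0) + target

    def predict(self, text):
        """Return the set of labels whose perceptron scores above zero."""
        tokens = self._tokens(text)
        return {
            label
            for label, w in self.weights.items()
            if sum(w.get(t, 0.0) for t in tokens) > 0
        }


# Toy minibatches standing in for parsed Reuters documents.
minibatches = [
    (["wheat harvest up", "oil prices fall"], [{"grain"}, {"crude"}]),
    (["wheat and oil exports"], [{"grain", "crude"}]),
]

model = OnlineBinaryRelevance(labels=["grain", "crude"])
for docs, label_sets in minibatches:
    model.partial_fit(docs, label_sets)
```

The model never sees more than one minibatch at a time, so memory use depends on the vocabulary seen so far rather than on the size of the corpus. A document can receive zero, one or several labels because each label is decided independently.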

Re: [Scikit-learn-general] Example for learning from a text stream with sklearn

2013-05-23 Thread Joel Nothman
Or you can use plain text from NLTK...? http://nltk.github.com/nltk_data/packages/corpora/reuters.zip
On Thu, May 23, 2013 at 10:35 PM, Eustache DIEMERT wrote:
> Right, there are indeed Reuters versions for download at the UCI [1].
>
> I'm building a thorough example that can be self-contained i

Re: [Scikit-learn-general] Example for learning from a text stream with sklearn

2013-05-23 Thread Eustache DIEMERT
Right, there are indeed Reuters versions for download at the UCI [1]. I'm building a thorough example that can be self-contained inc. download etc. It will be a bit long and I'm not sure it would be suitable for learning purposes (extracting data from SGML seems out of scope). Anyway I'll give i

Re: [Scikit-learn-general] greetings; more flexibility in trees

2013-05-23 Thread Gilles Louppe
Hi Ken,
I share and understand your concerns about the rigidity of the current implementation.
> I like using Extremely Randomized Trees, but I'm looking for more flexibility
> in generating them. In particular, I'd like to be able to specify my own
> criterion and split finding algorithm.
I'm

Re: [Scikit-learn-general] Example for learning from a text stream with sklearn

2013-05-23 Thread Lars Buitinck
2013/5/23 Eustache DIEMERT:
> Lars, I love the idea, that would be much closer to the use case in the real
> world.
>
> Could you suggest a relevant dataset ?
I was thinking more along the lines of a utility script than a canned example, but I think some versions of the Reuters corpus are availab

Re: [Scikit-learn-general] Example for learning from a text stream with sklearn

2013-05-23 Thread Eustache DIEMERT
Lars, I love the idea, that would be much closer to the use case in the real world. Could you suggest a relevant dataset?
Eustache
2013/5/23 Lars Buitinck
> 2013/5/23 Gael Varoquaux:
> > On Thu, May 23, 2013 at 09:28:14AM +0200, Eustache DIEMERT wrote:
> >> 2) of interest
> >
> > If you are

Re: [Scikit-learn-general] Example for learning from a text stream with sklearn

2013-05-23 Thread Lars Buitinck
2013/5/23 Gael Varoquaux:
> On Thu, May 23, 2013 at 09:28:14AM +0200, Eustache DIEMERT wrote:
>> 2) of interest
>
> If you are willing to put some effort to make it easy to follow for
> people, and to add a plot at the end (for instance showing some test-set
> error rate decreasing) I think that i

Re: [Scikit-learn-general] Out of memory when running silhouette score function

2013-05-23 Thread Bao Thien
Hi Alexandre, Sorry for the late reply. There was a tutorial here over the last two weeks, so I have not had time to try the new multi-core feature. After this week I will get back to work and let you know soon. By the way, thank you for sharing :) On Thu, May 23, 2013 at 10:15 AM, Alexand

Re: [Scikit-learn-general] Example for learning from a text stream with sklearn

2013-05-23 Thread Gael Varoquaux
On Thu, May 23, 2013 at 09:28:14AM +0200, Eustache DIEMERT wrote:
> 2) of interest
> I might refine it and send a PR to the examples section.
If you are willing to put some effort to make it easy to follow for people, and to add a plot at the end (for instance showing some test-set error rate dec

Re: [Scikit-learn-general] Out of memory when running silhouette score function

2013-05-23 Thread Alexandre ABRAHAM
Hi Bao,
I haven't heard from you so I guess that it is working. FYI, I opened a PR for this feature here: https://github.com/scikit-learn/scikit-learn/pull/1976
Alexandre.
On Fri, May 10, 2013 at 6:26 PM, Bao Thien wrote:
> Hi Alexandre,
>
> It sounds very great. I will try it and let you kn

[Scikit-learn-general] Example for learning from a text stream with sklearn

2013-05-23 Thread Eustache DIEMERT
Hi list, I was looking for an example of out-of-core learning from a text stream with sklearn. I think it's a pretty common need nowadays and we should have a ready-made example of some sort. I found a sketch in Olivier's post on SO [1], but it doesn't work out of the box. So I worked a standalo
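For readers unfamiliar with the out-of-core pattern mentioned here, the core of it is that documents are consumed from a stream in small batches, so the full corpus never has to fit in memory. The following is a minimal stdlib-only sketch of that batching step. It is not the standalone example from this message (which is cut off above); the names iter_minibatches and fake_document_stream are hypothetical, and the fake stream stands in for whatever parser yields the documents.

```python
from itertools import islice


def iter_minibatches(stream, batch_size):
    """Yield lists of at most batch_size items from any iterable.

    Only one batch is held in memory at a time.
    """
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fake_document_stream(n_docs):
    """Hypothetical stand-in for a parser yielding (text, label) pairs."""
    for i in range(n_docs):
        yield ("document number %d" % i, i % 2)


# Consume a 10-document stream in batches of 4 and record the batch sizes.
batch_sizes = [
    len(batch) for batch in iter_minibatches(fake_document_stream(10), 4)
]
```

In a scikit-learn setting, each batch would typically be turned into features by a stateless vectorizer such as HashingVectorizer and then passed to an estimator that supports partial_fit, for instance SGDClassifier. That pairing is an assumption about how such an example could be built, not a description of the code referred to in this message.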