Re: [Scikit-learn-general] GSoC 2013

2013-03-20 Thread Ronnie Ghose
Sorry, I did not mean to put it in such a way, I meant something else. But yes I understand your point. Thank you, Ronnie Ghose On Thu, Mar 21, 2013 at 1:30 AM, Olivier Grisel wrote: > 2013/3/20 Ronnie Ghose : > > Yes, however participation in a GSoC style program gives it credibility > it > >

Re: [Scikit-learn-general] GSoC 2013

2013-03-20 Thread Olivier Grisel
2013/3/20 Ronnie Ghose : > Yes, however participation in a GSoC style program gives it credibility it > would not otherwise have I believe... no? I would say that you should not engage in open source contribution to gain credibility but rather because you are interested in moving the project forwa

Re: [Scikit-learn-general] GSoC 2013

2013-03-20 Thread Ronnie Ghose
Yes, however participation in a GSoC style program gives it credibility it would not otherwise have I believe... no? Thank you, Shomiron Ghose On Thu, Mar 21, 2013 at 1:15 AM, Olivier Grisel wrote: > 2013/3/20 Ronnie Ghose : > > Mr. Grisel, > > > > Anyway I could sign up for this as well, but n

Re: [Scikit-learn-general] GSoC 2013

2013-03-20 Thread Olivier Grisel
2013/3/20 Ronnie Ghose : > Mr. Grisel, > > Anyway I could sign up for this as well, but not formally participate as a > GSoC student? If you cannot qualify formally for the GSoC program you are always free to contribute using the regular contribution workflow: http://scikit-learn.org/stable/devel

Re: [Scikit-learn-general] GSoC 2013

2013-03-20 Thread Ronnie Ghose
Mr. Grisel, Anyway I could sign up for this as well, but not formally participate as a GSoC student? .. :( I'm 2 months too .. young and they allow no exceptions, though I'm allowed to work in the US. Thank you, Shomiron Ghose On Thu, Mar 21, 2013 at 12:31 AM, Olivier Grisel wrote: > scikit-le

[Scikit-learn-general] GSoC 2013

2013-03-20 Thread Olivier Grisel
scikit-learn is officially registered for the 2013 edition of the Google Summer of Code under the PSF umbrella. http://wiki.python.org/moin/SummerOfCode/2013 Prospective students and mentors please feel free to register your name on the wiki: https://github.com/scikit-learn/scikit-learn/wiki/A-l

Re: [Scikit-learn-general] Invitation to Conservancy meetup today at PyCon USA 2013 (was Re: Membership application for the scikit-learn project: machine learning in Python)

2013-03-20 Thread Olivier Grisel
2013/3/16 Olivier Grisel : > Hi Bradley, > > Thanks for the heads up. I'll try to show up on time for the BoF. In > case I don't I just want to say that we are still interested in > joining the conservancy. Hi Bradley, During PyCon / PyData I had the opportunity to speak to various NumFOCUS board

Re: [Scikit-learn-general] domain of appicability - RandomForest, predict_proba function

2013-03-20 Thread Juan Nunez-Iglesias
Gilles, my understanding and experience is that RFs are fairly well calibrated, i.e. the probability estimates are pretty good, for a "reasonable" number of trees, e.g. 255. Paul hasn't specified how accurate he expects his probability measurements to be. Another consideration, though, is that if a test sam

Re: [Scikit-learn-general] Problem with "Faces recognition example using eigenfaces and SVMs"

2013-03-20 Thread Patrick Flaherty
I'm not on the PIL mailing list but I'll sign up and post there. The one thing I was thinking scikit-learn could do is to give a better message when imread(file_path) in sklearn/datasets/lfw.py returns a 0-dim array. Something to the effect of "the file wasn't successfully read" instead
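
A minimal sketch of the kind of check Patrick describes. The helper name check_loaded_image and the wording of the message are assumptions made for illustration; this is not code from sklearn/datasets/lfw.py.

```python
def check_loaded_image(img, file_path):
    """Raise a readable error if an image loader returned a 0-dim result.

    Hypothetical helper (not part of scikit-learn). `img` is whatever the
    image reader returned; anything without an `ndim` attribute, or with
    ndim == 0, is treated as a failed read.
    """
    if getattr(img, "ndim", 0) == 0:
        raise RuntimeError(
            "The file %s wasn't successfully read; "
            "check that the imaging library can decode it." % file_path
        )
    return img
```

The check would be called right after the image is loaded, so the user sees the offending file path instead of a later, less obvious failure.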

Re: [Scikit-learn-general] domain of appicability - RandomForest, predict_proba function

2013-03-20 Thread Gilles Louppe
Note that you can get perfect scores (either 0.0 or 1.0) simply by setting n_estimators=1. This is why you should use this measure with caution. On 20 March 2013 15:27, Lars Buitinck wrote: > 2013/3/20 >> > I was just about to say that discarding predictions in a range .5 - >> > epsilon < p < .5
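
To illustrate the point about n_estimators=1, here is a small sketch that treats the forest's score as the fraction of trees voting for the positive class. This is a simplification and an assumption for illustration: it models fully developed trees that each cast a hard 0/1 vote, and vote_fraction is a hypothetical name, not a scikit-learn function.

```python
def vote_fraction(tree_votes):
    """Fraction of trees voting for the positive class.

    `tree_votes` is a list of 0/1 votes, one per tree. The result measures
    agreement between trees, not correctness against the true label.
    """
    return sum(tree_votes) / float(len(tree_votes))


# With a single tree the score can only be 0.0 or 1.0,
# whether or not that tree is right.
print(vote_fraction([1]))
print(vote_fraction([0]))

# With several trees the score reflects how much they agree.
print(vote_fraction([1, 0, 1, 1]))
```

A "perfect" score from one tree therefore says nothing about how trustworthy the prediction is.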

Re: [Scikit-learn-general] domain of appicability - RandomForest, predict_proba function

2013-03-20 Thread Lars Buitinck
2013/3/20 > > I was just about to say that discarding predictions in a range .5 - > > epsilon < p < .5 + epsilon can be a useful thing to do in some cases. > > Where does "epsilon" come from? I don't see it on the list of parameters It's a parameter that I just made up. If you consider >0.9 or <0
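
The made-up epsilon parameter can be sketched as a simple filter over positive-class probabilities. The function name confident_predictions and the default epsilon=0.1 are assumptions for illustration; nothing like this is an existing scikit-learn parameter.

```python
def confident_predictions(probabilities, epsilon=0.1):
    """Keep only predictions outside the band .5 - epsilon < p < .5 + epsilon.

    `probabilities` holds the positive-class probability per sample.
    Returns (sample_index, predicted_label) pairs for the samples kept.
    """
    kept = []
    for index, p in enumerate(probabilities):
        if abs(p - 0.5) < epsilon:
            continue  # too close to 0.5: discard this prediction
        kept.append((index, 1 if p > 0.5 else 0))
    return kept


probs = [0.95, 0.52, 0.45, 0.10, 0.70]
print(confident_predictions(probs, epsilon=0.1))
```

A larger epsilon discards more samples in exchange for keeping only the predictions the model is more decided about.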

Re: [Scikit-learn-general] domain of appicability - RandomForest, predict_proba function

2013-03-20 Thread Paul . Czodrowski
> > You should use predict_proba with caution if what you want is a level > > of confidence with respect to the true values. If your trees are fully > > developed, then predict_proba is rather a level of agreement between > > the trees, no matter they are right or wrong with the true values. It >

Re: [Scikit-learn-general] Tutorial / text data / SGDClassifier fit problem

2013-03-20 Thread Jaques Grobler
@ogrisel , I'd be happy to lend a hand with this. Just say yay and I'll get it going :) 2013/3/20 Andreas Mueller > Olivier wrote it and is planning to merge it. > I think he wouldn't resist if any one gave him a hand ;) > > I think it is quite urgent as this keeps getting up on the ml/irc/so >

Re: [Scikit-learn-general] Tutorial / text data / SGDClassifier fit problem

2013-03-20 Thread Andreas Mueller
Olivier wrote it and is planning to merge it. I think he wouldn't resist if anyone gave him a hand ;) I think it is quite urgent, as this keeps coming up on the ml/irc/so. Cheers, Andy

Re: [Scikit-learn-general] Tutorial / text data / SGDClassifier fit problem

2013-03-20 Thread Lars Buitinck
2013/3/20 Jaques Grobler > Definitely - who made that tut originally? Would be a good addition to the > tutorial section. > We did, or rather, the other guys; I was lured into the project by this tutorial, which was quite good when it was up to date. > Else we could just update it and then lin

Re: [Scikit-learn-general] Tutorial / text data / SGDClassifier fit problem

2013-03-20 Thread Jaques Grobler
Actually, I think the last update just missed the line at the bottom of the page about version 0.9.. There's a note by the index saying Note This document is meant to be used with *scikit-learn version 0.11+* (i.e. the current state of the master branch at the time of writing: 2012-02-13). So i

Re: [Scikit-learn-general] Tutorial / text data / SGDClassifier fit problem

2013-03-20 Thread Jaques Grobler
Definitely - who made that tut originally? Would be a good addition to the tutorial section. Else we could just update it and then link to it from the tutorial section like with Jake's AstroML tutorial.. 2013/3/20 Lars Buitinck > 2013/3/20 Jaques Grobler > >> same thought as Lars.. also, that

Re: [Scikit-learn-general] Tutorial / text data / SGDClassifier fit problem

2013-03-20 Thread Lars Buitinck
2013/3/20 Jaques Grobler > same thought as Lars.. also, that tutorial mentions at the bottom that > it's for scikit-learn 0.9 > We should update that (or actually, we should merge the tutorial into the mainline docs...) -- Lars Buitinck Scientific programmer, ILPS University of Amsterdam -

Re: [Scikit-learn-general] Tutorial / text data / SGDClassifier fit problem

2013-03-20 Thread Jaques Grobler
same thought as Lars.. also, that tutorial mentions at the bottom that it's for scikit-learn 0.9 2013/3/20 Lars Buitinck > 2013/3/19 Zvika Marx > >> /usr/lib/pymodules/python2.7/sklearn/linear_model/stochastic_gradient.pycin >> _fit_multiclass(self, X, y, sample_weight) 162 strategy is called

Re: [Scikit-learn-general] Test after installation fails

2013-03-20 Thread Pamela Carreño
Hi Andy, yes I'm working on 32bit. I thought that maybe it was me who did something wrong during the installation. Thanks for your help. Pamela On Wed, Mar 20, 2013 at 12:02 PM, Andreas Mueller wrote: > Hi Pamela. > Unfortunately this is a known problem. You are on 32bit, right? It is an > in

Re: [Scikit-learn-general] Test after installation fails

2013-03-20 Thread Andreas Mueller
Hi Pamela. Unfortunately this is a known problem. You are on 32-bit, right? It is an instability in the BLAS in scipy. Don't worry about it too much. We are trying to make the test more robust. Hope that helps, Andy On 03/20/2013 11:59 AM, Pamela Carreño wrote: Hi Andy, I'm using 0.13.1 versi

Re: [Scikit-learn-general] Test after installation fails

2013-03-20 Thread Pamela Carreño
Hi Andy, I'm using 0.13.1 version. This is the output I get .../usr/local/lib/python2.7/dist-packages/scikit_learn-0.13.1-py2.7-linux-i686.egg/sklearn/manifold/spectral_embedding.py:225: UserWarning: Graph is not fully connected, spectra

Re: [Scikit-learn-general] Test after installation fails

2013-03-20 Thread Andreas Mueller
Hi Pamela. Could you please give us the full traceback / output of the test that is failing? Which version are you using? 0.13.1? Cheers, Andy On 03/20/2013 11:40 AM, Pamela Carreño wrote: Hi, I have installed scikit-learn following the "installing an official release" tutorial, however when

Re: [Scikit-learn-general] Tutorial / text data / SGDClassifier fit problem

2013-03-20 Thread Lars Buitinck
2013/3/19 Zvika Marx > /usr/lib/pymodules/python2.7/sklearn/linear_model/stochastic_gradient.pyc in > _fit_multiclass(self, X, y, sample_weight) 162 strategy is called OVA: One > Versus All > . 163 """ --> 164 X = np.asarray(X, dtype=np.float64, order='C') 165 166 # > Use joblib to run OVA in par

Re: [Scikit-learn-general] domain of appicability - RandomForest, predict_proba function

2013-03-20 Thread Gilles Louppe
No, I am not saying that. I am just saying that what you compute is not really "a probability with respect to the true value". What this measure represents depends heavily on the number of trees, the size of your data and the depth of the trees. Provided that you build an infinite number of trees, with

[Scikit-learn-general] Test after installation fails

2013-03-20 Thread Pamela Carreño
Hi, I have installed scikit-learn following the "installing an official release" tutorial, however when I try to test it using nosetests sklearn --exe I get Ran 1603 tests in 70.591s FAILED (SKIP=11, failures=1) Am I missing something? -- Pamela -

Re: [Scikit-learn-general] domain of appicability - RandomForest, predict_proba function

2013-03-20 Thread Lars Buitinck
2013/3/20 Gilles Louppe : > You should use predict_proba with caution if what you want is a level > of confidence with respect to the true values. If your trees are fully > developed, then predict_proba is rather a level of agreement between > the trees, no matter they are right or wrong with the t

Re: [Scikit-learn-general] domain of appicability - RandomForest, predict_proba function

2013-03-20 Thread Gilles Louppe
Hi Paul, You should use predict_proba with caution if what you want is a level of confidence with respect to the true values. If your trees are fully developed, then predict_proba is rather a level of agreement between the trees, no matter whether they are right or wrong with respect to the true values. It is likely

Re: [Scikit-learn-general] Documentation consistency: Attribute formatting

2013-03-20 Thread Jaques Grobler
It would perhaps be good to add these to the development docs, at least points 1 and 3, as 2 is still a bit confusing. I can do this if you're not on it already, Vlad. 2013/3/19 Vlad Niculae > >> II. Sometimes if attribute descriptions have multiple lines, a backtick > >> is needed at the end of con

Re: [Scikit-learn-general] Problem with "Faces recognition example using eigenfaces and SVMs"

2013-03-20 Thread Jaques Grobler
Thanks for reporting.. This is perhaps more a matter for the PIL mailing list? http://www.pythonware.com/products/pil/ (See Free Support section) Unless, people know of any way we can help solve it on this side? 2013/3/19 Patrick Flaherty > > I'm experimenting with the examples/tutorials t

Re: [Scikit-learn-general] generation of a "random" confusion matrix

2013-03-20 Thread Paul . Czodrowski
I need the random matrix to evaluate the predictiveness of a particular model - this time, not in terms of a "domain of applicability" :) Just to clarify the question/put us on the same line: please see the attached PDF for the question I'm having in mind: Cheers & Thanks, Paul > > What do y

[Scikit-learn-general] Tutorial / text data / SGDClassifier fit problem

2013-03-20 Thread Zvika Marx
Hi I am trying to run through scikit-learn.github.com/scikit-learn-tutorial/working_with_text_data.html as is. Calls to SGDClassifier(...).fit(...), either within the pipe or directly, invoke "ValueError: setting an array element with a sequence." Thanks much (great tool!!!) ---

Re: [Scikit-learn-general] domain of appicability - RandomForest, predict_proba function

2013-03-20 Thread Paul . Czodrowski
The term "domain of applicability" is frequently used in the field of cheminformatics to judge the reliability of predictions for new (unseen) compounds. => the model should perform not very well for very dissimilar compounds In my simple understanding, the predict_proba function gives me at le