Re: [Scikit-learn-general] Scipy 2013 in Austin TX

2013-02-27 Thread Jacob Vanderplas
Will do, thanks Gael. Enjoy your vacation! Jake On Wed, Feb 27, 2013 at 12:12 PM, Gael Varoquaux < gael.varoqu...@normalesup.org> wrote: > On Wed, Feb 27, 2013 at 11:34:43AM -0800, Jacob Vanderplas wrote: > > Well, since communication time is limited, I'd be happy to work on a > proposal > >

Re: [Scikit-learn-general] Scipy 2013 in Austin TX

2013-02-27 Thread Gael Varoquaux
On Wed, Feb 27, 2013 at 11:34:43AM -0800, Jacob Vanderplas wrote: > Well, since communication time is limited, I'd be happy to work on a proposal > on my own and put your name on it as well, if you trust me to do that without > you having a chance to read it.  Or will you be back before the March 3

Re: [Scikit-learn-general] download htmldoc for 0.12 or later

2013-02-27 Thread Vlad Niculae
This does require sphinx though, do you think we should make a downloadable copy available at release time? On Wed, Feb 27, 2013 at 5:44 PM, Andreas Mueller wrote: > On 02/27/2013 03:47 PM, Lars Buitinck wrote: >> 2013/2/27 Dustin Arendt : >>> I work at a lab where our research machines are compl

Re: [Scikit-learn-general] Scipy 2013 in Austin TX

2013-02-27 Thread Jacob Vanderplas
The Patagonia trip, yes? Well, since communication time is limited, I'd be happy to work on a proposal on my own and put your name on it as well, if you trust me to do that without you having a chance to read it. Or will you be back before the March 30th deadline? Jake On Wed, Feb 27, 2013 at

Re: [Scikit-learn-general] Scipy 2013 in Austin TX

2013-02-27 Thread Gael Varoquaux
On Wed, Feb 27, 2013 at 10:22:02AM -0800, Jacob Vanderplas wrote: > Let's wait a bit to hear if others are interested, and then I'll start an > off-list email chain to discuss ideas. I'll probably be gone on vacation by then: I am leaving in less than 24 hours (and completely crushed with thing

Re: [Scikit-learn-general] Imbalance in scikit-learn

2013-02-27 Thread Manish Amde
Using the sample_weight parameter in the RandomForestClassifier along with the balance_weights method from the preprocessing module to generate the sample weights might work as well. You can check this link for a previous related discussion. http://sourceforge.net/mailarchive/message.php?msg_id
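As a rough illustration of the idea above, the sketch below computes per-sample weights that are inversely proportional to class frequency, using only the standard library. It is not the `balance_weights` function from the preprocessing module; the helper name `balanced_sample_weights` and the toy labels are assumptions made for this example.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Return one weight per sample so that every class has the same total weight.

    Hypothetical helper for illustration only; not a scikit-learn function.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # Samples from rare classes get large weights, samples from common classes small ones.
    return [n_samples / (n_classes * counts[label]) for label in labels]


# Toy, made-up labels: four samples of class 0 and one of class 1.
y = [0, 0, 0, 0, 1]
weights = balanced_sample_weights(y)
```

A list like `weights` could then be passed as the `sample_weight` argument when fitting the classifier, so that the minority class is not drowned out.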

Re: [Scikit-learn-general] nosetests sklearn failed

2013-02-27 Thread ShNaYkHs ShNaYkHs
No, I installed numpy from the official website (with scipy). 2013/2/27 Andreas Mueller > On 02/27/2013 12:08 PM, Vlad Niculae wrote: > > The second run ShNaYkHs posted looks like a good install though, > > despite the test failure. > Are these your binaries on that website? > It also has 64bit

Re: [Scikit-learn-general] nosetests sklearn failed

2013-02-27 Thread Vlad Niculae
No, my binaries are only on sourceforge and pypi. Vlad On Wed, Feb 27, 2013 at 5:46 PM, Andreas Mueller wrote: > On 02/27/2013 12:08 PM, Vlad Niculae wrote: >> The second run ShNaYkHs posted looks like a good install though, >> despite the test failure. > Are these your binaries on that website?

Re: [Scikit-learn-general] How to load data into scikits

2013-02-27 Thread David Montgomery
Thanks for the clarification. I have to create clusters vis-a-vis a dependent variable. I can't use forests because I lose the structure. Rules I create from R score 10K segments a second. About 1 billion a day. The ideal algo will have the properties of a dtree. Variable selection, robust a

Re: [Scikit-learn-general] Scipy 2013 in Austin TX

2013-02-27 Thread Jacob Vanderplas
Gael, That would be great! Let's wait a bit to hear if others are interested, and then I'll start an off-list email chain to discuss ideas. Jake On Wed, Feb 27, 2013 at 10:17 AM, Gael Varoquaux < gael.varoqu...@normalesup.org> wrote: > On Wed, Feb 27, 2013 at 10:04:25AM -0800, Jacob Vanderpla

Re: [Scikit-learn-general] Scipy 2013 in Austin TX

2013-02-27 Thread Gael Varoquaux
On Wed, Feb 27, 2013 at 10:04:25AM -0800, Jacob Vanderplas wrote: > Is anyone planning to submit a scikit-learn tutorial proposal?  I'm planning > to > attend the conference; I'd be happy to prepare another tutorial myself, or to > team-teach with someone else who is interested. I was thinking th

Re: [Scikit-learn-general] How to load data into scikits

2013-02-27 Thread Peter Prettenhofer
2013/2/27 David Montgomery : > OK, now I am really confused about how to interpret the tree. > > So... I am trying to build a probability estimation tree. All of the independent variables > are categorical, and I created dummies for them. What is throwing me off are the <=. > > I should have a rule that says e.g. if city=LA

[Scikit-learn-general] Scipy 2013 in Austin TX

2013-02-27 Thread Jacob Vanderplas
Hi folks, The call for tutorial & talk proposals for Scipy 2013 is open, and tutorial proposals are due by the end of March. The themes for Scipy 2013 include Machine Learning -- see the info here: http://conference.scipy.org/scipy2013/tutorial_overview.php I've talked to Francesc, who is the tutor

Re: [Scikit-learn-general] How to load data into scikits

2013-02-27 Thread Olivier Grisel
2013/2/27 David Montgomery : > OK, now I am really confused about how to interpret the tree. > > So... I am trying to build a probability estimation tree. All of the independent variables > are categorical, and I created dummies for them. What is throwing me off are the <=. > > I should have a rule that says e.g. if city=LA

Re: [Scikit-learn-general] How to load data into scikits

2013-02-27 Thread David Montgomery
OK, now I am really confused about how to interpret the tree. So... I am trying to build a probability estimation tree. All of the independent variables are categorical, and I created dummies for them. What is throwing me off are the <=. I should have a rule that says e.g. if city=LA,NY and TIME=Noon then .20. In the cha

Re: [Scikit-learn-general] nosetests sklearn failed

2013-02-27 Thread Andreas Mueller
On 02/27/2013 12:08 PM, Vlad Niculae wrote: > The second run ShNaYkHs posted looks like a good install though, > despite the test failure. Are these your binaries on that website? It also has 64bit versions. They say they require the mkl numpy install. Shnaykhs: did you install the mkl numpy from t

Re: [Scikit-learn-general] download htmldoc for 0.12 or later

2013-02-27 Thread Andreas Mueller
On 02/27/2013 03:47 PM, Lars Buitinck wrote: > 2013/2/27 Dustin Arendt : >> I work at a lab where our research machines are completely isolated from the >> internet. I was hoping to be able to download a complete version of the >> scikit-learn htmldoc to host on our internal webserver. However, t

Re: [Scikit-learn-general] Obtaining a confidence or posterior probability using a classifier from sklearn

2013-02-27 Thread Andreas Mueller
On 02/27/2013 04:48 PM, ShNaYkHs ShNaYkHs wrote: > Is it possible to get a confidence value or a probability (P(y|x)) > that the class y predicted for a given data-point x is correct ? Using > any of these classifiers from sklearn: tree, GaussianNB (naive > bayes), KNeighborsClassifier, svm (svm

Re: [Scikit-learn-general] How to load data into scikits

2013-02-27 Thread Peter Prettenhofer
Looks good to me - save the output to a file (e.g. foobar.dot) and run the following command: $ dot -Tpdf foobar.dot -o foobar.pdf When I open the PDF all labels are correctly displayed - remember that they are now indicator features - so the thresholds are usually "country=AU <= 0.5". You c
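To make the threshold reading concrete: an indicator feature is either 0 or 1, so a test such as `country=AU <= 0.5` holds exactly when the indicator is 0, that is, when the country is not AU. The sketch below rewrites such a node label as a readable rule. `describe_split` is a hypothetical helper, not part of scikit-learn or Graphviz, and it assumes node labels of the form `name=value <= threshold`.

```python
def describe_split(node_label):
    """Rewrite an indicator-feature test like 'country=AU <= 0.5' as a readable rule.

    Hypothetical helper; assumes the label has the form 'name=value <= threshold'.
    """
    feature, threshold = node_label.rsplit(" <= ", 1)
    name, value = feature.split("=", 1)
    if 0.0 <= float(threshold) < 1.0:
        # The indicator is 0 or 1, so "<= threshold" is true only when it is 0.
        return f"true branch: {name} != {value}; false branch: {name} == {value}"
    raise ValueError("threshold does not separate 0 from 1")


rule = describe_split("country=AU <= 0.5")
print(rule)
```

Read this way, a rule like "city is LA or NY" corresponds to following the false branches of the matching `city=...` tests.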

[Scikit-learn-general] Obtaining a confidence or posterior probability using a classifier from sklearn

2013-02-27 Thread ShNaYkHs ShNaYkHs
Is it possible to get a confidence value or a probability (P(y|x)) that the class y predicted for a given data-point x is correct? Using any of these classifiers from sklearn: tree, GaussianNB (naive bayes), KNeighborsClassifier, svm (svm.SVC ..).

Re: [Scikit-learn-general] How to load data into scikits

2013-02-27 Thread David Montgomery
Thanks, I used DictVectorizer(). I am now trying to add labels to the tree graph. Below are the labels and the digraph Tree. However, I don't see labels on the tree nodes. Did I not use the feature names correctly? measurements = [ {'country':'US','city': 'Dubai'}, {'country':'US','city': 'London'}

Re: [Scikit-learn-general] Why labels should be integer for the random forest classifier ?

2013-02-27 Thread Lars Buitinck
2013/2/27 ShNaYkHs ShNaYkHs : > For the RandomForestClassifier, the target values for training should be > integers (that correspond to classes in classification). When I specify the > labels as strings, I get an exception "ValueError: invalid literal for > float(): aaa". For the other classifiers

Re: [Scikit-learn-general] How to load data into scikits

2013-02-27 Thread ShNaYkHs ShNaYkHs
I personally use: labels_train = np.genfromtxt('dataset.txt', delimiter=',', usecols=0, dtype=str) data_train = np.genfromtxt('dataset.txt', delimiter=',')[:,1:] (Y is labels_train, X is data_train) 2013/2/27 David Montgomery > Hi, > > I have a data structure that looks like this: > > 1 NewYo
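For data like David's sample, where several columns are strings, a plain-Python loader is one alternative. This is a minimal sketch assuming whitespace-separated columns with the target in the first column. The column names in `COLUMNS` and the helper `load_rows` are hypothetical, and only the two rows shown in the original question are used.

```python
import io

# The two rows shown in the original question, held in memory to keep the example self-contained.
SAMPLE = """1 NewYork 1 6 high
0 LA 3 4 low
"""

# Hypothetical column names; the original message does not name the columns.
COLUMNS = ["city", "a", "b", "level"]


def load_rows(handle):
    """Split each line into an integer target and a dict of categorical features."""
    y, records = [], []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        y.append(int(fields[0]))
        # Values stay strings because every attribute is treated as categorical.
        records.append(dict(zip(COLUMNS, fields[1:])))
    return y, records


y, records = load_rows(io.StringIO(SAMPLE))
```

The `records` list has the list-of-dicts shape that a dictionary-based encoder expects, and `y` holds the targets.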

[Scikit-learn-general] Why labels should be integer for the random forest classifier ?

2013-02-27 Thread ShNaYkHs ShNaYkHs
For the RandomForestClassifier, the target values for training should be integers (that correspond to classes in classification). When I specify the labels as strings, I get an exception "ValueError: invalid literal for float(): aaa". For the other classifiers (svm, tree, knn, naive Bayes, etc.) I can
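One possible workaround for the error described above is to map the string labels to integer codes before fitting and to map predictions back afterwards. The sketch below does this by hand. `encode_labels` is a hypothetical helper (scikit-learn's `LabelEncoder` offers comparable functionality), and the toy labels are made up for illustration.

```python
def encode_labels(labels):
    """Map string class labels to integer codes.

    Returns the sorted list of classes and the encoded labels.
    Hypothetical helper for illustration only.
    """
    classes = sorted(set(labels))
    index = {name: code for code, name in enumerate(classes)}
    return classes, [index[name] for name in labels]


# Made-up labels in the style of the error message quoted above.
labels = ["aaa", "bbb", "aaa", "ccc"]
classes, y = encode_labels(labels)

# Integer predictions can be translated back to the original strings.
decoded = [classes[code] for code in y]
```

Keeping `classes` around is what makes the round trip possible: the integer at position `i` always stands for `classes[i]`.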

Re: [Scikit-learn-general] download htmldoc for 0.12 or later

2013-02-27 Thread Lars Buitinck
2013/2/27 Dustin Arendt : > I work at a lab where our research machines are completely isolated from the > internet. I was hoping to be able to download a complete version of the > scikit-learn htmldoc to host on our internal webserver. However, the only > htmldoc on sourceforge is for the 0.7 ve

[Scikit-learn-general] download htmldoc for 0.12 or later

2013-02-27 Thread Dustin Arendt
Hi, I work at a lab where our research machines are completely isolated from the internet. I was hoping to be able to download a complete version of the scikit-learn htmldoc to host on our internal webserver. However, the only htmldoc on sourceforge is for the 0.7 version (though the PDF version

Re: [Scikit-learn-general] How to load data into scikits

2013-02-27 Thread Peter Prettenhofer
Hi David, I recommend that you load the data using Pandas (``pandas.read_csv``). Scikit-learn does not support categorical features out-of-the-box; you need to encode them as dummy variables (aka one-hot encoding) - you can do this either using ``sklearn.feature_extraction.DictVectorizer`` or via ``pan
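To show what dummy (one-hot) encoding produces, here is a minimal standard-library sketch. It is not the `DictVectorizer` implementation; `one_hot_encode` is a hypothetical helper, and the keys `city` and `level` are assumed names for two of the columns in David's sample. Each distinct `key=value` pair becomes one 0/1 column.

```python
def one_hot_encode(records):
    """Turn a list of dicts with string values into 0/1 rows plus column names.

    Hypothetical helper for illustration; column names follow a 'key=value' pattern.
    """
    feature_names = sorted(
        {f"{key}={value}" for rec in records for key, value in rec.items()}
    )
    column = {name: i for i, name in enumerate(feature_names)}
    rows = []
    for rec in records:
        row = [0] * len(feature_names)
        for key, value in rec.items():
            row[column[f"{key}={value}"]] = 1  # mark the category that is present
        rows.append(row)
    return feature_names, rows


# Toy records; the keys are assumed names, the values come from the sample data.
records = [
    {"city": "NewYork", "level": "high"},
    {"city": "LA", "level": "low"},
]
feature_names, X = one_hot_encode(records)
```

Because every column is a 0/1 indicator, a tree trained on `X` can only split on whether a given category is present or absent.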

[Scikit-learn-general] How to load data into scikits

2013-02-27 Thread David Montgomery
Hi, I have a data structure that looks like this: 1 NewYork 1 6 high 0 LA 3 4 low ... I am trying to predict probability where Y is column one. All of the attributes of X are categorical and I will use a dtree regression. How do I load this data into the y and X? Thanks

Re: [Scikit-learn-general] nosetests sklearn failed

2013-02-27 Thread Vlad Niculae
The second run ShNaYkHs posted looks like a good install though, despite the test failure. On Wed, Feb 27, 2013 at 11:06 AM, Vlad Niculae wrote: > I built the binaries, is this because of the version of numpy I > compiled against? > > On Tue, Feb 26, 2013 at 4:51 PM, ShNaYkHs ShNaYkHs wrote: >>

Re: [Scikit-learn-general] nosetests sklearn failed

2013-02-27 Thread Vlad Niculae
I built the binaries, is this because of the version of numpy I compiled against? On Tue, Feb 26, 2013 at 4:51 PM, ShNaYkHs ShNaYkHs wrote: > Now I re-installed numpy, scipy, matplotlib and scikit-learn from > http://www.lfd.uci.edu/~gohlke/pythonlibs/#scikit-learn , I chose the > versions ending

Re: [Scikit-learn-general] Should sklearn.pipeline.Pipeline expose "classes_" property if the final estimator is a classifier?

2013-02-27 Thread Tadej Janež
On Tue, 2013-02-26 at 15:21 +0100, Lars Buitinck wrote: > > I'm all in favor of that, but we have so many different estimators > that special-casing Pipeline for all (kinds of) them is infeasible. So > we should come up with an elegant and general set of rules, which we > can then implement by e.

Re: [Scikit-learn-general] How to get all rules in a tree by leaf node path

2013-02-27 Thread Gilles Louppe
Hi David, I think you should have a look at sklearn.tree.export_graphviz. It will generate a picture of the tree for you. - Reference: http://scikit-learn.org/dev/modules/generated/sklearn.tree.export_graphviz.html#sklearn.tree.export_graphviz - Example: http://scikit-learn.org/dev/_images/iris.s