Re: [Scikit-learn-general] How to load data into scikits

2013-02-27 Thread David Montgomery
Thanks for the clarification. I have to create clusters vis-a-vis a dependent variable. I can't use forests because I lose the structure. The rules I create from R score 10K segments a second, about 1 billion a day. The ideal algo will have the properties of a dtree: variable selection, robust a

Re: [Scikit-learn-general] How to load data into scikits

2013-02-27 Thread Peter Prettenhofer
2013/2/27 David Montgomery:
> OK, now I am really confused about how to interpret the tree.
>
> So... I am trying to build a Prob est tree. All of the independent variables
> are categorical and I created dummies. What is throwing me off are the <=.
>
> I should have a rule that says e.g. if city=LA

Re: [Scikit-learn-general] How to load data into scikits

2013-02-27 Thread Olivier Grisel
2013/2/27 David Montgomery:
> OK, now I am really confused about how to interpret the tree.
>
> So... I am trying to build a Prob est tree. All of the independent variables
> are categorical and I created dummies. What is throwing me off are the <=.
>
> I should have a rule that says e.g. if city=LA

Re: [Scikit-learn-general] How to load data into scikits

2013-02-27 Thread David Montgomery
OK, now I am really confused about how to interpret the tree. So... I am trying to build a Prob est tree. All of the independent variables are categorical and I created dummies. What is throwing me off are the <=. I should have a rule that says e.g. if city=LA,NY and TIME=Noon then .20. In the cha

Re: [Scikit-learn-general] How to load data into scikits

2013-02-27 Thread Peter Prettenhofer
Looks good to me - save the output to a file (e.g. foobar.dot) and run the following command:

    $ dot -Tpdf foobar.dot -o foobar.pdf

When I open the pdf all labels are correctly displayed - remember that they are now indicator features - so the thresholds are usually "country=AU <= 0.5". You c

Re: [Scikit-learn-general] How to load data into scikits

2013-02-27 Thread David Montgomery
Thanks, I used DictVectorizer(). I am now trying to add labels to the tree graph. Below are the labels and the digraph Tree. However, I don't see labels on the tree nodes. Did I not use the feature names correctly? measurements = [ {'country':'US','city': 'Dubai'}, {'country':'US','city': 'London'}

Re: [Scikit-learn-general] How to load data into scikits

2013-02-27 Thread ShNaYkHs ShNaYkHs
I personally use:

    labels_train = np.genfromtxt('dataset.txt', delimiter=',', usecols=0, dtype=str)
    data_train = np.genfromtxt('dataset.txt', delimiter=',')[:,1:]

(Y is labels_train, X is data_train)

2013/2/27 David Montgomery
> Hi,
>
> I have a data structure that looks like this:
>
> 1 NewYo
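A small self-contained sketch of the two genfromtxt calls above, reading from in-memory text instead of 'dataset.txt'. The rows are made up, and the comma delimiter follows the snippet rather than the whitespace-separated sample in the original question. It also shows one caveat worth knowing: with the default float dtype, text fields come back as nan, so categorical columns need dtype=str.

```python
import io
import numpy as np

# Made-up comma-separated rows with the label in the first column.
numeric = "1,0.5,2.0\n0,1.5,3.0\n"

# Same pattern as the snippet above: labels from column 0, features from the rest.
labels_train = np.genfromtxt(io.StringIO(numeric), delimiter=",", usecols=0, dtype=str)
data_train = np.genfromtxt(io.StringIO(numeric), delimiter=",")[:, 1:]

# Made-up rows with categorical (text) attributes.
mixed = "1,NewYork,high\n0,LA,low\n"

# Default dtype is float, so the text fields are read as nan.
as_float = np.genfromtxt(io.StringIO(mixed), delimiter=",")

# Reading everything as strings keeps the categories intact for later encoding.
as_str = np.genfromtxt(io.StringIO(mixed), delimiter=",", dtype=str)

print(labels_train, data_train.shape)
print(as_float)
print(as_str)
```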

Re: [Scikit-learn-general] How to load data into scikits

2013-02-27 Thread Peter Prettenhofer
Hi David, I recommend that you load the data using Pandas (``pandas.read_csv``). Scikit-learn does not support categorical features out-of-the-box; you need to encode them as dummy variables (aka one-hot encoding) - you can do this either using ``sklearn.feature_extraction.DictVectorizer`` or via ``pan
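A minimal sketch of that recommendation, under a few assumptions: the two sample rows come from the original question, the file is whitespace-separated with no header, and the column names are hypothetical placeholders chosen here. DictVectorizer is imported from sklearn.feature_extraction.

```python
import io
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# The two sample rows from the original question, as in-memory text.
raw = io.StringIO("1 NewYork 1 6 high\n0 LA 3 4 low\n")

# Hypothetical column names; the original data has no header.
cols = ["y", "city", "a", "b", "level"]

# Read every field as a string so all attributes are treated as categorical.
df = pd.read_csv(raw, sep=r"\s+", header=None, names=cols, dtype=str)

# Target is column one; convert it back to a number.
y = df["y"].astype(int).to_numpy()

# One dict per row, e.g. {"city": "NewYork", "a": "1", "b": "6", "level": "high"}.
records = df.drop(columns="y").to_dict(orient="records")

# One-hot encode: string values become 0/1 columns named "key=value".
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(records)

print(X.shape, y)
```

Keeping the values as strings matters here: DictVectorizer passes numeric values through as-is and only one-hot encodes string values, so reading "1" and "6" as integers would leave them as ordinary numeric columns.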

[Scikit-learn-general] How to load data into scikits

2013-02-27 Thread David Montgomery
Hi, I have a data structure that looks like this:

1 NewYork 1 6 high
0 LA 3 4 low
...

I am trying to predict a probability, where Y is column one. All of the attributes of X are categorical and I will use a dtree regression. How do I load this data into y and X? Thanks