Thanks for the clarification.
I have to create clusters with respect to a dependent variable. I can't use
forests because I lose the single-tree structure. The rules I create from R score 10K
segments a second, about 1 billion a day.
The ideal algorithm would have the properties of a decision tree: variable selection,
robust a
2013/2/27 David Montgomery:
> OK, now I am really confused about how to interpret the tree.
>
> So... I am trying to build a probability estimation tree. All of the independent variables
> are categorical, and I created dummies for them. What is throwing me off are the <= thresholds.
>
> I should have a rule that says, e.g., if city=LA
OK, now I am really confused about how to interpret the tree.
So... I am trying to build a probability estimation tree. All of the independent
variables are categorical, and I created dummies for them. What is throwing me off
are the <= thresholds.
I should have a rule that says, e.g., if city=LA,NY and TIME=Noon then 0.20.
In the cha
Looks good to me - save the output to a file (e.g. foobar.dot) and run
the following command:
$ dot -Tpdf foobar.dot -o foobar.pdf
When I open the PDF, all labels are displayed correctly. Remember that
they are now indicator (0/1) features, so the thresholds are usually of the form
"country=AU <= 0.5"; the left branch of such a split holds the rows where country is not AU.
You c
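To read a path of `<= 0.5` splits back as categorical rules, the following is a minimal sketch in plain Python. It assumes DictVectorizer-style names ("variable=value") and 0/1 dummies; the helper names and the example path are hypothetical. For instance, if the only cities in the data were LA, NY and SF, the tree's "city != SF" is its way of expressing "city=LA,NY".

```python
def describe_split(feature_name, threshold, goes_left):
    """Translate one split on a 0/1 dummy column into a categorical test.

    Assumes DictVectorizer-style names ("variable=value") and that the dummy
    only takes the values 0 and 1, so a threshold such as 0.5 separates them.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold does not separate 0 from 1")
    variable, _, value = feature_name.partition("=")
    # Rows with dummy <= threshold (the left branch) are those where the dummy is 0.
    return f"{variable} != {value}" if goes_left else f"{variable} == {value}"


def describe_path(steps):
    """Join (feature_name, threshold, goes_left) steps into one readable rule."""
    return " AND ".join(describe_split(*step) for step in steps)


# Hypothetical path: left at "city=SF <= 0.5", then right at "TIME=Noon <= 0.5".
rule = describe_path([("city=SF", 0.5, True), ("TIME=Noon", 0.5, False)])
print(rule)  # city != SF AND TIME == Noon
```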
Thanks, I used DictVectorizer().
I am now trying to add labels to the tree graph. Below are the labels and
the digraph Tree. However, I don't see labels on the tree nodes. Did I not
use feature_names correctly?
measurements = [
{'country':'US','city': 'Dubai'},
{'country':'US','city': 'London'}
I personally use:
import numpy as np
labels_train = np.genfromtxt('dataset.txt', delimiter=',', usecols=0, dtype=str)
data_train = np.genfromtxt('dataset.txt', delimiter=',')[:, 1:]
(y is labels_train, X is data_train)
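The sample rows in the original question are whitespace-separated and contain text columns. A minimal variant of the snippet above, assuming that layout, reads every field as a string with `np.loadtxt` and then splits off the target. The in-memory `io.StringIO` stands in for the real file and is only for illustration.

```python
import io

import numpy as np

# The two sample rows from the question; a real script would pass the file path.
raw = io.StringIO("1 NewYork 1 6 high\n0 LA 3 4 low\n")

# dtype=str keeps every field as text; with no delimiter, loadtxt splits on whitespace.
table = np.loadtxt(raw, dtype=str)
y = table[:, 0].astype(float)  # first column: the 0/1 target
X = table[:, 1:]               # remaining categorical columns, still strings
```

X still holds strings at this point, so it needs dummy encoding before scikit-learn can use it.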
2013/2/27 David Montgomery
> Hi,
>
> I have a data structure that looks like this:
>
> 1 NewYo
Hi David,
I recommend that you load the data using Pandas (``pandas.read_csv``).
Scikit-learn does not support categorical features out of the box; you
need to encode them as dummy variables (aka one-hot encoding). You
can do this either using ``sklearn.feature_extraction.DictVectorizer`` or
via ``pan
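One pandas route, sketched under assumptions: `read_csv` with a whitespace separator loads the question's layout, and `pandas.get_dummies` does the one-hot encoding. The column names are hypothetical, since the sample data has no header row, and the in-memory string stands in for the real file.

```python
import io

import pandas as pd

raw = io.StringIO("1 NewYork 1 6 high\n0 LA 3 4 low\n")

# Hypothetical column names; the question's file has no header row.
df = pd.read_csv(
    raw,
    sep=r"\s+",
    header=None,
    names=["y", "city", "col2", "col3", "col4"],
    dtype=str,  # keep codes such as 1 and 6 as categories, not numbers
)
y = df["y"].astype(float)
# One 0/1 column per (column, value) pair, e.g. "city_NewYork".
X = pd.get_dummies(df.drop(columns="y"), dtype=int)
```

Reading with `dtype=str` is a deliberate choice: otherwise pandas would parse the numeric-looking codes as numbers, and `get_dummies` would leave them unencoded.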
Hi,
I have a data structure that looks like this:
1 NewYork 1 6 high
0 LA 3 4 low
...
I am trying to predict a probability, where y is the first column. All of the
attributes in X are categorical, and I will use a decision tree regressor. How
do I load this data into y and X?
Thanks
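An end-to-end sketch, assuming the whitespace-separated layout shown above: parse each row into a dict, one-hot encode with `DictVectorizer`, and fit a `DecisionTreeRegressor`. With a 0/1 target, each leaf predicts the mean of y in that leaf, which is the fraction of 1s and can serve as a probability estimate. The extra rows, the column names `col1`..`col4` and the new row are hypothetical.

```python
import io

from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeRegressor

raw = io.StringIO(
    "1 NewYork 1 6 high\n"
    "0 LA 3 4 low\n"
    # Hypothetical extra rows so the tree has more than two examples.
    "1 LA 1 6 high\n"
    "0 NewYork 3 4 low\n"
)

y, records = [], []
for line in raw:
    fields = line.split()
    y.append(float(fields[0]))  # first column: the 0/1 target
    # Hypothetical names col1..col4; every value stays a string, so each is a category.
    records.append({f"col{i}": value for i, value in enumerate(fields[1:], start=1)})

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(records)

# On a 0/1 target, each leaf predicts the mean of y in that leaf (the fraction of 1s).
model = DecisionTreeRegressor(random_state=0).fit(X, y)

new_row = {"col1": "LA", "col2": "1", "col3": "6", "col4": "high"}
prob = model.predict(vec.transform([new_row]))[0]
print(prob)
```

On real data, a larger `min_samples_leaf` keeps each leaf's estimate from resting on a handful of rows. `DecisionTreeClassifier` with `predict_proba` is the classification-tree alternative.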