+1
Just wanted to point out that the K-1 subset proof only holds for binary
classification. Such heuristics still perform reasonably well for the
multiclass classification criterion, though.
On Monday, November 17, 2014, Alexander Hawk wrote:
> Perhaps you have become aware of this by now,
> but only
Perhaps you have become aware of this by now,
but only K-1 subset tests are needed to find the best
categorical split, not 2^(K-1)-1. This was a central
result proved in Breiman's book.
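For anyone following along, here is a minimal sketch of what that result licenses for a binary 0/1 target: order the categories by their mean response, then only the K-1 splits along that ordering need to be scored, rather than all 2^(K-1)-1 subsets. The function and variable names below are made up for illustration; this is not scikit-learn's internal code.

import numpy as np

def best_categorical_split(cat, y):
    """cat: 1-D array of category labels, y: 1-D array of 0/1 targets."""
    # order categories by their positive rate (the Breiman ordering trick)
    order = sorted(np.unique(cat), key=lambda c: y[cat == c].mean())

    def gini(labels):
        p = labels.mean()
        return 2 * p * (1 - p)

    best = None
    for i in range(1, len(order)):            # only K-1 candidate splits
        left = np.isin(cat, order[:i])
        score = (left.sum() * gini(y[left])
                 + (~left).sum() * gini(y[~left])) / len(y)
        if best is None or score < best[0]:
            best = (score, set(map(str, order[:i])))
    return best                               # (weighted Gini, categories sent left)

# toy usage
cat = np.array(list("aabbbccdd"))
y = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1])
print(best_categorical_split(cat, y))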
>> I believe more in my results than in my expertise - and so should you :-)
>
> +1! There are very, very few examples of theory trumping data in history... And
> a bajillion of the converse.
I guess I didn't express myself clearly: I didn't mean to say that I
mistrust my results per se.. I'm not tha
On Tue, Jun 4, 2013 at 8:16 PM, Peter Prettenhofer <
peter.prettenho...@gmail.com> wrote:
> I believe more in my results than in my expertise - and so should you :-)
>
+1! There are very, very few examples of theory trumping data in history...
And a bajillion of the converse.
I also think Joel
Hi Christian,
I believe more in my results than in my expertise - and so should you :-)
I think you misunderstood me: I did not claim that one-hot encoded
categorical features give better results than ordinal encoded ones - I just
claimed that ordinal encoding works as well as one-hot encoded
On 06/04/2013 05:55 AM, Christian Jauvin wrote:
> Many thanks to all for your help and detailed answers, I really appreciate it.
>
> So I wanted to test the discussion's takeaway, namely, what Peter
> suggested: one-hot encode the categorical features with small
> cardinality, and leave the others
Many thanks to all for your help and detailed answers, I really appreciate it.
So I wanted to test the discussion's takeaway, namely, what Peter
suggested: one-hot encode the categorical features with small
cardinality, and leave the others in their ordinal form.
So from the same dataset I mentio
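For reference, a rough sketch of that takeaway with pandas; the toy DataFrame, column names, and the cut-off (5 instead of Peter's 50) are invented for illustration and are not from Christian's dataset.

import pandas as pd

# toy frame: "color" has few levels, "user_id" has many
df = pd.DataFrame({
    "color":   ["red", "blue", "green", "red", "blue", "green"],
    "user_id": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "target":  [1, 0, 1, 0, 1, 0],
})

threshold = 5  # stand-in for Peter's "< 50" rule of thumb
cat_cols = ["color", "user_id"]
small = [c for c in cat_cols if df[c].nunique() < threshold]
large = [c for c in cat_cols if c not in small]

X = pd.get_dummies(df[small])                  # one-hot the small-cardinality features
for c in large:
    X[c] = df[c].astype("category").cat.codes  # ordinal integers for the rest
y = df["target"]
print(X)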
On 06/03/2013 09:15 AM, Peter Prettenhofer wrote:
> Our decision tree implementation only supports numerical splits; i.e.
> it tests val < threshold.
>
> Categorical features need to be encoded properly. I recommend one-hot
> encoding for features with small cardinality (e.g. < 50) and ordinal
Our decision tree implementation only supports numerical splits; i.e. it
tests val < threshold.
Categorical features need to be encoded properly. I recommend one-hot
encoding for features with small cardinality (e.g. < 50) and ordinal
encoding (simply assign each category an integer value) for fe
On 3 June 2013 08:43, Andreas Mueller wrote:
> On 06/03/2013 05:19 AM, Joel Nothman wrote:
>>
>> However, in these last two cases, the number of possible splits at a
>> single node is linear in the number of categories. Selecting an
>> arbitrary partition allows exponentially many splits with resp
On 06/03/2013 04:41 AM, Christian Jauvin wrote:
>> Sklearn does not implement any special treatment for categorical variables.
>> You can feed any float. The question is if it would work / what it does.
> I think I'm confused about a couple of aspects (that's what happens I
> guess when you play wi
On 06/03/2013 05:19 AM, Joel Nothman wrote:
>
> However, in these last two cases, the number of possible splits at a
> single node is linear in the number of categories. Selecting an
> arbitrary partition allows exponentially many splits with respect to
> the number of categories (though there m
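For concreteness, a feature with K categories admits K-1 splits under an ordered (threshold) encoding but 2^(K-1)-1 distinct binary subset partitions; a quick check of how fast that grows:

# K, ordered splits (K-1), arbitrary binary partitions (2**(K-1) - 1)
for K in (2, 4, 8, 16, 32):
    print(K, K - 1, 2 ** (K - 1) - 1)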
On Mon, Jun 3, 2013 at 12:41 PM, Christian Jauvin wrote:
> > Sklearn does not implement any special treatment for categorical
> > variables.
> > You can feed any float. The question is if it would work / what it does.
>
> I think I'm confused about a couple of aspects (that's what happens I
> guess
> Sklearn does not implement any special treatment for categorical variables.
> You can feed any float. The question is if it would work / what it does.
I think I'm confused about a couple of aspects (that's what happens I
guess when you play with algorithms for which you don't have a
complete and
On 06/02/2013 10:53 PM, Christian Jauvin wrote:
> Hi Andreas,
>
>> Btw, you do encode the categorical variables using one-hot, right?
>> The sklearn trees don't really support categorical variables.
> I'm rather perplexed by this.. I assumed that sklearn's RF only
> required its input to be numeric
Hi Andreas,
> Btw, you do encode the categorical variables using one-hot, right?
> The sklearn trees don't really support categorical variables.
I'm rather perplexed by this.. I assumed that sklearn's RF only
required its input to be numerical, so I only used a LabelEncoder up
to now.
My assumpt
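As a side note on what LabelEncoder does with such input: it assigns integer codes in sorted order of the labels, so the tree's val < threshold splits then group categories by that arbitrary ordering rather than by anything meaningful for the target. A toy example (not Christian's data):

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["red", "blue", "green", "red"])
print(le.classes_)  # ['blue' 'green' 'red'] -> encoded as 0, 1, 2 (alphabetical)
print(codes)        # [2 0 1 2]; a split like "val < 1.5" groups blue+green vs red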
I got very good results on century dating of texts using random forests on
very few (20-ish) bag-of-words tf-idf features selected by chi2. It
depends on the problem.
Cheers,
Vlad
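Roughly the kind of setup Vlad describes, as a sketch; the vectorizer defaults, k=20, and the forest size are guesses rather than his actual parameters, and train_texts/train_labels are placeholders.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# tf-idf bag-of-words -> keep the ~20 best features by chi2 -> random forest
clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("chi2", SelectKBest(chi2, k=20)),
    ("forest", RandomForestClassifier(n_estimators=100)),
])
# clf.fit(train_texts, train_labels); clf.predict(test_texts)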
On Sat, Jun 1, 2013 at 9:01 PM, Andreas Mueller
wrote:
> On 06/01/2013 08:30 PM, Christian Jauvin wrote:
>> Hi,
>>
>> I
On 06/01/2013 08:30 PM, Christian Jauvin wrote:
> Hi,
>
> I asked a (perhaps too vague?) question about the use of Random
> Forests with a mix of categorical and lexical features on two ML
> forums (stats.SE and MetaOp), but since it has received no attention,
> I figured that it might work better
Hi Christian,
Some time ago I had similar problems. I.e., I wanted to use additional
features alongside my lexical features, and simple concatenation didn't work
that well for me even though both feature sets on their own performed
pretty well.
You can follow the discussion about my problem here [1] i
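For anyone hitting the same issue, one generic way to do that concatenation is to keep the lexical tf-idf matrix sparse and hstack a few extra columns onto it; the extra features below are invented for illustration, and this is not necessarily what the linked discussion settled on.

import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["cheap pills now", "meeting at noon", "buy cheap pills"]
extra = np.array([[3.0, 1.0], [3.0, 0.0], [3.0, 1.0]])   # e.g. token count, has_spam_word

X_text = TfidfVectorizer().fit_transform(texts)           # sparse lexical features
X = hstack([X_text, csr_matrix(extra)]).tocsr()           # concatenated feature matrix
print(X.shape)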
Hi,
I asked a (perhaps too vague?) question about the use of Random
Forests with a mix of categorical and lexical features on two ML
forums (stats.SE and MetaOp), but since it has received no attention,
I figured that it might work better on this list (I'm using sklearn's
RF of course):
"I'm work