+1
Just wanted to point out that the K-1 subset proof only holds for binary
classification. Such heuristics still perform reasonably well for the
multiclass classification criterion, though.
On Monday, November 17, 2014, Alexander Hawk wrote:
> Perhaps you have become aware of this by now,
> but only
Perhaps you have become aware of this by now,
but only K-1 subset tests are needed to find the best
categorical split, not 2^(K-1)-1. This was a central
result proved in Breiman's book.
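For anyone following along, here is a minimal sketch of what that result licenses for a binary 0/1 target: order the categories by their mean response, then only the K-1 splits along that ordering need to be scored, rather than all 2^(K-1)-1 subsets. The function and variable names below are made up for illustration; this is not scikit-learn's internal code.

import numpy as np

def best_categorical_split(cat, y):
    """cat: 1-D array of category labels, y: 1-D array of 0/1 targets."""
    # order categories by their positive rate (the Breiman ordering trick)
    order = sorted(np.unique(cat), key=lambda c: y[cat == c].mean())

    def gini(labels):
        p = labels.mean()
        return 2 * p * (1 - p)

    best = None
    for i in range(1, len(order)):            # only K-1 candidate splits
        left = np.isin(cat, order[:i])
        score = (left.sum() * gini(y[left])
                 + (~left).sum() * gini(y[~left])) / len(y)
        if best is None or score < best[0]:
            best = (score, set(map(str, order[:i])))
    return best                               # (weighted Gini, categories sent left)

# toy usage
cat = np.array(list("aabbbccdd"))
y = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1])
print(best_categorical_split(cat, y))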
>> I believe more in my results than in my expertise - and so should you :-)
>
> +1! There are very, very few examples of theory trumping data in history... And
> a bajillion of the converse.
I guess I didn't express myself clearly: I didn't mean to say that I
mistrust my results per se.. I'm not tha
On Tue, Jun 4, 2013 at 8:16 PM, Peter Prettenhofer <
peter.prettenho...@gmail.com> wrote:
> I believe more in my results than in my expertise - and so should you :-)
>
+1! There are very, very few examples of theory trumping data in history...
And a bajillion of the converse.
I also think Joel
Hi Christian,
I believe more in my results than in my expertise - and so should you :-)
I think you misunderstood me: I did not claim that one-hot encoded
categorical features give better results than ordinal encoded ones - I just
claimed that ordinal encoding works as well as one-hot encoded
On 06/04/2013 05:55 AM, Christian Jauvin wrote:
> Many thanks to all for your help and detailed answers, I really appreciate it.
>
> So I wanted to test the discussion's takeaway, namely, what Peter
> suggested: one-hot encode the categorical features with small
> cardinality, and leave the others
Many thanks to all for your help and detailed answers, I really appreciate it.
So I wanted to test the discussion's takeaway, namely, what Peter
suggested: one-hot encode the categorical features with small
cardinality, and leave the others in their ordinal form.
So from the same dataset I mentio
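For reference, a rough sketch of that takeaway with pandas; the toy DataFrame, column names, and the cut-off (5 instead of Peter's 50) are invented for illustration and are not from Christian's dataset.

import pandas as pd

# toy frame: "color" has few levels, "user_id" has many
df = pd.DataFrame({
    "color":   ["red", "blue", "green", "red", "blue", "green"],
    "user_id": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "target":  [1, 0, 1, 0, 1, 0],
})

threshold = 5  # stand-in for Peter's "< 50" rule of thumb
cat_cols = ["color", "user_id"]
small = [c for c in cat_cols if df[c].nunique() < threshold]
large = [c for c in cat_cols if c not in small]

X = pd.get_dummies(df[small])                  # one-hot the small-cardinality features
for c in large:
    X[c] = df[c].astype("category").cat.codes  # ordinal integers for the rest
y = df["target"]
print(X)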
On 06/03/2013 09:15 AM, Peter Prettenhofer wrote:
> Our decision tree implementation only supports numerical splits; i.e.
> it tests val < threshold.
>
> Categorical features need to be encoded properly. I recommend one-hot
> encoding for features with small cardinality (e.g. < 50) and ordinal
Our decision tree implementation only supports numerical splits; i.e. it
tests val < threshold.
Categorical features need to be encoded properly. I recommend one-hot
encoding for features with small cardinality (e.g. < 50) and ordinal
encoding (simply assign each category an integer value) for fe
On 3 June 2013 08:43, Andreas Mueller wrote:
> On 06/03/2013 05:19 AM, Joel Nothman wrote:
>>
>> However, in these last two cases, the number of possible splits at a
>> single node is linear in the number of categories. Selecting an
>> arbitrary partition allows exponentially many splits with resp
On 06/03/2013 04:41 AM, Christian Jauvin wrote:
>> Sklearn does not implement any special treatment for categorical variables.
>> You can feed any float. The question is if it would work / what it does.
> I think I'm confused about a couple of aspects (that's what happens I
> guess when you play wi
On 06/03/2013 05:19 AM, Joel Nothman wrote:
>
> However, in these last two cases, the number of possible splits at a
> single node is linear in the number of categories. Selecting an
> arbitrary partition allows exponentially many splits with respect to
> the number of categories (though there m
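For concreteness, a feature with K categories admits K-1 splits under an ordered (threshold) encoding but 2^(K-1)-1 distinct binary subset partitions; a quick check of how fast that grows:

# K, ordered splits (K-1), arbitrary binary partitions (2**(K-1) - 1)
for K in (2, 4, 8, 16, 32):
    print(K, K - 1, 2 ** (K - 1) - 1)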
On Mon, Jun 3, 2013 at 12:41 PM, Christian Jauvin wrote:
> > Sklearn does not implement any special treatment for categorical
> > variables.
> > You can feed any float. The question is if it would work / what it does.
>
> I think I'm confused about a couple of aspects (that's what happens I
> guess
> Sklearn does not implement any special treatment for categorical variables.
> You can feed any float. The question is if it would work / what it does.
I think I'm confused about a couple of aspects (that's what happens I
guess when you play with algorithms for which you don't have a
complete and
On 06/02/2013 10:53 PM, Christian Jauvin wrote:
> Hi Andreas,
>
>> Btw, you do encode the categorical variables using one-hot, right?
>> The sklearn trees don't really support categorical variables.
> I'm rather perplexed by this.. I assumed that sklearn's RF only
> required its input to be numeric
Hi Andreas,
> Btw, you do encode the categorical variables using one-hot, right?
> The sklearn trees don't really support categorical variables.
I'm rather perplexed by this.. I assumed that sklearn's RF only
required its input to be numerical, so I only used a LabelEncoder up
to now.
My assumpt
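As a side note on what LabelEncoder does with such input: it assigns integer codes in sorted order of the labels, so the tree's val < threshold splits then group categories by that arbitrary ordering rather than by anything meaningful for the target. A toy example (not Christian's data):

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["red", "blue", "green", "red"])
print(le.classes_)  # ['blue' 'green' 'red'] -> encoded as 0, 1, 2 (alphabetical)
print(codes)        # [2 0 1 2]; a split like "val < 1.5" groups blue+green vs red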
I got very good results on century dating of texts using random forests on
very few (20-ish) bag-of-words tf-idf features selected by chi2. It
depends on the problem.
Cheers,
Vlad
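Roughly the kind of setup Vlad describes, as a sketch; the vectorizer defaults, k=20, and the forest size are guesses rather than his actual parameters, and train_texts/train_labels are placeholders.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# tf-idf bag-of-words -> keep the ~20 best features by chi2 -> random forest
clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("chi2", SelectKBest(chi2, k=20)),
    ("forest", RandomForestClassifier(n_estimators=100)),
])
# clf.fit(train_texts, train_labels); clf.predict(test_texts)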
On Sat, Jun 1, 2013 at 9:01 PM, Andreas Mueller
wrote:
> On 06/01/2013 08:30 PM, Christian Jauvin wrote:
>> Hi,
>>
>> I
On 06/01/2013 08:30 PM, Christian Jauvin wrote:
> Hi,
>
> I asked a (perhaps too vague?) question about the use of Random
> Forests with a mix of categorical and lexical features on two ML
> forums (stats.SE and MetaOp), but since it has received no attention,
> I figured that it might work better
Hi Christian,
Some time ago I had similar problems. I.e., I wanted to use additional
features alongside my lexical features, and simple concatenation didn't work
that well for me even though both feature sets on their own performed
pretty well.
You can follow the discussion about my problem here [1] i
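For anyone hitting the same issue, one generic way to do that concatenation is to keep the lexical tf-idf matrix sparse and hstack a few extra columns onto it; the extra features below are invented for illustration, and this is not necessarily what the linked discussion settled on.

import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["cheap pills now", "meeting at noon", "buy cheap pills"]
extra = np.array([[3.0, 1.0], [3.0, 0.0], [3.0, 1.0]])   # e.g. token count, has_spam_word

X_text = TfidfVectorizer().fit_transform(texts)           # sparse lexical features
X = hstack([X_text, csr_matrix(extra)]).tocsr()           # concatenated feature matrix
print(X.shape)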
Hi,
I asked a (perhaps too vague?) question about the use of Random
Forests with a mix of categorical and lexical features on two ML
forums (stats.SE and MetaOp), but since it has received no attention,
I figured that it might work better on this list (I'm using sklearn's
RF of course):
"I'm work