Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-04-30 Thread C W
Hermes, that's an interesting function. Does it work with sklearn after factorize? Is there any example? Thanks! On Thu, Apr 30, 2020 at 6:51 PM Hermes Morales wrote: > Perhaps pd.factorize could help? > > --
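A minimal sketch of one possible answer, assuming a toy DataFrame with hypothetical columns `color`, `size` and `label`. `pd.factorize` maps each distinct value to an integer code, and those codes can be passed straight to a tree-based estimator. New data has to be encoded with the same mapping, which this sketch does through `Index.get_indexer`. Keep in mind that the codes impose an arbitrary order, which a linear model would treat as meaningful.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one categorical feature, one numeric feature, a binary label.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "red", "blue"],
    "size": [1.0, 2.5, 3.1, 2.2, 0.9, 3.3],
    "label": [0, 1, 1, 1, 0, 1],
})

# pd.factorize returns (integer codes, unique values in order of first appearance).
codes, uniques = pd.factorize(df["color"])
X = pd.DataFrame({"color_code": codes, "size": df["size"]})

clf = DecisionTreeClassifier(random_state=0).fit(X, df["label"])

# Reuse the training mapping for new rows; values never seen in training would get -1.
new_codes = pd.Index(uniques).get_indexer(pd.Series(["blue", "red"]))
X_new = pd.DataFrame({"color_code": new_codes, "size": [3.0, 1.0]})
print(clf.predict(X_new))
```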

Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-04-30 Thread Hermes Morales
Perhaps pd.factorize could help? From: scikit-learn on behalf of Gael Varoquaux Sent: Thursday, April 30, 2020 5:12:06 PM To: Scikit-learn mailing list Subject: Re: [scikit-learn] Why does sklearn require on

Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-04-30 Thread Gael Varoquaux
On Thu, Apr 30, 2020 at 03:55:00PM -0400, C W wrote: > I've used R and Stata software, neither needs such a transformation. They have a > data type called "factors", which is different from "numeric". > My problem with OHE: > One-hot-encoding results in a large number of features. This really blows up >

Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-04-30 Thread Michael Eickenberg
Hi, I think there are many reasons that have led to the current situation. One is that scikit-learn is based on numpy arrays, which do not offer categorical data types (yet; ideas are being discussed at https://numpy.org/neps/nep-0041-improved-dtype-support.html). Pandas already has a categorical data
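Since pandas already has a categorical dtype, one workaround while numpy lacks one is to store the column as `category` and give scikit-learn its integer codes. The sketch below assumes a hypothetical city column; declaring the full category list up front with `pd.CategoricalDtype` keeps the mapping identical for training and test data.

```python
import pandas as pd

# Declare the categories once so that train and test share one code mapping.
cat_type = pd.CategoricalDtype(categories=["Paris", "London", "Tokyo"])

train = pd.Series(["London", "Tokyo", "Paris", "London"], dtype=cat_type)
test = pd.Series(["Tokyo", "Berlin"], dtype=cat_type)  # "Berlin" is not a declared category

# .cat.codes gives each value's position in the category list;
# values outside the list become NaN in the Series and -1 in the codes.
train_codes = train.cat.codes.to_numpy()
test_codes = test.cat.codes.to_numpy()
print(train_codes, test_codes)
```

The resulting arrays are plain integers, so they can go into any estimator that accepts numeric input. How the `-1` code for unseen values should be handled is left to the user.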

[scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-04-30 Thread C W
Hello everyone, I am frustrated with the one-hot-encoding requirement for categorical features. Why? I've used R and Stata software, neither needs such a transformation. They have a data type called "factors", which is different from "numeric". My problem with OHE: One-hot-encoding results in a large nu
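To make the size complaint concrete: `OneHotEncoder` produces one output column per distinct category it sees during fitting, whereas `OrdinalEncoder` keeps a single integer column per input feature. This sketch uses invented synthetic data (one feature with 500 possible levels, 1,000 rows) purely for illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

rng = np.random.default_rng(0)
# Synthetic data: one categorical feature with up to 500 levels, 1,000 rows.
levels = np.array([f"level_{i}" for i in range(500)])
X = rng.choice(levels, size=(1000, 1))

# One column per category observed in X; output is a scipy sparse matrix by default.
X_ohe = OneHotEncoder(handle_unknown="ignore").fit_transform(X)

# One integer column per input feature, regardless of the number of levels.
X_ord = OrdinalEncoder().fit_transform(X)

n_levels_seen = len(np.unique(X))
print(X_ohe.shape, X_ord.shape, n_levels_seen)
```

Because the one-hot output is sparse, memory grows with the number of nonzero entries rather than rows times columns, but the estimator still receives hundreds of features where the ordinal encoding gives one.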

[scikit-learn] StackingClassifier

2020-04-30 Thread Andrew Howe
Hi All, quick question about the stacking classifier. How do I know the order of the features that the final estimator uses? I've got an example which I've created like this (the LGRG and KSVM objects were
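One way to inspect this empirically, as a sketch: `StackingClassifier.transform` returns the meta-features that are fed to the final estimator. Their columns come in blocks following the order of the `estimators` list, and `stack_method_` records which method produced each block. The stand-ins below are assumptions, since the post is cut off. They treat LGRG as a logistic regression and KSVM as an RBF-kernel `SVC`, and they use the iris data only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # 3 classes

# Hypothetical stand-ins for the LGRG and KSVM objects mentioned in the post.
lgrg = LogisticRegression(max_iter=1000)
ksvm = SVC(kernel="rbf")  # no predict_proba unless probability=True

stack = StackingClassifier(
    estimators=[("lgrg", lgrg), ("ksvm", ksvm)],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X, y)

# With stack_method="auto", the method picked for each base estimator, in list order.
print(stack.stack_method_)

# Meta-features seen by the final estimator: "lgrg" columns first, then "ksvm" columns.
meta = stack.transform(X)
print(meta.shape)
```

Here each base estimator contributes one column per class, so the "lgrg" probabilities come first and the "ksvm" decision values follow. With `passthrough=True`, the original features are appended after these prediction columns.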