Re: [scikit-learn] transform categorical data to numerical representation

2017-08-05 Thread Joel Nothman
We are working on CategoricalEncoder in https://github.com/scikit-learn/scikit-learn/pull/9151 to help users more with this kind of thing. Feedback and testing are welcome.

On 6 August 2017 at 02:13, Sebastian Raschka wrote:
> Hi, Georg,
>
> I bring this up every time here on the mailing list :),

Re: [scikit-learn] transform categorical data to numerical representation

2017-08-05 Thread Sebastian Raschka
Hi, Georg, I bring this up every time here on the mailing list :), and you are probably aware of this issue, but it makes a difference whether your categorical data is nominal or ordinal. For instance, if you have an ordinal variable with values like {small, medium, large} you probably want to
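
To illustrate the nominal/ordinal distinction raised in this message, here is a minimal sketch using pandas. It is not taken from the thread: the column names, the example values, and the `size_order` mapping are assumptions made up for illustration. The ordinal column is mapped to integers that preserve its order, while the nominal column is one-hot encoded so that no order is implied.

```python
import pandas as pd

# Hypothetical toy data: "size" is ordinal, "color" is nominal.
df = pd.DataFrame({
    "size": ["small", "large", "medium", "small"],
    "color": ["red", "green", "blue", "green"],
})

# Ordinal: an explicit mapping keeps small < medium < large.
# The mapping itself is an assumption; choose one that fits your data.
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_encoded"] = df["size"].map(size_order)

# Nominal: one indicator column per category, no implied ordering.
color_dummies = pd.get_dummies(df["color"], prefix="color")

# Combine both encodings into a purely numerical feature table.
encoded = pd.concat([df[["size_encoded"]], color_dummies], axis=1)
print(encoded)
```

Whether an integer mapping is appropriate depends on the downstream model; the spacing 0, 1, 2 is itself a modeling choice.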

[scikit-learn] transform categorical data to numerical representation

2017-08-05 Thread Georg Heiler
Hi, the LabelEncoder is only meant for a single column, i.e. the target variable. Is the DictVectorizer or a manual chaining of multiple LabelEncoders (one per categorical column) the desired way to get values which can be fed into a subsequent classifier? Is there some way I have overlooked which w
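
The two approaches named in this question can be sketched as follows. This is an illustrative example under assumptions, not the poster's code: the `rows` data and the column names are hypothetical. Option 1 fits one `LabelEncoder` per categorical column; option 2 passes the records as dicts to `DictVectorizer`, which one-hot encodes string values.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import LabelEncoder

# Hypothetical records with two categorical columns.
rows = [
    {"color": "red", "shape": "circle"},
    {"color": "green", "shape": "square"},
    {"color": "red", "shape": "square"},
]
columns = ["color", "shape"]

# Option 1: one LabelEncoder per column. Keep the fitted encoders
# so the same mapping can be reused on new data.
encoders = {}
label_encoded = {}
for col in columns:
    enc = LabelEncoder()
    label_encoded[col] = enc.fit_transform([row[col] for row in rows])
    encoders[col] = enc

# Option 2: DictVectorizer turns each (column, value) pair
# into its own 0/1 indicator feature.
vec = DictVectorizer(sparse=False)
one_hot = vec.fit_transform(rows)
feature_names = sorted(vec.vocabulary_)

print(label_encoded)
print(feature_names)
print(one_hot)
```

Note that the integer codes from option 1 follow the sorted order of the labels, so a classifier may read an ordering into them that the data does not have; option 2 avoids this at the cost of more columns.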