Re: [scikit-learn] transform categorical data to numerical representation

2017-08-06 Thread Georg Heiler
To my understanding, pandas.factorize only works for the static case, where no unseen values can occur.
Georg Heiler wrote on Mon, Aug 7, 2017 at 08:40:
> I will need to look into factorize. Here is the result from profiling the
> transform method on a single new observation
> https://coderevi
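To illustrate the point about the static case, here is a minimal sketch of what pandas.factorize returns and one possible way to reuse the learned values on new data. The series contents are made up for illustration, and the lookup via Index.get_indexer is an assumption about how one might handle unseen values, not something proposed in this thread.

```python
import pandas as pd

# Hypothetical training column and new observations (illustrative data only).
train = pd.Series(["red", "green", "blue", "green"])
new = pd.Series(["blue", "purple", "red"])

# factorize assigns integer codes in order of first appearance and
# returns the distinct values it saw.
codes, uniques = pd.factorize(train)

# Calling factorize again on `new` would learn a different, inconsistent
# mapping. Looking the new values up in the learned uniques keeps the codes
# stable; values that were never seen during training come back as -1.
new_codes = pd.Index(uniques).get_indexer(new)

print(list(codes))
print(list(new_codes))
```

Whether -1 is an acceptable code for unseen values depends on the downstream model, so treat that choice as a placeholder.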

Re: [scikit-learn] transform categorical data to numerical representation

2017-08-06 Thread Georg Heiler
I will need to look into factorize. Here is the result from profiling the transform method on a single new observation: https://codereview.stackexchange.com/q/171622/132999
Best,
Georg
Sebastian Raschka wrote on Sun, Aug 6, 2017 at 20:39:
> > performance of prediction is pretty lame when there

Re: [scikit-learn] transform categorical data to numerical representation

2017-08-06 Thread Sebastian Raschka
> performance of prediction is pretty lame when there are around 100-150
> columns used as the input.
You are talking about computational performance when you are calling the "transform" method? Have you done some profiling to find out where your bottleneck (in the for loop) is? Just one a ver
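As a rough sketch of the kind of profiling asked about here, the standard-library cProfile module can show where time goes inside a transform call. The transform function and mappings below are hypothetical stand-ins (one dict lookup per column), not the code from the gist mentioned in this thread.

```python
import cProfile
import io
import pstats


def transform(row, mappings):
    # Hypothetical stand-in for an encoder's transform:
    # one dict lookup per column, -1 for values not seen at fit time.
    return [mappings[i].get(value, -1) for i, value in enumerate(row)]


# 150 columns, mirroring the column count mentioned in the thread.
mappings = [{"a": 0, "b": 1} for _ in range(150)]
row = ["a"] * 150

profiler = cProfile.Profile()
profiler.enable()
for _ in range(1000):
    encoded = transform(row, mappings)
profiler.disable()

# Collect the report in a string so it can be printed or logged.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time puts the outermost expensive calls first, which is usually enough to tell whether the loop over columns or the per-value lookup dominates.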

Re: [scikit-learn] transform categorical data to numerical representation

2017-08-06 Thread Georg Heiler
@sebastian: Thanks. Indeed, I am aware of this problem. I developed something here: https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce, but realized that the performance of prediction is pretty lame when there are around 100-150 columns used as the input. Do you have some ideas how to
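One possible direction for the many-columns case, sketched under assumptions: learn the category values per column once, then encode each column with a single vectorized pd.Categorical call instead of looping over individual values. The helper names fit_categories and transform are hypothetical and do not reflect the implementation in the linked gist.

```python
import pandas as pd


def fit_categories(df):
    # Learn the sorted distinct values of every column once, at fit time.
    return {
        col: pd.Index(df[col].dropna().unique()).sort_values()
        for col in df.columns
    }


def transform(df, categories):
    # Encode each column in one vectorized call.
    # Values outside the learned categories get the code -1.
    return pd.DataFrame(
        {
            col: pd.Categorical(df[col], categories=cats).codes
            for col, cats in categories.items()
        },
        index=df.index,
    )


# Illustrative data only.
train = pd.DataFrame({"color": ["red", "green", "blue"], "size": ["S", "M", "S"]})
new = pd.DataFrame({"color": ["green", "purple"], "size": ["S", "L"]})

categories = fit_categories(train)
encoded = transform(new, categories)
print(encoded)
```

This still loops over columns, but the per-value work happens inside pandas, which may help with 100-150 columns; profiling on real data would be needed to confirm.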