[scikit-learn] Random Binning Features

2020-05-01 Thread sai_ng
Hey folks! Hope you're all doing well. I'm developing a Random Fourier Feature implementation in C++ for a repository. Scikit-learn's implementation of RBFSampler has been really helpful, and I must say that I'm charmed by how compact, yet powerful, each line of code is. I'm writing this mail because I c

Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-05-01 Thread C W
Thank you for the link, Guillaume. In my particular case, I am working on random forest classification. The notebook seems great. I will have to go through it in detail. I'm still fairly new at using sklearn. Thank you, everyone, for the quick responses, always feeling loved on here! :) On Fri, May

Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-05-01 Thread Guillaume Lemaître
OrdinalEncoder is the equivalent of pd.factorize and will work in the scikit-learn ecosystem. However, be aware that you should not swap OneHotEncoder for OrdinalEncoder at will. It depends on your machine learning pipeline. As mentioned by Gael, tree-based algorithms will be fine wi
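
For readers following the thread, here is a minimal sketch of how the two encoders mentioned above differ, next to pd.factorize. The toy `colors` column is made up for illustration and is not from the original message. One detail worth noting: OrdinalEncoder assigns codes in sorted category order, whereas pd.factorize numbers categories by first appearance unless `sort=True` is passed, so the integer codes only match in the sorted case.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data: a single categorical column with three levels.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# OrdinalEncoder: one integer-valued column.
# Categories are sorted, so blue -> 0, green -> 1, red -> 2.
ordinal = OrdinalEncoder().fit_transform(colors)

# pd.factorize with sort=True yields the same codes on a 1-D input.
codes_sorted, uniques = pd.factorize(colors.ravel(), sort=True)

# OneHotEncoder: one indicator column per category.
# It returns a sparse matrix by default, so convert for inspection.
onehot = OneHotEncoder().fit_transform(colors).toarray()

print(ordinal.ravel())   # integer codes stored as floats
print(codes_sorted)      # matching integer codes
print(onehot.shape)      # (n_samples, n_categories)
```

The ordinal codes impose an artificial order (blue < green < red). Tree-based models can split around that order, which is why the swap is not safe for every pipeline.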