Re: [scikit-learn] ANN Dirty_cat: learning on dirty categories

2018-11-21 Thread Andreas Mueller
On 11/21/18 10:34 AM, Gael Varoquaux wrote: Joris has just agreed to help with benchmarking. We can have preliminary results earlier. The question really is: out of the different variants that exist, which one should we choose? I think that it is a legitimate question that arises on many of

Re: [scikit-learn] ANN Dirty_cat: learning on dirty categories

2018-11-21 Thread Gael Varoquaux
On Wed, Nov 21, 2018 at 09:47:13AM -0500, Andreas Mueller wrote: > The PR is over a year old already, and you hadn't voiced any opposition > there. My bad, sorry. Given the name, I had not guessed the link between the PR and encoding of categorical features. I find myself very much in agreement wi

Re: [scikit-learn] ANN Dirty_cat: learning on dirty categories

2018-11-21 Thread Andreas Mueller
On 11/21/18 12:38 AM, Gael Varoquaux wrote: On Tue, Nov 20, 2018 at 09:58:49PM -0500, Andreas Mueller wrote: On 11/20/18 4:43 PM, Gael Varoquaux wrote: We are planning to do heavy benchmarking of those strategies to figure out the tradeoffs. But we won't get to it before February, I am afraid.