On Tue, Nov 20, 2018 at 09:58:49PM -0500, Andreas Mueller wrote:
> On 11/20/18 4:43 PM, Gael Varoquaux wrote:
> > We are planning to do heavy benchmarking of those strategies, to figure
> > out the tradeoffs. But we won't get to it before February, I am afraid.
> Does that mean you'd be opposed to addi
On 11/20/18 4:43 PM, Gael Varoquaux wrote:
We are planning to do heavy benchmarking of those strategies, to figure
out the tradeoffs. But we won't get to it before February, I am afraid.
Does that mean you'd be opposed to adding the leave-one-out TargetEncoder
before you do this? I would really li
+1 on the ideal in general (and to enforce this on new classes / params).
+1 to be conservative and not break existing code.
On Tue, Nov 20, 2018 at 21:09, Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:
> On Sun, Nov 18, 2018 at 11:14, Joel Nothman wrote:
>
>> I think we're all ag
On Tue, Nov 20, 2018 at 04:35:43PM -0500, Andreas Mueller wrote:
> > - it can be done cross-validated, splitting the train data, in a
> > "cross-fit" strategy
> > (see https://github.com/dirty-cat/dirty_cat/issues/53)
> This is called leave-one-out in the category_encoding library, I think,
> an
On 11/20/18 4:16 PM, Gael Varoquaux wrote:
- the naive way is not the right one: just computing the average of y
for each category leads to overfitting quite fast
- it can be done cross-validated, splitting the train data, in a
"cross-fit" strategy (see https://github.com/dirty-cat/dirty
On Tue, Nov 20, 2018 at 04:06:30PM -0500, Andreas Mueller wrote:
> I would love to see the TargetEncoder ported to scikit-learn.
> The CountFeaturizer is pretty stalled:
> https://github.com/scikit-learn/scikit-learn/pull/9614
So would I. But there are several ways of doing it:
- the naive way is
I would love to see the TargetEncoder ported to scikit-learn.
The CountFeaturizer is pretty stalled:
https://github.com/scikit-learn/scikit-learn/pull/9614
:-/
Have you benchmarked the other encoders in the category_encoding lib?
I would be really curious to know when/how they help.
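
For readers following the thread, here is a rough sketch of the two strategies under discussion: the naive per-category mean of y, and a "cross-fit" variant in which each row is encoded from statistics computed on the other folds only. The function names, the fold assignment by row index, and the fallback for unseen categories are assumptions made for illustration; this is not the dirty_cat or category_encoding implementation.

```python
from collections import defaultdict


def naive_target_encode(categories, y):
    """Replace each category by the mean of y over ALL rows of that category.

    Each row's own target contributes to its encoding, which is the
    source of the overfitting mentioned in the thread.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for c, t in zip(categories, y):
        sums[c] += t
        counts[c] += 1
    return [sums[c] / counts[c] for c in categories]


def cross_fit_target_encode(categories, y, n_folds=2):
    """Encode each row from category means computed on the other folds.

    Assumption: row i belongs to fold i % n_folds, and n_folds >= 2.
    Setting n_folds = len(y) holds out one row at a time.
    """
    n = len(y)
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums, counts = defaultdict(float), defaultdict(int)
        train_sum, train_count = 0.0, 0
        # Fit on every row outside the current fold.
        for i in range(n):
            if i % n_folds != fold:
                sums[categories[i]] += y[i]
                counts[categories[i]] += 1
                train_sum += y[i]
                train_count += 1
        train_mean = train_sum / train_count
        # Encode held-out rows; categories unseen in the training
        # folds fall back to the training-fold mean (an assumption).
        for i in range(fold, n, n_folds):
            c = categories[i]
            encoded[i] = sums[c] / counts[c] if counts[c] else train_mean
    return encoded


cats = ["a", "a", "b", "b", "c"]
y = [1.0, 0.0, 1.0, 1.0, 0.0]
naive = naive_target_encode(cats, y)
cross = cross_fit_target_encode(cats, y)
```

In the naive version a category that appears once is encoded as exactly its own target value, whereas the cross-fit version never lets a row see its own target.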
On 11/20/1
Hi scikit-learn friends,
As you might have seen on twitter, my lab -with a few friends- has
embarked on research to ease machine learning on "dirty data". We are
experimenting on new encoding methods for non-curated string categories.
For this, we are developing a small software project called "dirty_cat":
On Sun, Nov 18, 2018 at 11:14, Joel Nothman wrote:
> I think we're all agreed that this change would be a good thing.
>
> What we're not agreed on is how much risk we take by breaking legacy code
> that relied on argument order.
>
I think that, in principle, it could be possible to do this with a
On Tue, Nov 20, 2018 at 08:15:07PM +0100, Olivier Grisel wrote:
> We can also do Paris in April / May or June if that's ok with Joel and better
> for Andreas.
Absolutely.
My thoughts here are that I want to minimize transportation, partly
because flying has a large carbon footprint. Also, for per
We can also do Paris in April / May or June if that's ok with Joel and
better for Andreas.
I am teaching on Fridays from end of January to March. But I can miss half
a day of sprint to teach my class.
--
Olivier
___
scikit-learn mailing list
scikit-lea