[scikit-learn] Fairness Metrics

2018-10-28 Thread Feldman, Joshua
Hi, I was wondering if there's any interest in adding fairness metrics to sklearn. Specifically, I was thinking of implementing the metrics described here: https://dsapp.uchicago.edu/projects/aequitas/ I recognize that these metrics are extremely simple to calculate, but given that sklearn is th
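The Aequitas-style metrics mentioned are indeed simple group comparisons. A hedged sketch (plain Python, illustrative names only — not a proposed sklearn API) of one such metric, false-positive-rate disparity relative to a reference group:

```python
def false_positive_rate(y_true, y_pred):
    # FPR = FP / (FP + TN), computed over paired labels/predictions.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0

def fpr_disparity(y_true, y_pred, groups, reference):
    """FPR of each group divided by the FPR of the reference group."""
    by_group = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        by_group[g] = false_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx])
    ref = by_group[reference]
    return {g: fpr / ref for g, fpr in by_group.items()}

# Toy data: group 'b' receives false positives twice as often as group 'a'.
y_true = [0, 0, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
groups = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
print(fpr_disparity(y_true, y_pred, groups, reference='a'))  # {'a': 1.0, 'b': 2.0}
```

A disparity of 1.0 means parity with the reference group; Aequitas flags groups whose ratio falls outside a chosen tolerance band.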

Re: [scikit-learn] Pipegraph example: KMeans + LDA

2018-10-28 Thread Andreas Mueller
On 10/24/18 4:11 AM, Manuel Castejón Limas wrote: Dear all, as a way of improving the documentation of PipeGraph we intend to provide more examples of its usage. Showing application cases to motivate its usage was a popular demand, so here is a very simple case with two steps: a KMeans
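The two-step idea can be sketched in plain scikit-learn (not PipeGraph's own API, which is not shown here): cluster with KMeans, then feed the cluster labels to LinearDiscriminantAnalysis as its target — the kind of label-passing between steps that a plain Pipeline does not express. Dataset and sizes are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic 2-D data with three blobs.
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Step 1: unsupervised clustering produces pseudo-labels.
labels = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(X)

# Step 2: LDA treats the cluster labels as the supervised target.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, labels)
print(X_lda.shape)  # (150, 2)
```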

Re: [scikit-learn] Strange code but that works

2018-10-28 Thread Joel Nothman
Be careful: that @property is very significant here. It means that this is a description of how to *get* the method, not how to *run* the method. You will notice, for instance, that it says `def transform(self)`, not `def transform(self, X)`
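A minimal sketch of the pattern Joel describes (a toy class, not sklearn's actual Pipeline code): `@property` makes attribute *access* return the bound method, so any check in the getter runs at lookup time, before arguments are ever passed.

```python
class MiniPipeline:
    def __init__(self, final_estimator):
        self._final_estimator = final_estimator

    @property
    def transform(self):
        # Runs when you *get* pipe.transform: raise early if the final
        # step cannot transform, otherwise hand back the real method.
        if not hasattr(self._final_estimator, "transform"):
            raise AttributeError("final estimator does not implement transform")
        return self._transform

    def _transform(self, X):
        return self._final_estimator.transform(X)

class Doubler:
    def transform(self, X):
        return [2 * x for x in X]

pipe = MiniPipeline(Doubler())
f = pipe.transform        # property getter runs here, returning a method
print(f([1, 2, 3]))       # [2, 4, 6]
```

Accessing `.transform` on a `MiniPipeline` whose final step lacks `transform` raises `AttributeError` immediately, which is exactly why the getter takes only `self`.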

Re: [scikit-learn] How does the random state influence the decision tree splits?

2018-10-28 Thread Sebastian Raschka
That's nice to know, thanks a lot for the reference! Best, Sebastian > On Oct 28, 2018, at 3:34 AM, Guillaume Lemaître > wrote: > > FYI: https://github.com/scikit-learn/scikit-learn/pull/12364 > > On Sun, 28 Oct 2018 at 09:32, Guillaume Lemaître > wrote: > There is always a shuffling when i

Re: [scikit-learn] Strange code but that works

2018-10-28 Thread Guillaume Lemaître
On Sun, 28 Oct 2018 at 07:42, Louis Abraham via scikit-learn < scikit-learn@python.org> wrote: > Hi, > > This is a code from sklearn.pipeline.Pipeline: > @property > def transform(self): > """Apply transforms, and transform with the final estimator > > This also works where final estimator is ``No

Re: [scikit-learn] Question about get_params / set_params

2018-10-28 Thread Guillaume Lemaître
On Sun, 28 Oct 2018 at 09:31, Louis Abraham via scikit-learn < scikit-learn@python.org> wrote: > Hi, > > According to > http://scikit-learn.org/0.16/developers/index.html#get-params-and-set-params > , > get_params and set_params are used to clone estimators. > sklearn.base.clone is function used
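The role `get_params` plays in cloning can be sketched with a toy estimator (this mimics, but is not, sklearn's real `sklearn.base.clone`): read the constructor parameters, then build a fresh, unfitted instance of the same class.

```python
class ToyEstimator:
    def __init__(self, alpha=1.0, max_iter=100):
        # sklearn convention: __init__ only stores its arguments.
        self.alpha = alpha
        self.max_iter = max_iter

    def get_params(self, deep=True):
        # Report exactly the constructor parameters.
        return {"alpha": self.alpha, "max_iter": self.max_iter}

def simple_clone(estimator):
    # Rough mimic of clone: rebuild from the reported constructor params,
    # discarding any fitted state.
    return type(estimator)(**estimator.get_params())

est = ToyEstimator(alpha=0.5)
copy = simple_clone(est)
print(copy.alpha, copy is est)  # 0.5 False
```

This is why sklearn requires `__init__` to do nothing but store its arguments: otherwise `get_params` could not reconstruct an equivalent estimator.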

Re: [scikit-learn] How does the random state influence the decision tree splits?

2018-10-28 Thread Guillaume Lemaître
FYI: https://github.com/scikit-learn/scikit-learn/pull/12364 On Sun, 28 Oct 2018 at 09:32, Guillaume Lemaître wrote: > There is always a shuffling when iteration over the features (even when > going to all features). > So in the case of a tie the split will be done on the first feature > encount

Re: [scikit-learn] How does the random state influence the decision tree splits?

2018-10-28 Thread Guillaume Lemaître
There is always a shuffling when iterating over the features (even when going over all features). So in the case of a tie, the split will be done on the first feature encountered, which will differ due to the shuffling. There is a PR which was intending to make the algorithm deterministic to alway
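The tie-breaking effect can be illustrated with a toy sketch (not the actual tree code): when two features yield an identical best split gain, the feature visited first wins, and the visiting order is shuffled by the seed.

```python
import random

def pick_split_feature(gains, random_state):
    # Shuffle the feature iteration order with the given seed, then keep
    # the first feature achieving the maximum gain (strict '>' means a
    # later tie never displaces the first one encountered).
    rng = random.Random(random_state)
    order = list(range(len(gains)))
    rng.shuffle(order)
    best, best_gain = None, -1.0
    for f in order:
        if gains[f] > best_gain:
            best, best_gain = f, gains[f]
    return best

gains = [0.4, 0.4, 0.1]  # features 0 and 1 tie exactly
print(sorted({pick_split_feature(gains, seed) for seed in range(20)}))
```

For a fixed seed the choice is deterministic; across seeds, either of the tied features can be selected, which is how `random_state` changes the tree even when all features are considered.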

[scikit-learn] Question about get_params / set_params

2018-10-28 Thread Louis Abraham via scikit-learn
Hi, According to http://scikit-learn.org/0.16/developers/index.html#get-params-and-set-params , get_params and set_params are used to clone estimators. However, I don't understand how it is used in FeatureUnion: `retur

Re: [scikit-learn] How does the random state influence the decision tree splits?

2018-10-28 Thread Fernando Marcos Wittmann
The random_state is used in the splitters:

    SPLITTERS = SPARSE_SPLITTERS if issparse(X) else DENSE_SPLITTERS
    splitter = self.splitter
    if not isinstance(self.splitter, Splitter):
        splitter = SPLITTERS[self.splitter](criterion,

Re: [scikit-learn] How does the random state influence the decision tree splits?

2018-10-28 Thread Piotr Szymański
Just a small side note that I've come across with Random Forests, which in the end form an ensemble of Decision Trees: I ran a thousand iterations of RFs on multi-label data and managed to get a 4-10 percentage-point difference in subset accuracy, depending on the data set, just as a random effect,
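The seed-only spread Piotr describes can be measured directly; a hedged sketch (assuming scikit-learn is installed; dataset, forest size, and seed range are all illustrative, and the spread here will be smaller than on hard multi-label data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Fixed data split, so only random_state of the forest varies.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = [
    RandomForestClassifier(n_estimators=10, random_state=seed)
    .fit(X_tr, y_tr)
    .score(X_te, y_te)
    for seed in range(5)
]
print(min(scores), max(scores))  # the gap is the seed-only effect
```

Averaging over many seeds (or increasing `n_estimators`) shrinks this variance, which is one practical argument for reporting scores across several seeds.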