Re: [scikit-learn] random forests and multi-class probability

2021-07-27 Thread Nicolas Hug
To add to Guillaume's answer: the native multiclass support for forests/trees is described here: https://scikit-learn.org/stable/modules/tree.html#multi-output-problems It's not a one-vs-rest strategy and can be summed up as: * Store n output values in leaves, instead of 1;
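The reply above is cut off, so the following is only a minimal sketch of what "store n output values in leaves" looks like from the outside. The dataset (iris) and the hyperparameters are assumptions chosen for illustration, not taken from the thread.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Illustrative 3-class problem (assumption: iris, not the poster's data).
X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# One probability column per class, in the order given by forest.classes_.
proba = forest.predict_proba(X[:5])
print(proba.shape)
print(forest.classes_)

# Each node of an individual tree stores one value per class rather than a
# single number: the array has shape (n_nodes, n_outputs, n_classes).
single_tree = forest.estimators_[0].tree_
print(single_tree.value.shape)
```

There is no set of per-class binary forests here: a single forest produces all class probabilities at once.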

Re: [scikit-learn] Spanish translation proposal for Scikit-Learn documentation

2021-02-09 Thread Nicolas Hug
Hi María Ángela, Thank you for your interest in contributing to scikit-learn! Could you detail a bit more what kind of involvement you would need from the scikit-learn maintainers / team? So far, we've been welcoming third-party translations and they have a dedicated section on our website where yo

Re: [scikit-learn] Finding the PC that captures a specific variable

2021-01-22 Thread Nicolas Hug
Hi Mahmood, There are different pieces of info that you can get from PCA: 1. How important is a given PC to reconstruct the entire dataset -> This is given by explained_variance_ratio_ as Guillaume suggested 2. What is the contribution of each feature to each PC (remember that a PC is a line
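A small sketch of the two pieces of information listed above. `explained_variance_ratio_` is named in the message; using `components_` for the per-feature contributions is an assumption on my part, since the second item is truncated. The dataset and `feature_idx` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # illustrative data, not from the thread
pca = PCA(n_components=2).fit(X)

# 1. Share of the total variance captured by each PC.
print(pca.explained_variance_ratio_)

# 2. Loadings: row i holds the weight of every original feature in PC i,
#    so the array has shape (n_components, n_features).
print(pca.components_)

# Hypothetical question: which PC is most aligned with feature number 2?
feature_idx = 2
most_aligned_pc = int(np.argmax(np.abs(pca.components_[:, feature_idx])))
print(most_aligned_pc)
```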

Re: [scikit-learn] extraction of grid search values

2021-01-05 Thread Nicolas Hug
Glenn, You need to fit the estimator with some data for the cv_results_ attribute to exist. You may refer to https://scikit-learn.org/stable/getting_started.html Nicolas On Tue, 5 Jan 2021 at 17:25, Glenn Schultz via scikit-learn < scikit-learn@python.org> wrote: > All, > > I have a grid search
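To illustrate the point about fitting first, here is a minimal sketch. The estimator (`SVC`), the parameter grid and the dataset are assumptions made for the example, since the original grid search is not shown in the snippet.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=3)

# Before fit(), the attribute simply does not exist yet.
has_results_before_fit = hasattr(search, "cv_results_")

search.fit(X, y)

# After fit(), cv_results_ is a dict of arrays, one entry per candidate.
mean_scores = search.cv_results_["mean_test_score"]
for params, score in zip(search.cv_results_["params"], mean_scores):
    print(params, round(float(score), 3))
```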

Re: [scikit-learn] Interpreting results of random forest classifier

2020-12-28 Thread Nicolas Hug
Hi David, > As I understand it now the 0 probability is probability that the prediction is wrong, and the 1 probability is the probability that the prediction is correct No: in binary classification, the `predict_proba` method returns a single number in [0, 1] indicating the probability that
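The reply is truncated, so this is only a sketch of how the output is laid out in scikit-learn, on made-up data. `predict_proba` returns one column per class, ordered as in `classes_`; in the binary case the two columns of a row sum to 1, so either column determines the other. Neither column is "the probability that the prediction is wrong".

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary problem (assumption: not the poster's data).
X, y = make_classification(n_samples=200, random_state=0)
clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

proba = clf.predict_proba(X[:3])
print(clf.classes_)       # column order of proba
print(proba)              # column j = estimated P(y == clf.classes_[j])
print(proba.sum(axis=1))  # each row sums to 1
```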

Re: [scikit-learn] sample_weight vs class_weight

2020-12-04 Thread Nicolas Hug
Basically passing class weights should be equivalent to passing per-class-constant sample weights. > why do some estimators allow to pass weights both as a dict in the init or as sample weights in fit? what's the logic? SW is a per-sample property (aligned with X and y) so we avoid passing t
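A quick way to check the equivalence described above, sketched on synthetic data with a decision tree. The estimator, the weights and the dataset are assumptions for illustration; other estimators may only agree up to numerical tolerance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)

# Option 1: class weights, set once in the constructor.
class_weight = {0: 1.0, 1: 5.0}
via_class_weight = DecisionTreeClassifier(
    max_depth=3, class_weight=class_weight, random_state=0
).fit(X, y)

# Option 2: the same weights expanded to one value per sample, passed to fit.
sample_weight = np.where(y == 1, 5.0, 1.0)
via_sample_weight = DecisionTreeClassifier(max_depth=3, random_state=0).fit(
    X, y, sample_weight=sample_weight
)

same_predictions = np.array_equal(
    via_class_weight.predict(X), via_sample_weight.predict(X)
)
print(same_predictions)
```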

Re: [scikit-learn] Creating dataset

2020-11-08 Thread Nicolas Hug
load_iris() reads a CSV file, and then retrieves/sets some other info like the feature names and a description of the dataset (which comes from another file). Then it packs everything into a Bunch object, which is basically a fancy dict: https://github.com/scikit-learn/scikit-learn/blob/master/

Re: [scikit-learn] Creating dataset

2020-11-08 Thread Nicolas Hug
Mahmood, From what I understand your dataset is stored in a csv file. I'd recommend just reading that csv file with e.g. pandas (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html), and then just use the dataframe as input to scikit-learn utilities (you may need t
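A minimal sketch of the pandas route suggested above. The CSV content, the column names and the choice of `LogisticRegression` are all made up for the example; an in-memory string stands in for the file on disk.

```python
import io

import pandas as pd
from sklearn.linear_model import LogisticRegression

# Stand-in for a real file: pd.read_csv("my_dataset.csv") would work the same.
csv_text = """sepal_length,sepal_width,label
5.1,3.5,a
4.9,3.0,a
6.2,2.9,b
6.7,3.1,b
"""
df = pd.read_csv(io.StringIO(csv_text))

# Split the dataframe into features and target, then use it directly.
X = df[["sepal_length", "sepal_width"]]
y = df["label"]

model = LogisticRegression().fit(X, y)
predictions = model.predict(X)
print(predictions)
```

No Bunch object is needed: scikit-learn estimators accept dataframes and arrays directly.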

Re: [scikit-learn] implementing regularized random forest

2020-11-03 Thread Nicolas Hug
Mickael, You probably don't need to ship an entire fork, but all the tree internals that you are using (splitter etc.) are part of a private API so yes, you would need to duplicate these into your own implementation. Nicolas On 11/3/20 4:38 PM, Mick Men wrote: Hello, I am trying to impleme

[scikit-learn] ANN: Welcoming Christian Lorentzen and Juan Carlos Alfaro Jiménez

2020-08-17 Thread Nicolas Hug
The core developers of Scikit-learn have recently voted to welcome Christian Lorentzen to the core dev team, and Juan Carlos Alfaro Jiménez to the triage team, in recognition of their efforts and trustworthiness as contributors. Congratulations to you both and thank you for your contributions!

Re: [scikit-learn] Opinion on reference mentioning that RF uses weak learners

2020-08-17 Thread Nicolas Hug
I'm not sure honestly, but I think you'll find more details in Schapire's paper (http://rob.schapire.net/papers/strengthofweak.pdf) and its refs. In particular page 5 (201) On 8/16/20 8:37 PM, Brown J.B. via scikit-learn wrote: > As previously mentioned, a "weak learner" is just a learner that

Re: [scikit-learn] Opinion on reference mentioning that RF uses weak learners

2020-08-16 Thread Nicolas Hug
As previously mentioned, a "weak learner" is just a learner that barely performs better than random. It's more common in the context of boosting, but I think weak learning predates boosting, and the original RF paper by Breiman does make reference to "weak learners": It's interesting that Fore

Re: [scikit-learn] custom estimator with more than two arguments to fit()

2020-07-31 Thread Nicolas Hug
Hi Matt, We do have CCA and other PLS-related transformers / regressors in scikit-learn. They are able to do dimensionality reduction on both X and Y (which I believe correspond to spp and env), so you might want to have a look at these. However, they're not fully compatible with the whole ec

Re: [scikit-learn] Permission to publish

2020-06-19 Thread Nicolas Hug
Hi Gaspar, The package and the docs are BSD licensed so you're free to use the content in a publication. If you use scikit-learn, please make sure to cite the package https://scikit-learn.org/stable/about.html#citing-scikit-learn Nicolas On 6/19/20 11:39 AM, DELSO, GASPAR (ICCV) wrote: Hi,

Re: [scikit-learn] sklearn Pipeline: argument of type 'ColumnTransformer' is not iterable

2020-05-29 Thread Nicolas Hug
Also, you should not scale your input before computing cross-validation scores. By doing that you are biasing your results because each test set knows something about the rest of the data (even if it's not target data). The scaling should be applied independently on each (train / test) pair. Th
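One common way to get per-split scaling is to put the scaler inside a pipeline, so that it is re-fitted on the training part of every split. The sketch below assumes a `StandardScaler` followed by `LogisticRegression` on a bundled dataset; the original poster's pipeline is not shown in the snippet.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The scaler is part of the estimator, so cross_val_score fits it on the
# training fold only and merely applies it to the matching test fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```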

Re: [scikit-learn] scikit-learn monthly meeting May 25th

2020-05-22 Thread Nicolas Hug
These were last month's notes ;) (the text of the link was correct, but the href wasn't). The new pad is at https://hackmd.io/4VeWX5H9Tlmz132WAD-Q0w On 5/22/20 5:10 AM, Chiara Marmo wrote: Hi all, The next scikit-learn monthly meeting will take place on Monday May 25th at 12PM UTC: https://

Re: [scikit-learn] Notes core-dev meeting May 25th

2020-05-19 Thread Nicolas Hug
https://hackmd.io/4VeWX5H9Tlmz132WAD-Q0w On 5/19/20 3:39 PM, Adrin wrote: Thanks Chiara, I think I'm missing the link to the agenda. Where should I find it? Thanks, Adrin On Tue, May 19, 2020 at 7:51 PM Chiara Marmo > wrote: Dear core-devs, I've taken

Re: [scikit-learn] Vote: Add Adrin Jalali to the scikit-learn technical committee

2020-04-27 Thread Nicolas Hug
+1 On 4/27/20 9:16 AM, Gael Varoquaux wrote: +1 And thank you very much Adrin! On Mon, Apr 27, 2020 at 09:12:02AM -0400, Andreas Mueller wrote: Hi All. Given all his recent contributions, I want to nominate Adrin Jalali to the Technical Committee: https://scikit-learn.org/stable/governance.ht

[scikit-learn] Monthly meetings

2020-03-26 Thread Nicolas Hug
Hi all, The next scikit-learn monthly meeting will take place on Monday (https://www.timeanddate.com/worldclock/meetingdetails.html?year=2020&month=3&day=30&hour=11&min=0&sec=0&p1=240&p2=33&p3=37&p4=179&p5=195

Re: [scikit-learn] tensorflow and scikit-learn

2020-03-03 Thread Nicolas Hug
Hi Nils, From a quick glance it looks like you're building a fully connected multi-layer perceptron so yes, this is possible in scikit-learn with the neural_network module (check out the docs). The script would be quite different though, it's not just plug and play. Also, for anything more co

[scikit-learn] Monthly meetings

2020-02-20 Thread Nicolas Hug
Hi all, The next scikit-learn monthly meeting will take place on Monday at the usual time (https://www.timeanddate.com/worldclock/meetingdetails.html?year=2020&month=2&day=24&hour=12&min=0&sec=0&p1=240&p2=33&p3=37&p4=179&p5=195_

Re: [scikit-learn] Need for multioutput multivariate algorithm for Random Forest in Python (using Mahalanobis distance)

2020-02-15 Thread Nicolas Hug
inally fully quench my appetite after nearly two years. I will have to retrace my steps and get back to the good old Python ways (again). Thank you. Highest regards, Paul On Friday, February 14, 2020, 07:00:35 a.m. CST, Nicolas Hug wrote: Hi Paul, The way multioutput is handled in

Re: [scikit-learn] Need for multioutput multivariate algorithm for Random Forest in Python (using Mahalanobis distance)

2020-02-14 Thread Nicolas Hug
Hi Paul, The way multioutput is handled in decision trees (and thus in the forests) is described in https://scikit-learn.org/stable/modules/tree.html#multi-output-problems. As you can see, the correlation between the output values *is* taken into account. Can you explain what you would like
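For reference, a minimal sketch of native multi-output support in a tree: a single tree is fitted on a 2-column target, so every split is chosen using both outputs at once. The synthetic data is an assumption, and this does not implement the Mahalanobis-based criterion from the thread title.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(100, 1))
# Two correlated outputs derived from the same input.
Y = np.column_stack([np.sin(3 * X[:, 0]), np.cos(3 * X[:, 0])])

# One tree, two outputs: each leaf stores a value for both columns of Y.
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, Y)
pred = tree.predict(X[:5])
print(pred.shape)
print(tree.n_outputs_)
```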

Re: [scikit-learn] Issues for Berlin and Paris Sprints

2020-01-15 Thread Nicolas Hug
Hi Chiara, Thanks for taking care of this. > have a list of two/three reviewers available to check on a specific issue That might not be tractable in practice because we have a bunch of "bulk" issues involving many PRs, e.g. the issues about updating the random_state docs everywhere. But assigni

Re: [scikit-learn] Decision tree call chronology

2020-01-14 Thread Nicolas Hug
Hi Aditya, It's hard for us to answer without any specific question. Perhaps this will help: https://scikit-learn.org/stable/developers/contributing.html#reading-the-existing-code-base The tree code is quite complex, because it is very generic and can support many different settings (multiou

Re: [scikit-learn] Time for Roadmap for the coming years?

2020-01-07 Thread Nicolas Hug
The roadmap was updated not so long ago (https://github.com/scikit-learn/scikit-learn/pull/15332) On a related note, we recently discussed defining a roadmap for an eventual 1.0 release https://github.com/scikit-learn/scikit-learn/issues/14386 On 1/7/20 5:25 AM, Siddharth Gupta wrote: The l

Re: [scikit-learn] Vote on SLEP010: n_features_in_ attribute

2019-12-03 Thread Nicolas Hug
+1 On 12/3/19 5:40 PM, Adrin wrote: +1 On Tue., Dec. 3, 2019, 23:28 Andreas Mueller, <mailto:t3k...@gmail.com>> wrote: +1 On 12/3/19 5:09 PM, Nicolas Hug wrote: As per our Governance <http://scikit-learn.org/stable/governance.html> document, changes to

[scikit-learn] Vote on SLEP010: n_features_in_ attribute

2019-12-03 Thread Nicolas Hug
As per our Governance document, changes to API principles are to be established through an Enhancement Proposal (SLEP) from which any core developer can call for a vote on its acceptance. *SLEP010: n_features_in_ attribute* is up for a vote.

Re: [scikit-learn] scikit-learn twitter account

2019-12-02 Thread Nicolas Hug
ather than permission": the consequences of getting something wrong are lighter than when incorporating code in the library. Hopefully, this should enable us to keep the twitter account active while minimizing the amount of time spent on it. My 2 cents, Gaël On Sat, Nov 30, 2019 at 05:33:18PM -05

Re: [scikit-learn] scikit-learn twitter account

2019-11-30 Thread Nicolas Hug
Adrin also proposed Hi there. We've repurposed this account and it will be used for scikit-learn related announcements. To follow day to day progress on the repo, please follow @sklearn_commits. Both are fine with me. For maximum reach, maybe we could: 1. tweet the release announcement fr

Re: [scikit-learn] scikit-learn twitter account

2019-11-22 Thread Nicolas Hug
I agree @sklearn_commits should be OK, especially with name + bio + logo Funnily enough I have had the opposite experience: some people I talk to know sklearn, but not scikit-learn On 11/22/19 11:29 AM, Olivier Grisel wrote: Le ven. 22 nov. 2019 à 17:24, Gael Varoquaux a écrit : I would l

Re: [scikit-learn] scikit-learn twitter account

2019-11-15 Thread Nicolas Hug
What's the status of this? Would be great to have it for the 0.22 release :) ! On 11/5/19 8:24 AM, Chiara Marmo wrote: I'm 100% on re-purposing: people already follow the scikit_learn account. I'm not sure an account with commits will be really necessary... developers watch the github repo..

Re: [scikit-learn] Monthly meetings

2019-11-13 Thread Nicolas Hug
calendar or as calendar invitations for people who are likely to participate/were there last time) ? Cheers, Roman On 13/11/2019 23:14, Nicolas Hug wrote: Hey everyone, The next monthly meeting is on Monday! As usual, please be nice to the NYC people and *update your project notes before Frida

[scikit-learn] Monthly meetings

2019-11-13 Thread Nicolas Hug
Hey everyone, The next monthly meeting is on Monday! As usual, please be nice to the NYC people and *update your project notes before Friday* it'll be 7am for us :) Cheers, Nicolas https://github.com/scikit-learn/scikit-learn/projects/15

Re: [scikit-learn] scikit-learn twitter account

2019-11-04 Thread Nicolas Hug
I like the idea as well On 11/4/19 5:58 AM, Adrin wrote: sounds pretty good to me :) On Mon, Nov 4, 2019 at 10:51 AM Chiara Marmo > wrote: Hello everybody, I've taken a look to the last meeting minutes: talking about releases and sprint announcement

Re: [scikit-learn] scikit-learn Digest, Vol 43, Issue 38

2019-10-25 Thread Nicolas Hug
It's in the making for the new histogram-based GB estimators, but the other GB estimators like GradientBoostingRegressor and GradientBoostingClassifier already support sample_weight. Just pass the weights in the fit method: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.Grad
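A short sketch of passing weights to `fit`, as described above. The data and the weighting scheme (tripling the weight of the first 50 samples) are made up for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

# One weight per sample, aligned with the rows of X and y.
sample_weight = np.ones(len(y))
sample_weight[:50] = 3.0  # hypothetical: these samples matter three times more

model = GradientBoostingRegressor(n_estimators=50, random_state=0)
model.fit(X, y, sample_weight=sample_weight)
predictions = model.predict(X)
print(predictions.shape)
```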

Re: [scikit-learn] using numpy repeat

2019-10-14 Thread Nicolas Hug
You're looking for np.tile. It's one of the first google results and it's also linked in the doc of np.repeat. This mailing-list is for questions related to scikit-learn. I think your question would be more appropriate for e.g. stack-overflow. On 10/14/19 1:55 PM, Glenn Schultz via scikit-le
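For readers who land here with the same question, the difference between the two functions in a nutshell (the input array is just an example):

```python
import numpy as np

a = np.array([1, 2, 3])

# np.repeat duplicates each element in place.
repeated = np.repeat(a, 2)
print(repeated)  # [1 1 2 2 3 3]

# np.tile repeats the whole array end to end.
tiled = np.tile(a, 2)
print(tiled)     # [1 2 3 1 2 3]
```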

Re: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features?

2019-10-04 Thread Nicolas Hug
> But, decision tree is still mistaking one-hot-encoding as numerical input and split at 0.5. This is not right. Perhaps, I'm doing something wrong? You're not doing anything wrong, and neither is the tree. Trees don't support categorical variables in sklearn, so everything is treated as numeri
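The 0.5 threshold can be reproduced on a toy example. The sketch below assumes a single 3-category feature, one-hot encoded by hand; because each column only takes the values 0 and 1, the tree places its threshold halfway between them.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical categorical feature with 3 levels, one-hot encoded.
categories = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
X = np.eye(3)[categories]
y = (categories == 2).astype(int)  # label depends only on level 2

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# The root tests "column 2 <= 0.5", i.e. "is the category different from 2?".
root_feature = tree.tree_.feature[0]
root_threshold = tree.tree_.threshold[0]
print(root_feature, root_threshold)
```

So a split at 0.5 on a one-hot column is just the numerical way of asking a yes/no question about one category.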

Re: [scikit-learn] Vote on SLEP009: keyword only arguments

2019-09-18 Thread Nicolas Hug
define a clear rule but doing a case-by-case seems better than bikeshedding now. Alexandre: did you read the SLEP before asking? I thought the point of the SLEP was to summarize the discussion. If your question is not answered we should amend the SLEP. On 9/11/19 2:21 PM, Nicolas Hug wrot

Re: [scikit-learn] Monthly meetings between core developers + "Hello World"

2019-09-18 Thread Nicolas Hug
Hi everyone, Reminder that the next monthly meeting is on Monday! Please update your project notes *before Friday* so we don't have extra work on the weekend :) https://github.com/scikit-learn/scikit-learn/projects/15

Re: [scikit-learn] Vote on SLEP009: keyword only arguments

2019-09-11 Thread Nicolas Hug
Since there is no explicit proposal in the SLEP, it's not very clear what we need to vote for / against. But overall I'm +1 on forcing kwargs for all __init__ methods. Nicolas On 9/11/19 9:38 AM, Adrin wrote: Hi, I'm (mostly) the messenger, don't shoot me :P It may help to summarize the

Re: [scikit-learn] Monthly meetings between core developers + "Hello World"

2019-08-26 Thread Nicolas Hug
Meeting is in 5 minutes everyone! Prepare to be np.random.choice'd :) https://appear.in/amueller On 8/22/19 10:11 AM, Nicolas Hug wrote: Hi Everyon

Re: [scikit-learn] scikit-learn website and documentation

2019-08-22 Thread Nicolas Hug
Hi Chiara, Thanks for giving it a shot! I think we can end-up with a nice result with this theme too. Is this something you'd like to work on more seriously in the future, or just something to get you started on scikit-learn in general? (Basically, should Andy still be looking for a web-desi

Re: [scikit-learn] Monthly meetings between core developers + "Hello World"

2019-08-22 Thread Nicolas Hug
Hi Everyone, Quick reminder that the next meeting is on Monday! *Please update your cards on the project board* so we can all have a look before the week-end. We decided to go for a "scrum-like" approach this time: quickly go through everyone's notes first, then discuss main issues. Anyone

Re: [scikit-learn] Monthly meetings between core developers + "Hello World"

2019-08-05 Thread Nicolas Hug
Thanks everyone for joining, There's definitely room for improvement but this was still very productive I think :) The meeting notes are on the project board. I sent a google calendar invite to everyone for the next meeting: Monday 26th August, same time. If I missed you or if you want me

Re: [scikit-learn] Monthly meetings between core developers + "Hello World"

2019-08-02 Thread Nicolas Hug
I don't think this would be the place for long technical discussions. I was picturing something like "Here's what I'm working on, here's the current status, and here's what needs to be decided". Then depending on the complexity, things can be briefly discussed, or we can just prioritize and a

Re: [scikit-learn] Monthly meetings between core developers + "Hello World"

2019-08-01 Thread Nicolas Hug
. Thanks! Nicolas project board: https://github.com/scikit-learn/scikit-learn/projects/15 Meeting link: https://appear.in/amueller. On 7/26/19 2:08 PM, Nicolas Hug wrote: Thanks everyone for your feedback! Let's try to have a meeting on Monday 5th August, and then have meetings on the

Re: [scikit-learn] Monthly meetings between core developers + "Hello World"

2019-07-26 Thread Nicolas Hug
Thanks everyone for your feedback! Let's try to have a meeting on Monday 5th August, and then have meetings on the last Monday of the month? Next meeting would be on August 26th. For the time: https://www.timeanddate.com/worldclock/meetingdetails.html?year=2019&month=8&day=5&hour=13&min=0&sec

Re: [scikit-learn] Continues monitoring of benchmark performances

2019-07-22 Thread Nicolas Hug
I agree having benchmarks for non-regression would be very helpful. A seemingly simple change in Cython code can lead to a drastic performance drop. I can't find it again but I think Jérémie has submitted an issue about this? On 7/22/19 9:59 AM, Tom Augspurger wrote: Thanks Adrin, A month or so

[scikit-learn] Monthly meetings between core developers

2019-07-17 Thread Nicolas Hug
Hi Everyone, The scikit-learn team has been expanding significantly lately: we now have 3 FTEs in NY, 1 in Berlin, and 3 (soon 4) in Paris. To scale efficiently, I think we should try to communicate more. *I'd like to propose monthly meetings between the core developers*. This would be th

Re: [scikit-learn] How is linear regression in scikit-learn done? Do you need train and test split?

2019-06-01 Thread Nicolas Hug
Splitting the data into train and test data is needed with any machine learning model (not just linear regression with or without least squares). The idea is that you want to evaluate the performance of your model (prediction + scoring) on a portion of the data that you did not use for trainin
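The idea can be sketched in a few lines. The synthetic dataset and the 75/25 split are assumptions for the example; the point is only that the score is computed on rows the model never saw during training.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)

# Hold out 25% of the data; the model is never fitted on it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# R^2 on held-out data estimates how well the model generalizes.
test_r2 = model.score(X_test, y_test)
print(test_r2)
```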

Re: [scikit-learn] decision_path method for tree-based models

2019-05-23 Thread Nicolas Hug
Hi Anaël, yes feel free to submit a PR On 5/23/19 11:49 AM, Beaugnon Anael wrote: Hi everyone, The decision_path method is currently available only for DecisionTreeClassifier, DecisionTreeRegressor, and RandomForest, but not for IsolationForest and GradientBoostingClassifier. In these cases, th
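For context, this is roughly what `decision_path` returns on an estimator that already supports it. The sketch uses `DecisionTreeClassifier` on iris, both chosen for illustration; it says nothing about how the method would be implemented for the estimators mentioned in the quoted message.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Sparse indicator matrix of shape (n_samples, n_nodes): entry (i, j) is
# non-zero when sample i goes through node j.
indicator = tree.decision_path(X[:2])

# Node ids visited by the first sample, read from the CSR structure.
start, stop = indicator.indptr[0], indicator.indptr[1]
nodes_for_first_sample = indicator.indices[start:stop]
print(nodes_for_first_sample)
```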

[scikit-learn] Reddit thread with complaints about scikit-learn

2019-02-19 Thread Nicolas Hug
Hi everyone, I stumbled upon this reddit thread [1] where people point out what they dislike about the scikit-learn API. It's mostly about the lack of consistency for linear models. Just thought it'd be interesting to have some external criticism. Best, Nicolas [1] https://www.reddit.com/r

Re: [scikit-learn] inconsistency across version

2019-02-15 Thread Nicolas Hug
There was a bug in 0.18 that was fixed here https://github.com/scikit-learn/scikit-learn/pull/9105 The results from 0.20 should be correct. It looks like you're still using Python 2, please be aware that *scikit-learn will drop support for python 2 in the next release*! Nicolas On 2/15/19 7

Re: [scikit-learn] Sprint discussion points?

2019-02-14 Thread Nicolas Hug
or we could go as far as to schedule meetings on the different topics. Given the number of issues to discuss this is probably the best approach IMO On 2/14/19 8:31 AM, Andreas Mueller wrote: As I said, I think it's too much and we need to prioritize. We could either rank issues and start wit

Re: [scikit-learn] Does sklearn contain xgboost?

2019-01-08 Thread Nicolas Hug
XGBoost is a specific implementation of gradient boosting trees, so strictly speaking scikit-learn cannot "contain" XGBoost. That being said: - XGBoost has a scikit-learn compatible API: https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn. So does LightGBM, a