Re: [scikit-learn] Analysis of sklearn and other python libraries on github by MS team

2020-03-27 Thread Roman Yurchak
Very interesting! A few comments,

> From GH17, we managed to extract only 10.5k pipelines. The relatively low frequency (with respect to the number of notebooks using SCIKIT-LEARN [..]) indicates a non-wide adoption of this specification. However, the number of pipelines in the GH19 corpus is

Re: [scikit-learn] A basic question about kmeans algorithms elkan and llyod

2020-03-27 Thread Andreas Mueller
There's an interesting analysis in this paper: Fast K-Means with Accurate Bounds
http://proceedings.mlr.press/v48/newling16.pdf

On 3/26/20 3:40 AM, Alexandre Gramfort wrote:
> hi,
> I suspect Elkan is really winning when you have many centroids so the conclusion is not systematic
> my 2c
> Alex
> On
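
To illustrate why the number of centroids matters here, the following is a minimal sketch of the triangle-inequality test that Elkan-style k-means relies on, written with the standard library only. It is not scikit-learn's implementation and not the full Elkan algorithm (which also carries upper and lower bounds across iterations); it only shows one assignment step. The function names, the random 2-D data, and the choice of 20 centers are assumptions made for the example.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_plain(points, centers):
    """Lloyd-style assignment: compare every point with every center."""
    labels, n_dist = [], 0
    for p in points:
        ds = [dist(p, c) for c in centers]
        n_dist += len(centers)
        labels.append(min(range(len(centers)), key=ds.__getitem__))
    return labels, n_dist


def assign_pruned(points, centers):
    """Same assignment, skipping centers ruled out by the triangle inequality.

    If d(c_best, c_j) >= 2 * d(p, c_best), then d(p, c_j) >= d(p, c_best),
    so c_j cannot be strictly closer and its distance need not be computed.
    The center-to-center table costs extra distances once per iteration and
    is not included in the returned count.
    """
    k = len(centers)
    cc = [[dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    labels, n_dist = [], 0
    for p in points:
        best, best_d = 0, dist(p, centers[0])
        n_dist += 1
        for j in range(1, k):
            if cc[best][j] >= 2 * best_d:
                continue  # pruned: c_j cannot beat the current best
            d = dist(p, centers[j])
            n_dist += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, n_dist


# Hypothetical toy data: 200 random 2-D points, 20 random centers.
random.seed(0)
centers = [(random.random(), random.random()) for _ in range(20)]
points = [(random.random(), random.random()) for _ in range(200)]

plain_labels, plain_count = assign_plain(points, centers)
pruned_labels, pruned_count = assign_pruned(points, centers)
print("point-center distances, plain: ", plain_count)
print("point-center distances, pruned:", pruned_count)
```

Both functions return the same labels; the difference is only in how many point-to-center distances are evaluated. With few centers there is little to prune and the center-to-center table is pure overhead, which is consistent with the remark that the conclusion is not systematic.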

[scikit-learn] Analysis of sklearn and other python libraries on github by MS team

2020-03-27 Thread Andreas Mueller
Hey all. There's a pretty cool paper by a team at MS that analyses public github repos for their use of the sklearn and related libraries: https://arxiv.org/abs/1912.09536 Thought it might be of interest. Cheers, Andy