Thanks Jeremie for pushing this release out!
Now is the time to test downstream projects against this to make sure
it will not break too many things when we publish the 1.2.0 final
release in a week or two!
--
Olivier
___
scikit-learn mailing list
Thank you so much Guillaume for getting this release out and to Chiara
for pushing forward with the Python 3.11 wheel building infrastructure
update and related fixes!
--
Olivier
___
BTW, this is now stable so the URL
https://scikit-learn.org/stable/whats_new/v1.1.html#version-1-1-1 also
works :)
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
Thank you to all the contributors who reported bugs, minimal
reproducers and fixes, and thank you Guillaume for getting this bugfix
release out so timely \o/
--
Olivier
___
I agree with Guillaume's answers.
I think it was a net benefit, even though it might be a bit annoying
to get the tooling right for first time contributors. We can probably
improve this by making the error messages on the CI more directive on
how to fix formatting issues by giving copy-pastable
Congrats Jeremie and everybody who contributed to this release! This
is a great achievement.
--
Olivier
___
Thanks Jeremie for leading the efforts to get this release out!
--
Olivier
___
Maybe you can try to use faulthandler.dump_traceback_later
https://docs.python.org/3/library/faulthandler.html#faulthandler.dump_traceback_later
to get a traceback of all the threads of the main process.
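A minimal sketch of how this could look, assuming the suspected hang happens inside a function you can wrap. `run_with_watchdog` and the `log_file` argument are illustrative names introduced here; only `faulthandler.dump_traceback_later` and `faulthandler.cancel_dump_traceback_later` are actual standard library calls.

```python
import faulthandler


def run_with_watchdog(func, timeout, log_file):
    """Call func(); if it is still running after `timeout` seconds,
    dump the tracebacks of all threads of this process to `log_file`.

    `log_file` must be a real file object (it needs a file descriptor)
    and must stay open until this function returns.
    """
    # Arm the watchdog: the dump is written by a background thread
    # managed by faulthandler, so it also works if func() is blocked.
    faulthandler.dump_traceback_later(timeout, file=log_file)
    try:
        return func()
    finally:
        # Disarm the watchdog once func() has returned (or raised).
        faulthandler.cancel_dump_traceback_later()
```

Passing `sys.stderr` as `log_file` is the usual choice when running the script from a terminal.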
But the fact that you are using the default `p =
multiprocessing.Pool()` makes me think that
To summarize, the office hours for today are:
- 15:00-16:00 UTC / 17:00-18:00 CEST (this one starts in less than 10min)
- 18:00-19:00 UTC / 20:00-21:00 CEST (with Guillaume)
Sorry for the confusion and see you soon.
--
Olivier
___
Hi all,
Some of us will be online on the scikit-learn discord this Friday at
15:00 UTC and 20:00 UTC.
First-time and occasional contributors are welcome to join us on
Discord using this invitation link:
https://discord.gg/YBdN45kD
The focus of these office hour sessions is to answer questions
Yeah!
Thank you so much Adrin for all your efforts in getting this release out!
Congratulations everyone, time to celebrate!
--
Olivier
___
Hi all,
This is an email to notify everybody interested that the discussion on
interoperability of Python dataframe libraries has moved to an
official repo under the data-apis.org initiative:
https://data-apis.org/blog/dataframe_protocol_rfc/
https://github.com/data-apis/dataframe-api
and they
Thanks for the heads up! This is interesting. We rarely update
dataframe values in-place in scikit-learn but it is good to
know that we could leverage this for more efficient pandas-in
pandas-out support, for instance for missing value imputation.
Many very active core devs not represented in the TC voted for 88 and
my previous vote for 79 was not that strong. So I feel that I should
now vote for 88:
Keep current 88 characters:
Olivier
Revert to 79 characters:
--
Olivier
___
Dear all,
The scikit-learn developer monthly meeting will take place on Monday
June 28th at
3PM UTC.
- Video call link: https://meet.google.com/qbg-ucpe-ngz
- Meeting notes / agenda: https://hackmd.io/0yokz72CTZSny8y3Re648Q
- Local times:
> I have only one question related to scikit-learn.
> how to compute topic coherence of lda models in scikit-learn. I don't find
> any function that calculates a coherence value.
> please, reply me.
We don't have such a metric in scikit-learn. I assume you are referring to:
I am a bit late but I am very happy to see Norbert joining the triage
team! Welcome!
___
Alternatively, you can edit the code to use fetch_openml(...,
as_frame=False) to use a numpy array instead of a pandas dataframe for
this example.
--
Olivier
___
Please help us test the first release candidate for scikit-learn 0.24.0:
pip install scikit-learn==0.24.0rc1
Changelog: https://scikit-learn.org/0.24/whats_new/v0.24.html
In particular, if you maintain a project with a dependency on
scikit-learn, please let us know about any regression.
> Shall I contact them? Any other volunteers?
+1.
I think we are still dependent on travis for ARM-based release builds
and cron-jobs. The rest we can move to Azure Pipelines or GitHub
Actions, I believe.
--
Olivier
___
On Tue, Oct 13, 2020 at 16:19, Adrin wrote:
>
> Isn't the Boston dataset available through openml? Maybe here:
> https://www.openml.org/d/531
>
> I'm happy to have the dataset out there on openml, and for any material that
> addresses some of the issues with it.
> But for educational
Thanks for your input, this is also an extension I was thinking of.
___
Hi all,
Thanks to the sustained effort of several contributors (thanks Maria
and Lucy in particular), the Boston housing price dataset is no longer
used in the examples of scikit-learn (nor in the test suite) in the
master branch.
To give some context on why this dataset is problematic, please
Shall we start rolling meetings with a switch between 2 or 3 time slots?
--
Olivier
___
Hi Sole,
I personally support climate change actions very much and I am
convinced climate change is the number 1 challenge of our time. In an
attempt to act in a consistent way with that belief, I declined
several times to keynote at conferences either organized by the fossil
fuel industry or to
Congrats on the release! And thank you very much to all those who were
involved in making it happen (and Adrin in particular)!
--
Olivier
___
I get a message for an invalid meeting id.
--
Olivier
___
This is a minor release that includes many bug fixes and solves a
number of packaging issues with Windows wheels in particular. Here is
the full changelog:
https://scikit-learn.org/stable/whats_new/v0.22.html#version-0-22-1
The conda package will follow soon (hopefully).
Thank you very much to
Indeed I do not see the "circle add" button in the tweetdeck UI anymore.
But it's ok not to prepare the threads before tweeting the first
tweet. We can build the thread progressively by publishing the first
tweet and then replying one tweet after the other by hitting the reply
button of the last
Ok the twitter accounts are now switched:
https://twitter.com/scikit_learn/status/1201794032650932224
The notifications for commits pushed to master are live:
https://twitter.com/sklearn_commits
Ready for the release :)
--
Olivier
___
Alright, I have configured the new github action for the tweets on
@sklearn_commits:
https://github.com/scikit-learn/scikit-learn/pull/15758
I tested it from my repo and it worked fine (I deleted the test tweet though).
We can do the switch as soon as this PR is merged.
--
Olivier
It might actually be possible to use github actions with
https://github.com/xorilog/twitter-action for instance. I will try to
give it a try with a test repo.
--
Olivier
___
Alright, it seems that I can create twitter apps (and generate API
tokens) for the @sklearn_commits account. However,
https://github.com/filearts/tweethook does not work as it relies on a
third party webtask.io service that does not accept any new
subscriptions...
I am looking for an alternative
I have created the https://twitter.com/sklearn_commits twitter account.
I have applied to make this account a "Twitter Developer" account to
be able to use https://github.com/filearts/tweethook to register it as
a webhook for the main scikit-learn github repo.
Once ready, I will remove the old
On Fri, Nov 22, 2019 at 17:24, Gael Varoquaux
wrote:
>
> > I would like to create @sklearn_commits instead of
> > @scikit_learn_commits that is too long for my taste. Any opinion?
>
> Some people do not make the link between "sklearn" and "scikit-learn" :)
People who are likely to follow a
Ok, I have sent some invites.
I would like to create @sklearn_commits instead of
@scikit_learn_commits that is too long for my taste. Any opinion?
--
Olivier
___
Thanks Tom, let me try to configure this.
--
Olivier
___
I am not sure who has the rights to manage the twitter account. I just
sent a password reset request to "sc**@a..***"
I suspect that this is Andreas but I am not so sure.
___
On Fri, Nov 15, 2019 at 17:31, Nicolas Hug wrote:
>
> What's the status of this? Would be great to have it for the 0.22 release :) !
>
+1 and we could also announce / thank / RT new sources of funding (CZI
and Fujitsu).
___
On Tue, Nov 5, 2019 at 12:46, Gael Varoquaux
wrote:
>
> On Mon, Nov 04, 2019 at 10:14:26PM -0700, Andreas Mueller wrote:
> > Should we re-purpose the existing twitter account or make a new one?
> > https://twitter.com/scikit_learn
>
> I think that we should repurpose it:
>
> - Make a
I just found this planner to give it a try:
https://www.timeanddate.com/worldclock/meetingtime.html?day=29=7=2019=240=33=37=179=0
(Berlin and Paris are in the same timezone so I only put Berlin).
It's going to be challenging to find a timeslot for everybody. The
least extreme timeslot
On Thu, Jul 18, 2019 at 08:29, Adrin wrote:
>
> BTW, where was the meeting for last Monday organized? I don't think I knew it
> was happening.
I do not understand what you are referring to. My email was about the
organization of future meetings as suggested by Andreas.
The core developers of Scikit-learn have recently voted to welcome
Jérémie Du Boisberranger to the team, in recognition of his efforts
and trustworthiness as a contributor. Jérémie works at Inria Saclay
and is supported by the scikit-learn initiative at Fondation Inria and
its partners.
You have to use a dedicated framework to distribute the computation on a
cluster like your Cray system.
You can use MPI, or dask with dask-jobqueue, but you also need to run
parallel algorithms that are efficient when running in a distributed setting with a
high cost for communication between distributed
How many cores do you have on this machine?
joblib.cpu_count()
___
I think it's ok to do as you said.
--
Olivier
___
\o/
___
+1
___
I would also add generalizing early stopping options to most estimators.
This is a bit related to Joel's point on max_iter consistency in
LogisticRegression.
--
Olivier
___
say that they
> >> > might be available at this time. It is good for many people, or
> should we
> >> > organize a doodle?
> >> >
> >> > G
> >> >
> >> > On Wed, Dec 19, 2018 at 05:27:21PM -0500, Andreas Mueller wrote:
> &
You should probably just "conda update scikit-learn":
scikit-learn 0.20.1 is available on the official anaconda channel for all
supported operating systems:
https://anaconda.org/anaconda/scikit-learn
--
Olivier
___
They are very different statistical models from a mathematical point of
view. See the online scikit-learn documentation or reference text books
such as "Elements of Statistical Learning" for more details.
In practice, linear models tend to be faster to fit on large data,
especially when the
Congrats and welcome Adrin!
--
Olivier
___
Maybe a subset of the criteo TB dataset?
___
We can also do Paris in April / May or June if that's ok with Joel and
better for Andreas.
I am teaching on Fridays from end of January to March. But I can miss half
a day of sprint to teach my class.
--
Olivier
___
You might also want to have a look at https://github.com/onnx/onnxmltools
although I am not sure if there are RF optimized ONNX runtimes at this
point.
--
Olivier
___
>
>
> > I think model serialization should be a priority.
>
There is also the ONNX specification that is gaining industrial adoption
and that already includes open source exporters for several families of
scikit-learn models:
https://github.com/onnx/onnxmltools
--
Olivier
On Wed, Sep 26, 2018 at 23:02, Joel Nothman
wrote:
> And for those interested in what's in the pipeline, we are trying to draft
> a roadmap...
> https://github.com/scikit-learn/scikit-learn/wiki/Draft-Roadmap-2018
>
> But there are no doubt many features that are absent there too.
>
Indeed,
Joy!
___
I believe it would fit in sklearn-contrib even if it's more for statistical
inference rather than machine learning style prediction.
Others might disagree.
Anyways, joining efforts to improve documentation, CI, testing and so on is
always a good thing for your future users.
--
Olivier
This looks like a very useful project.
There is also scikits-bootstraps [1]. Personally I prefer the flat package
namespace of resample (I am not a fan of the 'scikits' namespace package)
but I still think it would be great to contact the author to know if he
would be interested in joining
Hi everyone!
Let's welcome Joris Van den Bossche (@jorisvdbossche) officially as a
scikit-learn core developer!
Joris is one of the maintainers of the pandas project and recently
contributed many new great PRs to scikit-learn (notably the
ColumnTransformer and a refactoring of the categorical
It looks nice, thanks for sharing.
Do you plan to couple the active learner with a UX-optimized labeling
interface (for instance with a react.js or similar frontend and a flask or
similar backend)?
--
Olivier
___
Have you had a look at BIRCH?
http://scikit-learn.org/stable/modules/clustering.html#birch
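As a rough sketch of what trying BIRCH could look like, here is an example on synthetic data fed in chunks with `partial_fit`. The blob data, the chunk count and `n_clusters=3` are assumptions made for illustration; they do not come from the original question.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.RandomState(0)

# Three well-separated 2D blobs (synthetic data, for illustration only).
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])
y = np.repeat([0, 1, 2], 100)

# Shuffle so that every chunk contains points from all blobs.
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]

birch = Birch(n_clusters=3)
# partial_fit lets the model consume the data chunk by chunk,
# which is the main appeal of BIRCH for large datasets.
for chunk in np.array_split(X, 5):
    birch.partial_fit(chunk)

labels = birch.predict(X)
```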
--
Olivier
___
Interesting project!
BTW, do you know about dask-ml [1]?
It might be interesting to think about generalizing the input validation of
fit and predict / transform as a private method of the BaseEstimator class
instead of directly calling into sklearn.utils.validation functions so as
to make it
Maybe update your version of Cython?
--
Olivier
___
> Do I need to write object oriented or are functions also ok?
If you want to contribute an implementation as a new project on scikit-learn
contrib, you should be careful to follow the scikit-learn estimators API:
Congrats to all three of you! Thank you very much for your contributions
and in particular in reviewing contributions by others.
--
Olivier
___
+1 for python.org if they accept this kind of mailing lists.
___
+1
___
Grab it with pip or conda!
Quoting the release highlights from the website:
We are excited to release a number of great new features including
neighbors.LocalOutlierFactor for anomaly detection,
preprocessing.QuantileTransformer for robust feature transformation, and
the
I have no idea whether the randomized SVD method is supposed to work for
complex data or not (from a mathematical point of view). I think that all
scikit-learn estimators assume real data (or integer data for class labels)
and our input validation utilities will cast numeric values to float64 by
I believe so even though it's always better to check in the code to see how
this parameter is actually used.
--
Olivier
___
The new release is coming and we are seeking feedback from beta testers!
pip install scikit-learn==0.19b2
conda-forge packages should follow in the coming hours / days.
Note that many models have changed behaviors and some things have been
deprecated, see the full changelog at:
The name of the algorithm / model would be "L2-penalized linear model
with modified Huber loss trained with Stochastic Gradient Descent".
SVM is traditionally used to describe models that use the hinge loss
only (or sometimes the squared hinge loss too).
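For reference, a minimal sketch of the model described above, assuming it is built with `SGDClassifier`. The toy two-blob dataset and `random_state=0` are illustrative choices; switching to `loss="hinge"` would give the model that is traditionally called a linear SVM.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)

# Two Gaussian blobs as a toy binary classification problem
# (synthetic data, for illustration only).
X = np.vstack([
    rng.normal(loc=-2.0, size=(50, 2)),
    rng.normal(loc=2.0, size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)

# L2-penalized linear model with modified Huber loss, trained with SGD.
clf = SGDClassifier(loss="modified_huber", penalty="l2", random_state=0)
clf.fit(X, y)

train_accuracy = clf.score(X, y)
```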
Only the log loss can lead to a
I think the documentation is correct. U, a.k.a. "the code" or "the
activations" has shape (n_samples, n_components) and V a.k.a. "the
dictionary" or "the components" has shape (n_components, n_features) in
both cases.
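The shapes can be checked quickly, for instance with `DictionaryLearning`. This is only a sketch: the choice of estimator and the random data are assumptions, and the names `U` and `V` simply mirror the notation above.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.RandomState(0)
n_samples, n_features, n_components = 20, 8, 3

# Random data, only used to inspect the shapes of the factors.
X = rng.normal(size=(n_samples, n_features))

dico = DictionaryLearning(
    n_components=n_components, max_iter=10, random_state=0
)
U = dico.fit_transform(X)  # "the code" / "the activations"
V = dico.components_       # "the dictionary" / "the components"

print(U.shape)  # (n_samples, n_components)
print(V.shape)  # (n_components, n_features)
```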
We could use n_components uniformly instead of n_atoms for consistency's
sake
I am pretty sure this is exactly the kind of presentation that the
EuroScipy audience would enjoy. Please submit!
--
Olivier
___
Thanks for this report!
--
Olivier
___
You can have a look at the test named "test_agglomerative_clustering" in:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cluster/tests/test_hierarchical.py
--
Olivier
___
Hi all,
FYI I have just submitted a 90 min tutorial on scikit-learn to the
EuroScipy CFP. If anybody is interested in co-teaching / TA-ing this
workshop please let me know.
I also plan to stay for the one-day sprint to help people make their
first contribution to the project. Last year we had
Please provide the full traceback. Without it, it's impossible to tell
whether the problem is in scikit-learn or xgboost.
Also, please provide a minimal reproduction script as explained in:
http://scikit-learn.org/stable/faq.html#what-s-the-best-way-to-get-help-on-scikit-learn-usage
--
Olivier
From a generalization point of view (test accuracy), the optimal
sparsity support should not matter much though, but it can be helpful
to find the optimally sparsest solution either for computational
constraints (smaller models with a lower prediction latency) or for
interpretation of the weights
Note that SGD is not very good at optimizing finely with a non-smooth
penalty (e.g. l1 or elasticnet). The future SAGA solver is going to be
much better at finding the optimal sparsity support (although this
support is not guaranteed to be stable across re-sampling of the
training set if the
Personally I don't feel like mentoring this year. I would really like
to focus my scikit-learn time on finishing the joblib process
refactoring with Thomas Moreau and the binning / thread-based
parallelization of boosted trees with Guillaume and Raghav.
--
Olivier
I don't think we have any model dedicated to this, but it's possible
that expressive non-parametric models such as RF and GBRT or richly
parameterized models such as MLP with a regression loss can do a good
enough job at giving you a point estimate.
--
Olivier
I would rather get it out before April ideally and, instead of
setting up a roadmap, I would rather just identify bugs that are
blockers, fix only those and not wait for any feature before
cutting 0.19.X.
--
Olivier
___
In retrospect, making a small 0.19 release is probably a good idea.
I would like to get
https://github.com/scikit-learn/scikit-learn/pull/8002 in before
cutting the 0.19.X branch.
--
Olivier Grisel
___
Hi all,
I think we should release 0.18.2 to get some important fixes and make
it easy to release Python 3.6 wheel packages for all the operating
systems using the automated procedure.
I identified a couple of PRs to backport to 0.18.X to prepare the
0.18.2 release. Are there any other important
You can indeed derive from BaseEstimator and implement fit, predict
and optionally score.
Here is the documentation for the expected estimator API:
http://scikit-learn.org/stable/developers/contributing.html#apis-of-scikit-learn-objects
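To make this concrete, here is a minimal sketch of such an estimator. `MeanRegressor` and its `shift` parameter are hypothetical names invented for this example. Inheriting from `RegressorMixin` is an extra assumption on my side: it provides a default R² `score`, so that `score` does not need to be written by hand.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Toy regressor (hypothetical) that predicts the training mean."""

    def __init__(self, shift=0.0):
        # Hyper-parameters are stored as-is in __init__, without
        # validation, so that get_params / set_params / clone work.
        self.shift = shift

    def fit(self, X, y):
        # Attributes learned from the data end with an underscore.
        self.mean_ = float(np.mean(y))
        return self  # fit returns self by convention

    def predict(self, X):
        X = np.asarray(X)
        return np.full(X.shape[0], self.mean_ + self.shift)
```

With only these methods, the estimator should already be usable with tools such as `cross_val_score` or `GridSearchCV`.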
As this is a linear regression model, you might also want to
I cannot reproduce such a degradation on my machine:
(sklearn-0.17)ogrisel@is146148:~/code/scikit-learn$ python
~/tmp/bench_vectorizer.py
scikit-learn 0.17.1. Numpy 1.11.2. Python 3.5.0 x86_64
Vectorizing 20newsgroup 11314 documents
Vectorization completed in 4.033604383468628 seconds,
BTW Roman, the examples in your gist would make a great non-regression
test for this new feature. Please feel free to submit a PR.
--
Olivier
___
Sorry for the late reply,
Before working on this release I would like to automate the wheel
generation process (for the release wheels) in a single repo that will
generate wheels for linux, osx and windows based on
https://github.com/matthew-brett/multibuild
I plan to put that repo under
> I believe this `arch -i386` only works as a prefix for Python.org Python,
> but I'm happy to be corrected.
Then the following should work:
arch -i386 python -c "import nose; nose.main()" sklearn
___
I think it could be implemented as a preprocessing step: this is the
approach followed by:
https://github.com/ryankiros/skip-thoughts/blob/master/eval_classification.py
Note that in that case LogisticRegression is used as the final
classifier instead of a squared hinge loss SVM but that should