Congratulations guys! Great work! Looking forward to much more! Proud to
have you on the team!
Now we in NYC can approve our own pull requests ;)
On Wed, Apr 3, 2019, 21:08 Hanmin Qin wrote:
> Congratulations and welcome to the team!
Congratulations and welcome to the team!
Hanmin Qin
----- Original Message -----
From: Joel Nothman
To: Scikit-learn user and developer mailing list
Subject: [scikit-learn] New core developers: thomasjpfan and nicolashug
Date: 2019-04-04 07:52
The core developers of Scikit-learn have recently voted to welcome
Thomas Fan and Nicolas Hug to the team, in recognition of their
efforts and trustworthiness as contributors. Both happen to be working
with Andy Mueller at Columbia University at the moment.
Congratulations and thanks to them both!
This is not a strongly held suggestion, but what about adopting
YellowBrick as the plotting API for sklearn? I'm not sure exactly how the
interaction would work: it could be PRs to their library, or asking them
to integrate into sklearn, or doing a lock-step dance with versions while
maintaining separate teams?
Pull requests improving the documentation are always welcome. At a minimum,
users need to know that these compute different things.
Accuracy is not precision. Precision is the number of true positives
divided by the number of true positives plus false positives. It therefore
cannot be decomposed into a per-sample score and averaged over folds.
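To make that concrete, here is a minimal sketch (toy labels of my own
invention): the mean of per-fold precisions generally does not equal the
precision pooled over all held-out predictions.

# Toy illustration (invented labels): precision is TP / (TP + FP), a ratio,
# so averaging per-fold precisions differs from pooling the predictions.
import numpy as np
from sklearn.metrics import precision_score

y_true_folds = [np.array([1, 0, 0, 0]), np.array([1, 1])]
y_pred_folds = [np.array([1, 1, 1, 1]), np.array([1, 1])]

# Mean of per-fold precisions: (0.25 + 1.0) / 2 = 0.625
per_fold = np.mean([precision_score(t, p)
                    for t, p in zip(y_true_folds, y_pred_folds)])

# Precision pooled over all samples: TP = 3, FP = 3 -> 0.5
pooled = precision_score(np.concatenate(y_true_folds),
                         np.concatenate(y_pred_folds))

print(per_fold, pooled)  # 0.625 vs 0.5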
Hi everyone, this is my first post here :)
About two weeks ago, owing to low demand on my project, I was assigned a
completely unusual request: to automatically extract answers from documents
using machine learning. I've never read anything about ML, AI or NLP
before, so I've been
On 03.04.19 at 13:59, Joel Nothman wrote:
The equations in Murphy and Hastie very clearly assume a metric
decomposable over samples (a loss function). Several popular metrics
are not.
For a metric like MSE it will be almost identical assuming the test
sets have almost the same size.
What will
With option 1, sklearn.plot is likely to import large chunks of the
library (particularly, but not exclusively, if the plotting function
"does the work" as Andy suggests). This is under the assumption that
one plot function will want to import trees, another GPs, etc. Unless
we move to lazy imports.
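If that route were taken, one way lazy imports could look (a sketch of my
own using PEP 562 module-level __getattr__, Python 3.7+; not an actual
sklearn.plot implementation):

# Hypothetical contents of a sklearn/plot/__init__.py that defers heavy
# imports until a plotting function is first accessed (PEP 562).
import importlib

_lazy = {
    "plot_tree": "sklearn.tree",
    "plot_partial_dependence": "sklearn.ensemble.partial_dependence",
}

def __getattr__(name):
    # Import the owning submodule only on first attribute access.
    if name in _lazy:
        return getattr(importlib.import_module(_lazy[name]), name)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")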
On Wed, Apr 03, 2019 at 08:54:51AM -0400, Andreas Mueller wrote:
> If the loss decomposes, the result might be different b/c of different test
> set sizes, but I'm not sure if they are "worse" in some way?
Mathematically, a cross-validation estimates a double expectation: one
expectation over the training data the model is fit on, and one over the
test data.
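In symbols (my notation, not from the original message), that double
expectation is roughly

  E_D[ E_{(x,y)}[ L(f_D(x), y) ] ]

with an outer expectation over training sets D and an inner one over test
points (x, y) drawn from the data distribution; cross-validation
approximates the outer expectation by averaging over folds.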
I think what was not clear from the question is that there are actually
quite different kinds of plotting functions, and many of these are tied
to existing code.
Right now we have some that are specific to trees (plot_tree) and to
gradient boosting (plot_partial_dependence).
I think we want
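For concreteness, a minimal usage sketch of the first of those helpers
(API as of scikit-learn 0.21; plot_partial_dependence lived under
sklearn.ensemble.partial_dependence at the time and later moved to
sklearn.inspection):

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Fit a small tree and draw it on the current matplotlib axes.
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
plot_tree(clf)
plt.show()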
On 4/3/19 7:59 AM, Joel Nothman wrote:
The equations in Murphy and Hastie very clearly assume a metric
decomposable over samples (a loss function). Several popular metrics
are not.
For a metric like MSE it will be almost identical assuming the test
sets have almost the same size. For
The equations in Murphy and Hastie very clearly assume a metric
decomposable over samples (a loss function). Several popular metrics
are not.
For a metric like MSE it will be almost identical assuming the test
sets have almost the same size. For something like Recall
(sensitivity) it will be
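A quick sketch of the MSE claim (toy setup of my own): with equal-sized
folds, the mean of per-fold MSEs exactly equals the MSE pooled over all
out-of-fold predictions, so the two estimates coincide.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

X, y = make_regression(n_samples=100, noise=10.0, random_state=0)
model, cv = Ridge(), KFold(n_splits=5)

# Mean of per-fold MSEs (a scorer averaged over folds)
per_fold = -cross_val_score(model, X, y, cv=cv,
                            scoring="neg_mean_squared_error").mean()

# MSE pooled over all out-of-fold predictions
pooled = np.mean((cross_val_predict(model, X, y, cv=cv) - y) ** 2)

print(per_fold, pooled)  # identical here: 100 samples, 5 equal folds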
+1 for option 1 and +0.5 for option 3. Do we anticipate that many plotting
functions will be added? If it's just a dozen or fewer, putting them all
into a single namespace sklearn.plot might be easier.
This also would avoid discussion about where to put some generic
plotting functions (e.g.
I use
sum((cross_val_predict(model, X, y) - y)**2) / len(y) (*)
to evaluate the performance of a model. This conforms with Murphy:
Machine Learning, section 6.5.3, and Hastie et al.: The Elements of
Statistical Learning, eq. 7.48. However, according to the documentation
of
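For completeness, a self-contained version of (*) with the imports it
needs (model, X and y stand for any estimator and dataset; the ones below
are invented for illustration):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

X, y = make_regression(n_samples=60, noise=5.0, random_state=0)
model = LinearRegression()

# (*): squared error of the out-of-fold predictions, averaged over samples
mse = np.sum((cross_val_predict(model, X, y) - y) ** 2) / len(y)
print(mse)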
Hi,
that does not really sound like a clustering but more like a preprocessing
problem to me. For each item you want to calculate the length of the
longest subsequence of "1"s. That could be done by a simple function and
would create a new (one-dimensional) property for each of your items.
You
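A sketch of that preprocessing step (the function name is my own), using
the example rows from the original question later in this digest:

from itertools import groupby
import numpy as np

def longest_run_of_ones(row):
    # Split the row into runs of equal values; keep the longest run of 1s.
    return max((len(list(group))
                for value, group in groupby(row) if value == 1),
               default=0)

data = np.array([
    [1, 0, 0, 1],  # ID 0 -> 1
    [1, 0, 0, 1],  # ID 1 -> 1
    [0, 0, 1, 1],  # ID 2 -> 2
    [0, 1, 1, 1],  # ID 3 -> 3
])
print([longest_run_of_ones(row) for row in data])  # [1, 1, 2, 3]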
I think #1, if any of these... Plotting functions should hopefully be as
general as possible, so tagging with a specific type of estimator will, in
some scikit-learn utopia, be unnecessary.
If a general plotter is built, where does it live under the
estimator-specific namespace options? Feels
My preference would be for (1). I don't think the sub-namespace in (2) is
necessary, and don't like (3), as I would prefer the plotting functions to
be all in the same namespace sklearn.plot.
Andrew
<~~~>
J. Andrew Howe, PhD
I have data which contains the access history of each item.
Ex: t0~t3 are the time slots; 1 means the item was accessed during that
slot, 0 means it was not.
ID,t0,t1,t2,t3
0,1,0,0,1
1,1,0,0,1
2,0,0,1,1
3,0,1,1,1
What I want to cluster on is the length of the longest continuous access
duration.
Ex:
ID 3 (length 3) > ID 2 (length 2) > ID 1 (length 1)