--
Sebastian Raschka, PhD
Machine learning and AI researcher, https://sebastianraschka.com
Staff Research Engineer at Lightning AI, https://lightning.ai
On May 28, 2024 at 9:43 AM -0500, Sole Galli via scikit-learn
, wrote:
> Hi guys,
>
> I'd like to understand why sklearn
Awesome news! Congrats Tim!
Cheers,
Sebastian
On Mar 8, 2023, 8:35 AM -0600, Ruchika Nayyar , wrote:
> Congratulations Tim! Good to see you virtually :)
>
> Thanks,
> Ruchika
>
>
> Dr. Ruchika Nayyar
> Data Scientist, Greene Tweed & Co.
>
>
> > On Wed, Mar 8, 2023 at 5:09
A 1.0 release is huge, and this is really awesome news! Very exciting! Congrats
to the scikit-learn team and everyone who helped make this possible!
Cheers,
Sebastian
On Sep 24, 2021, 11:40 AM -0500, Adrin , wrote:
> Hi everyone,
>
> We're happy to announce the 1.0 release which you can install
The R^2 function in scikit-learn works fine. A negative value means that the
regression model fits the data worse than a horizontal line representing the
sample mean. E.g., you usually get that if you are overfitting the training set
a lot and then apply that model to the test set. The econometrics book
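To illustrate the negative case with a quick sketch:

from sklearn.metrics import r2_score

# predictions worse than always predicting the sample mean give R^2 < 0
print(r2_score([1, 2, 3], [3, 1, 2]))  # -2.0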
This is really awesome news! Thanks a lot to everyone developing scikit-learn.
I am just wrapping up another successful semester, teaching students ML basics.
Most coming from an R background, they really loved scikit-learn and
appreciated its ease of use and well-thought-out API.
Best,
Sebastian
Hi Anna,
You can set shuffle=False (it's set to True by default in the
make_classification function). Then, the resulting features will be sorted as
follows: X[:, :n_informative + n_redundant + n_repeated]. I.e., if you set
“n_features=1000” and “n_informative=20”, the first 20 features will be the
informative ones.
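A minimal sketch (the specific values below are just for illustration):

from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=1000, n_informative=20,
                           n_redundant=0, n_repeated=0, shuffle=False,
                           random_state=123)
X_informative = X[:, :20]  # with shuffle=False, informative columns come first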
Hi there,
unfortunately I currently don't have time to walk through your example, but I
wrote down how the Tf-idf in sklearn works using some examples here:
https://github.com/rasbt/pattern_classification/blob/90710922e4f4d7e3f432221b8a4d2ec1dd2d9dc9/machine_learning/scikit-learn/tfidf_scikit-le
Hi Peng,
check out
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/_stop_words.py
Best,
Sebastian
> On Jan 27, 2020, at 2:30 PM, Peng Yu wrote:
>
> Hi,
>
> I don't see what stopwords are used by CountVectorizer with
> stop_words='english'.
>
> htt
I think that a twitter account for scikit-learn would be awesome. I could
envision it for announcements (new features, package releases, etc.), but it
would be cool to share interesting applications of scikit-learn, upcoming
events (tutorials, conference talks) as well -- somewhat similar to wha
Hi Bulbul,
I would rather say SGD is a method for optimizing the objective function of
certain ML models -- i.e., for minimizing the loss function / learning the
parameters of certain ML models.
Best,
Sebastian
> On Oct 28, 2019, at 4:00 PM, Bulbul Ahmmed via scikit-learn
> wrote:
>
igure?).
>
>
> On 10/6/19 10:40 AM, Sebastian Raschka wrote:
>> Sure, I just ran an example I made with graphviz via plot_tree, and it looks
>> like there's an issue with overlapping boxes if you use class (and/or
>> feature) names. I made a reproducible example here so
_tree/tree-demo-1.ipynb
Happy to add this to the sklearn issue list if there's no issue filed for that
yet.
Best,
Sebastian
> On Oct 6, 2019, at 9:10 AM, Andreas Mueller wrote:
>
>
>
> On 10/4/19 11:28 PM, Sebastian Raschka wrote:
>> The docs show a way such that yo
g on your computer.
> That's a lot work for just one plot. Is there something like a matplotlib?
>
> Thanks!
>
> On Fri, Oct 4, 2019 at 9:42 PM Sebastian Raschka
> wrote:
> Yeah, think of it more as a computational workaround for achieving the same
> thing more eff
think I get it.
>
> It's just have never seen it this way. Quite different from what I'm used in
> Elements of Statistical Learning.
>
> On Fri, Oct 4, 2019 at 7:13 PM Sebastian Raschka
> wrote:
> Not sure if there's a website for that. In any case, to explain
#x27;t understand your answer.
>
> Why after one-hot-encoding it still outputs greater than 0.5 or less than?
> Does sklearn website have a working example on categorical input?
>
> Thanks!
>
> On Fri, Oct 4, 2019 at 3:48 PM Sebastian Raschka
> wrote:
> Like Nicolas s
>> and split at 0.5. This is not right. Perhaps, I'm doing something wrong?
>>
>> Is there a good toy example on the sklearn website? I am only see this:
>> https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html
>> <https
Hi,
> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1,
> Audi=2) as numerical values, not category. The tree splits at 0.5 and 1.5
that's not a one-hot encoding then.
For an Audi data point, it should be
BMW=0
Toyota=0
Audi=1
for BMW
BMW=1
Toyota=0
Audi=0
and for Toyota
BMW=0
Toyota=1
Audi=0
ionTreeClassifier()?
>
> Best,
>
> Mike
>
> On Fri, Sep 13, 2019 at 11:59 PM Sebastian Raschka
> wrote:
> Hi,
>
> if you have the category "car" as shown in your example, this would
> effectively be something like
>
> BMW=0
> Toyota=1
> A
Hi,
if you have the category "car" as shown in your example, this would effectively
be something like
BMW=0
Toyota=1
Audi=2
Sure, the algorithm will execute just fine on the feature column with values in
{0, 1, 2}. However, the problem is that it will come up with binary rules like
x_i >= 0.5,
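For comparison, a quick one-hot sketch (the toy data is made up; older
scikit-learn versions use sparse=False, newer ones sparse_output=False):

import numpy as np
from sklearn.preprocessing import OneHotEncoder

brands = np.array([['BMW'], ['Toyota'], ['Audi']])
enc = OneHotEncoder(sparse=False)
print(enc.fit_transform(brands))
# categories are sorted alphabetically (Audi, BMW, Toyota):
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [1. 0. 0.]]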
Hi Ben,
I can recall seeing convergence warnings for scikit-learn's logistic regression
model on datasets in the past as well. Which solver did you use for
LogisticRegression in sklearn? If you haven't done so, have you tried the lbfgs
solver? I.e.,
LogisticRegression(..., solver='lbfgs')?
Best,
S
roblem that I would have to have that custom estimator defined on the Cloud
> ML end, which I'm unsure how to do.
>
> Thanks,
> Liam
>
> On Wed, Apr 10, 2019 at 2:06 PM Sebastian Raschka
> wrote:
> Hi Liam,
>
> not sure what your exact error message is, but it may
Hi Liam,
not sure what your exact error message is, but it may also be that the
XGBClassifier only accepts dense arrays? I think the TfidfVectorizer returns
sparse arrays. You could probably fix your issues by inserting a
"DenseTransformer" into your pipeline (a simple class that just transforms the
sparse array into a dense one).
Hi Andreas,
the best score is determined by computing the test fold performance (I think
R^2 by default) and then averaging over them. Since you chose cv=10, you have
10 test folds, and the performance is the average performance over those for
choosing the best hyperparameter setting.
Then,
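Roughly like this (Ridge and the parameter grid below are just placeholders):

from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Ridge

gs = GridSearchCV(Ridge(), param_grid={'alpha': [0.1, 1.0, 10.0]}, cv=10)
gs.fit(X_train, y_train)  # X_train, y_train assumed to be defined
print(gs.best_score_)  # mean over the 10 test-fold scores for the best setting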
It's not necessarily unique to stochastic gradient descent; it's more that some
other algorithms are generally not well suited for "partial_fit". For SGD,
partial fit is a more natural thing to do since you estimate the training loss
from minibatches anyway -- i.e., you do SGD step by step anyway.
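A rough sketch (the minibatch iterator is hypothetical):

from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss='log', random_state=1)
for X_batch, y_batch in minibatch_stream:  # hypothetical minibatch source
    # all class labels must be announced on the first call:
    clf.partial_fit(X_batch, y_batch, classes=[0, 1])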
ibution independent and doesn't need
> bootstrapping, so it looks indeed quite nice.
>
>
> On 2/6/19 1:19 PM, Sebastian Raschka wrote:
> > Hi Stuart,
> >
> > I don't think so because there is no standard way to compute CI's. That
> > go
Hi Stuart,
I don't think so because there is no standard way to compute CI's. That goes
for all performance measures (accuracy, precision, recall, etc.). Some people
use simple binomial approximation intervals, some people prefer bootstrapping
etc. And it also depends on the data you have. In l
12:52 AM, lampahome wrote:
>
>
>
> Sebastian Raschka 於 2019年2月1日 週五 下午1:48寫道:
> Hi there,
>
> if you call the "fit" method, the learning will essentially start from
> scratch. So no, it doesn't consider previous training results.
> However, certain alg
Hi there,
if you call the "fit" method, the learning will essentially start from scratch.
So no, it doesn't consider previous training results.
However, certain algorithms are implemented with an additional partial_fit
method that would consider previous training rounds.
Best,
Sebastian
> On
t; array(['American', 'Southwest'], dtype=object)
>
>
>
> On Tue, Jan 8, 2019 at 9:51 AM pisymbol wrote:
> If that is the case, what order are the coefficients in then?
>
> -aps
>
> On Tue, Jan 8, 2019 at 12:48 AM Sebastian Raschka
> wrote:
>
E.g., if you have a feature with values 'a', 'b', 'c', then applying the one
hot encoder will transform this into 3 features.
Best,
Sebastian
> On Jan 7, 2019, at 11:02 PM, pisymbol wrote:
>
>
>
> On Mon, Jan 7, 2019 at 11:50 PM pisymbol wrote:
> According to the doc (0.20.2) the coef_ vari
Maybe check
a) if the actual labels of the training examples don't start at 0
b) if you have gaps, e.g., if your unique training labels are 0, 1, 4, ..., 23
Best,
Sebastian
> On Jan 7, 2019, at 10:50 PM, pisymbol wrote:
>
> According to the doc (0.20.2) the coef_ variables are suppose to be s
I think it refers to the test folds of the k-fold cross-validation that is
used internally via the `cv` parameter of GridSearchCV (or the test folds of an
alternative cross-validation scheme that you may pass as an iterator to cv).
Best,
Sebastian
> On Jan 3, 2019, at 9:44 PM, lampahome wrote:
I would like to make a related suggestion, but instead of focusing on the upper
bound for the number of trees, rather on choosing the lower bound. From a
theoretical perspective, it doesn't make sense to me how fewer trees can result
in a better performing random forest model in terms of generali
Say n is the number of examples and m is the number of features, then a naive
implementation of a balanced binary decision tree is O(m * n^2 log n). I think
scikit-learn's decision trees cache the sorted features, so this reduces to O(m
* n log n). Then, to your O(m * n log n) you can multiply the
Hi Rui,
I agree with Joel that association rule mining could be a bit tricky to fit
nicely within the scikit-learn API. Maybe this could be some transformer class?
I thought about that a few years ago but remember that I couldn't come up with
a good solution at that point.
In any case, I have
Also want to say that I really welcome this decision/change. Personally, as far
as I am aware, I've been trying to use keyword arguments consistently for years,
except for cases where it is really obvious, like .fit(X_train, y_train), and I
believe that it really helped me regarding writing less
ki wrote:
> Just a small side note that I've come across with Random Forests which in the
> end form an ensemble of Decision Trees. I ran a thousand iterations of RFs on
> multi-label data and managed to get a 4-10 percentage points difference in
> subset accuracy, depending o
nt?
>
> I’d at least try that before diving into the source code...
>
> Cheers,
>
> --
> Julio
>
>> El 28 oct 2018, a las 2:24, Sebastian Raschka
>> escribió:
>>
>> Thanks, Javier,
>>
>> however, the max_features is n_features by def
z wrote:
>
> Hi Sebastian,
>
> I think the random state is used to select the features that go into each
> split (look at the `max_features` parameter)
>
> Cheers,
> Javier
>
> On Sun, Oct 28, 2018 at 12:07 AM Sebastian Raschka
> wrote:
> Hi all,
>
Hi all,
when I was implementing a bagging classifier based on scikit-learn's
DecisionTreeClassifier, I noticed that the results were not deterministic and
found that this was due to the random_state in the DecisionTreeClassifier
(which is set to None by default).
I am wondering what exactly t
The ONNX-approach sounds most promising, esp. because it will also allow
library interoperability but I wonder if this is for parametric models only and
not for the nonparametric ones like KNN, tree-based classifiers, etc.
All-in-all I can definitely see the appeal for having a way to export skl
This is explained here
http://scikit-learn.org/stable/modules/ensemble.html#random-forests:
"In addition, when splitting a node during the construction of the tree, the
split that is chosen is no longer the best split among all features. Instead,
the split that is picked is the best split among
>
> > I think model serialization should be a priority.
>
> There is also the ONNX specification that is gaining industrial adoption and
> that already includes open source exporters for several families of
> scikit-learn models:
>
> https://github.com/onnx/onnxmltools
Didn't know about that
Congrats everyone, this is awesome!!! I just started teaching an ML course this
semester and introduced scikit-learn this week -- it was great timing to
demonstrate how well maintained the library is and praise all the efforts that
go into it :).
> I think model serialization should be a pri
Hi all,
first of all, I think that having more feature selection capabilities in
scikit-learn would be nice, especially, an algorithm from the wrapper category
that also regards dependence/interaction between features.
Regarding the SequentialFeatureSelection class... We actually decided to
si
That's awesome! Congrats and thanks everyone for all the work that went into
this!
Just finished reading through the What's New docs... Wow, that took a while --
and I mean that in a positive sense ;). It's a huge release with lots of
important fixes.
It's great to see that you prioritized the maintenanc
Hi Debu,
since Azure HDInsights is a commercial service, their customer support should
handle questions like this
> On Aug 12, 2018, at 7:16 AM, Debabrata Ghosh wrote:
>
> Hi All,
>Greetings ! Wish you are doing good ! I am just
> reaching out to you in case if you hav
Hi,
scikit-learn doesn't support computations on the GPU, unfortunately.
Specifically for random forests, there's CudaTree, which implements a GPU
version of scikit-learn's random forests. It doesn't look like the library is
actively developed (hard to tell whether that's a good thing or a bad
I am not a core dev, but I think I can see what's wrong there (mostly Flake8
issues). Let me comment about that over there.
> On Jul 24, 2018, at 7:34 PM, Prathusha Jonnagaddla Subramanyam Naidu
> wrote:
>
> This is the link to the PR -
> https://github.com/scikit-learn/scikit-learn/pull/1167
In addition to checking n_iter_ and fixing the random seed as I suggested, maybe
also try normalizing the features (e.g., z-scores via the StandardScaler) to
see if that stabilizes the training.
Sent from my iPhone
> On Jul 24, 2018, at 1:07 PM, Benoît Presles
> wrote:
>
> I did the same tests
Agreed. But then the setting is C=1e9 in this context (where C is the inverse
regularization strength), so the regularization effect should be very small.
Probably shouldn't matter much for convex optimization, but I would still try
to
a) set the random_state to some fixed value
b) make sure
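E.g., a sketch along the lines of these suggestions (fixed seed plus feature
scaling):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = make_pipeline(StandardScaler(),
                     LogisticRegression(C=1e9, solver='lbfgs', random_state=123))
pipe.fit(X_train, y_train)  # X_train, y_train assumed to be defined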
That's great news! I am glad to hear that you joined the project, Joris Van den
Bossche! I am a scikit-learn user (and sometimes contributor) and really
appreciate all the time and effort that the core developers and contributors
spend on maintaining and extending the library.
Best regards,
S
Hi Jeff,
had a similar question 1-2 years ago and ended up using Chris Borgelt's C
command line tools, but for convenience, I also implemented basic association
rule & frequent pattern mining in Python here:
http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/
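Basic usage looks roughly like this (df is assumed to be a one-hot encoded
transaction DataFrame):

from mlxtend.frequent_patterns import apriori, association_rules

frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
rules = association_rules(frequent_itemsets, metric='confidence',
                          min_threshold=0.7)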
Best,
Sebastian
Hi,
> I quickly read about multinomal regression, is it something do you recommend
> I use? Maybe you think about something else?
Multinomial regression (or Softmax Regression) should give you results somewhat
similar to a linear SVC (or logistic regression with OvO or OvR). The
theoretical d
sorry, I had a copy & paste error, I meant "LogisticRegression(...,
multi_class='multinomial')" and not "LogisticRegression(...,
multi_class='ovr')"
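I.e., a minimal sketch:

from sklearn.linear_model import LogisticRegression

# softmax regression; 'multinomial' requires a compatible solver such as lbfgs
clf = LogisticRegression(multi_class='multinomial', solver='lbfgs')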
> On Jun 3, 2018, at 5:19 PM, Sebastian Raschka
> wrote:
>
> Hi,
>
>> I
> So I suggest that there is a test version that shows a proper message when an
> error occurs.
I think the freezing that happens in your case is operating system specific and
it would require some weird workarounds to detect at which RAM usage the
combination of machine and operating system mi
Not sure how it compares in practice, but it's certainly more efficient to rank
the features by impurity decrease rather than by OOB permutation performance;
you wouldn't need to
a) compute the OOB performance (an extra inference pass)
b) permute a feature column and do another inference pass
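The impurity-based ranking comes essentially for free after fitting, e.g.:

from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=100, random_state=1)
forest.fit(X_train, y_train)  # X_train, y_train assumed to be defined
ranking = forest.feature_importances_.argsort()[::-1]  # most important first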
Dear Wouter,
for the SVM, scikit-learn wraps LIBSVM and LIBLINEAR. I think the
scikit-learn class SVC uses LIBSVM for every kernel. Since you are using the
linear kernel, you could use the more efficient LinearSVC scikit-learn class to
get similar results. I guess this in turn is easier to
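I.e., something like this sketch:

from sklearn.svm import LinearSVC

clf = LinearSVC(C=1.0, random_state=1)  # roughly comparable to SVC(kernel='linear')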
That's a good question since the outputs would be differently scaled if the
logistic sigmoid vs the softmax is used in the output layer. I think you don't
need to worry about setting anything though, since the "activation" only
applies to the hidden layers, and the softmax is, regardless of "act
Hi,
If you want to predict the Kmeans cluster membership, you can use Kmeans'
predict method instead of training a KNN model on the cluster assignments. This
will be computationally more efficient and give you the correct assignment at
the borders between clusters.
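E.g., a minimal sketch (X_train and X_new are assumed to be defined):

from sklearn.cluster import KMeans

km = KMeans(n_clusters=3, random_state=1).fit(X_train)
labels_new = km.predict(X_new)  # assigns each point to its nearest centroid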
Best,
Sebastian
> On Mar 12,
Like Guillaume suggested, you don't want to load the whole array into memory if
it's that large. There are many different ways for how to deal with this. The
most naive way would be to break up your NumPy array into smaller NumPy arrays
and load them iteratively with a running accuracy calculation
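The naive version could look roughly like this (the chunk files and their
naming scheme are hypothetical):

import numpy as np

n_correct, n_total = 0, 0
for i in range(n_chunks):  # n_chunks assumed to be known
    X_chunk = np.load('X_chunk_%d.npy' % i)  # hypothetical file names
    y_chunk = np.load('y_chunk_%d.npy' % i)
    n_correct += (clf.predict(X_chunk) == y_chunk).sum()  # clf assumed fitted
    n_total += y_chunk.shape[0]
print(n_correct / n_total)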
nt Kendall's tau correlation coefficient and a combination of R, tau
> and RMSE. :)
>
> On Mar 1, 2018 15:49, "Sebastian Raschka" wrote:
> Hi, Thomas,
>
> as far as I know, it's all the same and doesn't matter, and you would get the
> same splits, sinc
the
> impurities of the left and right split? In MSE class they are (sum_i^n
> y_i)**2 where n is the number of samples in the respective split. It is not
> clear how this is related to variance in order to adapt it for my purpose.
>
> Best,
> Thomas
>
>
> On Mar
Hi, Thomas,
in regression trees, minimizing the variance among the target values is
equivalent to minimizing the MSE between targets and predicted values. This is
also called variance reduction:
https://en.wikipedia.org/wiki/Decision_tree_learning#Variance_reduction
Best,
Sebastian
> On Mar 1
Inertia simply means the sum of the squared distances from sample points to
their cluster centroid. The smaller the inertia, the closer the cluster members
are to their cluster centroid (that's also what KMeans optimizes when choosing
centroids). In this context, the elbow method may be helpful
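A rough sketch of the elbow method (X is assumed to be given):

from sklearn.cluster import KMeans

inertias = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, random_state=1).fit(X)
    inertias.append(km.inertia_)  # sum of squared distances to closest centroid
# then plot k vs. inertia and look for the 'elbow'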
Hi,
by default, the clustering classes from sklearn (e.g., DBSCAN) take an
[num_examples, num_features] array as input, but you can also provide the
distance matrix directly, e.g., by instantiating it with metric='precomputed'
my_dbscan = DBSCAN(..., metric='precomputed')
my_dbscan.fit(my_distance_matrix)
Good point Joel, and I actually forgot that you can set the norm param in the
TfidfVectorizer, so one could basically do
vect = TfidfVectorizer(use_idf=False, norm='l1')
to have the CountVectorizer behavior but normalizing by the document length.
Best,
Sebastian
> On Jan 28, 2018, at 1:29 AM,
Hi, Yacine,
Just on a side note, you can set use_idf=False in the TfidfVectorizer and only
normalize the vectors by their L2 norm.
But yeah, the normalization you suggest might be really handy in certain cases.
I am not sure though if it's worth making this another parameter in the
CountVectorizer (which al
As far as I know, no. But you could simply truncate the iris dataset for binary
classification, e.g.,
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data[:100]
y = iris.target[:100]
Best,
Sebastian
> On Dec 3, 2017, at 3:54 PM, Peng Yu wrote:
>
> Hi, iris is a three-class
Independent of the implementation, and unless you use the 'centroid' or
'average' linkage method, cluster centroids don't need to be computed when
performing agglomerative hierarchical clustering. But you can always
compute one manually by simply averaging all samples from a cluster (for
example, with NumPy).
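E.g., a quick NumPy sketch (X and the cluster labels are assumed to be given):

import numpy as np

centroids = np.array([X[labels == k].mean(axis=0)
                      for k in np.unique(labels)])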
Oh, never mind my previous email, because while the components should be the
same, the projection of the data points onto those components would still be
affected by centering vs non-centering I guess.
Best,
Sebastian
> On Oct 16, 2017, at 3:25 PM, Sebastian Raschka wrote:
>
> Hi
Hi,
if you compute the principal components (i.e., eigendecomposition) from the
covariance matrix, it shouldn't matter whether the data is centered or not,
since the covariance matrix is computed as
CovMat = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}) (x_i - \bar{x})^T
where \bar{x} is the vector of feature means.
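A quick numeric sanity check (sketch):

import numpy as np

rng = np.random.RandomState(1)
X = rng.randn(100, 3)
# shifting the data doesn't change the covariance matrix:
print(np.allclose(np.cov(X.T), np.cov((X + 10.).T)))  # True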
me reason I thought we had a "prefit" parameter.
>
> I think we should.
>
>
>> On 10/01/2017 07:39 PM, Sebastian Raschka wrote:
>> Hi, Rares,
>>
>>> vc = VotingClassifier(...)
>>> vc.estimators_ = [e1, e2, ...]
>>> vc.le_ = ..
Hi, Rares,
> vc = VotingClassifier(...)
> vc.estimators_ = [e1, e2, ...]
> vc.le_ = ...
> vc.predict(...)
>
> But I am not sure it is recommended to modify the "private" estimators_ and
> le_ attributes.
I think that this may work if you don't call the fit method of the
VotingClassifier afterwards.
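Roughly like this (relies on "private" attributes and may break in future
versions; e1, e2 are assumed to be already fitted):

from sklearn.ensemble import VotingClassifier
from sklearn.preprocessing import LabelEncoder

vc = VotingClassifier(estimators=[('e1', e1), ('e2', e2)], voting='hard')
vc.estimators_ = [e1, e2]
vc.le_ = LabelEncoder().fit(y_train)  # y_train assumed to be defined
pred = vc.predict(X_new)  # X_new assumed to be defined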
Hi, Rares,
> I am looking at VotingClassifier but it seems that it is expected that the
> estimators are fitted when VotingClassifier.fit() is called. I don't see how
> I can have already fitted classifiers combined under a VotingClassifier.
I think the opposite is true: The classifiers provide
Hi, Paul,
I think there should be no issue with that as scikit-learn is distributed under
a BSD v3 license as long as you uphold the terms of that license. It's a bit
tricky to find that license note as it's not called "LICENSE" in the GitHub
repo like it is usually done for open source projects.
I'd agree with Gael that a potential explanation could be the distribution
shift upon splitting (usually the smaller the dataset, the more of an issue
this is). As potential solutions/workarounds, you could try
a) stratified sampling for regression, if you'd like to stick with the 2-way
holdout
Small batch sizes are typically used to speed up the training (more iterations)
and to avoid the issue that training sets usually don’t fit into memory. Okay,
the additional noise from the stochastic approach may also be helpful to escape
local minima and/or help with generalization performance
again for your advise.
>
> Li Yuan
>
> From: Sebastian Raschka
> Sent: Thursday, September 14, 2017 9:36 PM
> To: Scikit-learn mailing list
> Subject: Re: [scikit-learn] Help needed
>
> Hi, Li,
>
> to me, it looks like you are importing matplotlib in your c
Hi, Li,
to me, it looks like you are importing matplotlib in your code, but matplotlib
is not being installed on the CI instances that are running the scikit-learn
unit tests. Or in other words, the Travis instance is trying to execute an
"import matplotlib..." and fails because matplotlib is n
of new ANN architectures. I
> am in urgent need to reproduce in Keras the results obtained with
> MLPRegressor and the set of hyperparameters that I have optimized for my
> problem and later change the loss function.
>
>
>
> On 13 September 2017 at 18:14, Sebastian Raschka wr
gt; M the number of features?
>
> http://scikit-learn.org/stable/modules/svm.html#kernel-functions
>
>
>
> On 12 September 2017 at 00:37, Sebastian Raschka wrote:
> Hi Thomas,
>
> > For the MLPRegressor case so far my conclusion was that it is not possible
>
Hi Thomas,
> For the MLPRegressor case so far my conclusion was that it is not possible
> unless you modify the source code.
Also, I suspect that this would be non-trivial. I haven't looked too closely at
how the MLPClassifier/MLPRegressor are implemented, but since you perform the
weight update
ote:
>
>
>
> On 10 September 2017 at 22:03, Sebastian Raschka wrote:
> You could normalize the outputs (e.g., via min-max scaling). However, I think
> the more intuitive way would be to clip the predictions. E.g., say you are
> predicting house prices, it probably makes no
You could normalize the outputs (e.g., via min-max scaling). However, I think
the more intuitive way would be to clip the predictions. E.g., say you are
predicting house prices, it probably makes no sense to have a negative
prediction, so you would clip the output at some value > $0.
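E.g., something like (model and X_new are assumed to be defined):

import numpy as np

y_pred = np.clip(model.predict(X_new), a_min=0., a_max=None)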
PS: -820 an
Another approach would be to pose this as a "ranking" problem to predict
relative affinities rather than absolute affinities. E.g., if you have data
from one (or more) molecules that has/have been tested under 2 or more
experimental conditions, you can rank the other molecules accordingly or
no
Hi, Hanna,
I think Joel is right and the renaming is probably causing the issues. Instead
of renaming the package to sklearn1, consider modifying, compiling, and
installing sklearn in a virtual environment. I am not sure if you are using
conda; if so, creating a new virtual env for devel
Hi,
regarding MSE minimization vs variance reduction; it's been a few years but I
remember that we had a discussion about that, where Gilles Louppe explained
that those two are identical when I was confused about the wikipedia equation
at https://en.wikipedia.org/wiki/Decision_tree_learning#Va
Just read through the summary of the new features and browsed through the user
guide. The guide is really well structured and easy to navigate, thanks for
putting all the work into it. Overall, thanks for this great contribution and
new version :)
Best,
Sebastian
> On Aug 24, 2017, at 8:14 PM,
Yay, as an avid user, thanks to all the developers! This is a great release
indeed -- no breaking changes (at least for my code base) and so many
improvements and additions (that I need to check out in detail) :)
> On Aug 12, 2017, at 1:14 AM, Gael Varoquaux
> wrote:
>
> Hurray, thank you ev
rds,
> Georg
>
> Joel Nothman schrieb am So., 6. Aug. 2017 um 00:49
> Uhr:
> We are working on CategoricalEncoder in
> https://github.com/scikit-learn/scikit-learn/pull/9151 to help users more
> with this kind of thing. Feedback and testing is welcome.
>
> On 6
Hi, Georg,
I bring this up every time here on the mailing list :), and you are probably
aware of this issue, but it makes a difference whether your categorical data is
nominal or ordinal. For instance, if you have an ordinal variable with
values like {small, medium, large} you probably want to
x27;t gotten traction.
> Overshadowed by GBM & random forests?
>
>
> On Fri, Jul 21, 2017 at 11:52 AM, Sebastian Raschka
> wrote:
>> Just to throw some additional ideas in here. Based on a conversation with a
>> colleague some time ago, I think learning c
ifference imho. I.e., treating ordinal variables like continuous
variables probably makes more sense than one-hot encoding them. Looking forward
to the PR :)
> On Jul 21, 2017, at 2:52 PM, Sebastian Raschka wrote:
>
> Just to throw some additional ideas in here. Based on a conversation w
Just to throw some additional ideas in here. Based on a conversation with a
colleague some time ago, I think learning classifier systems
(https://en.wikipedia.org/wiki/Learning_classifier_system) are particularly
useful when working with large, sparse binary vectors (like from a one-hot
encodin
>> Does scikit have a function to find the maximum f1 score (and decision
>> threshold) for a (soft) classifier?
Hm, I don't think so. The F1-score is typically used as an evaluation metric;
hence, it's something optimized via hyperparameter tuning. There's an interesting
publication though, where the
I think there can be some middle ground. I.e., adding a new, simple dataset to
demonstrate regression (maybe auto-mpg, wine quality, or something like that)
and use that for the scikit-learn examples in the main documentation etc., but
leave the
boston dataset in the code base for now. Whether it's a weak
Hi,
hm, I think that dropping a column in one-hot encoded features is quite uncommon
in machine learning practice -- based on the applications and implementations
I've seen. My guess is that the one-hot encoded features are multicollinear
anyway!? There may be certain algorithms that benefit from
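For reference, dropping one level per feature is easy to do if needed, e.g.,
via pandas (sketch):

import pandas as pd

df = pd.DataFrame({'color': ['red', 'green', 'blue']})
onehot = pd.get_dummies(df, drop_first=True)  # drops one level per feature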
r me, I have some sense of machine learning, but none of Python.
>
> Unlike R, which is specifically for statistics analysis. Python is broad!
>
> Maybe some expert here with R can tell me how to go about this. :)
>
> On Sun, Jun 18, 2017 at 12:53 PM, Sebastian Raschka
Hi,
> I am extremely frustrated using this thing. Everything comes after a dot! Why
> would you type the sam thing at the beginning of every line. It's not
> efficient.
>
> code 1:
> y_sin = np.sin(x)
> y_cos = np.cos(x)
>
> I know you can import the entire package without the "as np", but I s