Re: [scikit-learn] Tikhonov regularization

2020-08-11 Thread Michael Eickenberg
Hi David, I am assuming you mean that T acts on w. If T is invertible, you can absorb it into the design matrix by making the change of variables v = Tw, w = T^-1 v, and using standard ridge regression for v. If it is not (e.g. when T is a standard finite-difference derivative operator), then this trick
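The change of variables described above can be sketched as follows. This is a minimal illustration on synthetic data; the matrix T here is an arbitrary well-conditioned invertible matrix made up for the demonstration, not one from the original question.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

# An (assumed) invertible Tikhonov matrix T, penalizing ||T w||^2.
T = np.eye(p) + 0.1 * rng.normal(size=(p, p))
T_inv = np.linalg.inv(T)

# Change of variables: v = T w, so X w = (X T^-1) v and the penalty
# ||T w||^2 becomes a standard ridge penalty ||v||^2 on the new design.
ridge = Ridge(alpha=1.0, fit_intercept=False)
ridge.fit(X @ T_inv, y)
w = T_inv @ ridge.coef_  # map back to the original parametrization
```

The recovered w minimizes ||y - Xw||^2 + alpha ||Tw||^2, which can be checked against the closed-form Tikhonov solution.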

Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-04-30 Thread Michael Eickenberg
Hi, I think there are many reasons that have led to the current situation. One is that scikit-learn is based on numpy arrays, which do not offer categorical data types (yet; ideas are being discussed: https://numpy.org/neps/nep-0041-improved-dtype-support.html). Pandas already has a categorical data
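The pandas side of this can be sketched as follows. A minimal illustration; the column name and values are made up, and the explicit one-hot step reflects that scikit-learn estimators still expect a numeric matrix.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# pandas already offers a categorical dtype ...
df = pd.DataFrame({"color": pd.Categorical(["red", "green", "red", "blue"])})
print(df["color"].dtype)  # category

# ... but scikit-learn estimators expect a numeric matrix, so the usual
# route is still an explicit one-hot encoding:
enc = OneHotEncoder()
X = enc.fit_transform(df[["color"]]).toarray()
print(X.shape)  # (4, 3): one indicator column per category
```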

Re: [scikit-learn] PolynomialFeatures

2019-11-23 Thread Michael Eickenberg
I think it might generate a basis that is capable of generating what you describe above, but the feature expansion concretely reads as 1, a, b, c, a**2, a*b, a*c, b**2, b*c, c**2, a**3, a**2*b, a**2*c, a*b**2, a*b*c, a*c**2, b**3, b**2*c, b*c**2, c**3. Hope this helps On Fri, Nov 22,
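For reference, PolynomialFeatures produces exactly this expansion. A small sketch with made-up values a=2, b=3, c=5:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0, 5.0]])  # a single sample (a, b, c)
poly = PolynomialFeatures(degree=3)
X3 = poly.fit_transform(X)
print(X3.shape)  # (1, 20): the 20 monomials of total degree <= 3
```

The first column is the bias term 1, and the a*b*c monomial (2*3*5 = 30) appears among the degree-3 terms.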

Re: [scikit-learn] Difference in normalization between Lasso and LogisticRegression + L1

2019-05-29 Thread Michael Eickenberg
Hi Jesse, I think there was an effort to compare normalization methods on the data attachment term between Lasso and Ridge regression back in 2012/13, but this might have not been finished or extended to Logistic Regression. If it is not documented well, it could definitely benefit from a

Re: [scikit-learn] RidgeCV with multiple targets returns a single alpha. Is it possible to get one alpha per target?

2018-08-07 Thread Michael Eickenberg
You can get one alpha per target in the Ridge estimator (without CV). Then you would have to code the cv loop yourself. Depending on how many targets you have, this can be more efficient than looping over targets as Alex suggests. Either way there is some coding to do unfortunately. Michael
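Ridge does accept one penalty per target when y is two-dimensional. A minimal sketch on synthetic data with arbitrary alpha values:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
Y = rng.normal(size=(100, 3))  # three targets

# One regularization strength per target column:
model = Ridge(alpha=[0.1, 1.0, 10.0]).fit(X, Y)
print(model.coef_.shape)  # (3, 10)
```

The cross-validation loop over candidate alphas for each target would then be hand-written, e.g. with sklearn.model_selection.KFold.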

Re: [scikit-learn] Why doesn't sklearn have support for a Batch Gradient Descent Regressor

2018-05-29 Thread Michael Eickenberg
Hi Lekan, for which type of estimator are you looking for a batch gradient descent regressor? Michael On Tue, May 29, 2018 at 4:54 PM, Lekan Wahab wrote: > I have a feeling this question might have been asked before or there's > some sort of resource somewhere on it but so far I haven't found

Re: [scikit-learn] Should we standardize data before PCA?

2018-05-24 Thread Michael Eickenberg
Hi, that totally depends on the nature of your data and whether the standard deviation of individual feature axes/columns of your data carry some form of importance measure. Note that PCA will bias its loadings towards columns with large standard deviations all else being held equal (meaning that
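The scale sensitivity described above is easy to see on synthetic data. An illustrative sketch; the inflation factor of 100 on one column is arbitrary:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 0] *= 100.0  # give one column a much larger standard deviation

# Without standardization, the first loading vector is dominated by column 0.
pca = PCA(n_components=1).fit(X)
print(np.abs(pca.components_[0]).round(3))

# With standardization, no column can dominate by scale alone.
scaled_pca = make_pipeline(StandardScaler(), PCA(n_components=1)).fit(X)
```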

Re: [scikit-learn] Jeff Levesque: neuroscience related datasets

2018-05-05 Thread Michael Eickenberg
Hi Jeffrey, check out these here for neuron data and fmri: http://crcns.org/ And the ones here for fmri: https://openfmri.org/ You can get started by installing one of the following packages and using their dataset downloaders

Re: [scikit-learn] How does multiple target Ridge Regression work in scikit learn?

2018-05-02 Thread Michael Eickenberg
By the linear nature of the problem, the targets are always treated separately (even if there were a matrix-variate normal prior indicating covariance between target columns, you could do that adjustment before or after fitting). As for different alpha parameters, I think you can specify a different
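The separate treatment of targets is easy to verify numerically. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
Y = rng.normal(size=(60, 2))  # two targets

joint = Ridge(alpha=1.0).fit(X, Y)       # one fit on both targets at once
col0 = Ridge(alpha=1.0).fit(X, Y[:, 0])  # fits on single target columns
col1 = Ridge(alpha=1.0).fit(X, Y[:, 1])

# The joint fit reproduces the per-column fits exactly:
print(np.allclose(joint.coef_[0], col0.coef_),
      np.allclose(joint.coef_[1], col1.coef_))
```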

Re: [scikit-learn] 1. Re: unclear help file for sklearn.decomposition.pca

2017-10-16 Thread Michael Eickenberg
Your document says: > This data has already been pre-processed so that each of the features have about the same mean (zero) and variance. This means that you do this before doing the eigendecomposition. Check the wikipedia article https://en.wikipedia.org/wiki/Principal_component_analysis

Re: [scikit-learn] Are sample weights normalized?

2017-07-28 Thread Michael Eickenberg
100 > of sample 3, sample 3 will be given a lot of focus during training because > it exists in majority, but if my dataset size was say 1 million, these > weights wouldn't really affect much? > > Thanks, > Abhishek > > On Jul 28, 2017 10:41 PM, "Michael Eickenberg"

Re: [scikit-learn] Are sample weights normalized?

2017-07-28 Thread Michael Eickenberg
Hi Abhishek, think of your example as being equivalent to putting 1 of sample 1, 10 of sample 2 and 100 of sample 3 in a dataset and then run your SVM. This is exactly true for some estimators and approximately true for others, but always a good intuition. Hope this helps! Michael On Fri, Jul
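For an estimator where the replication equivalence holds exactly, such as ordinary least squares, this is easy to check. An illustrative sketch with made-up numbers:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 4.0])
weights = np.array([1, 10, 100])

weighted = LinearRegression().fit(X, y, sample_weight=weights)

# Equivalent dataset: repeat each sample as many times as its weight.
X_rep = np.repeat(X, weights, axis=0)
y_rep = np.repeat(y, weights)
repeated = LinearRegression().fit(X_rep, y_rep)

print(np.allclose(weighted.coef_, repeated.coef_),
      np.allclose(weighted.intercept_, repeated.intercept_))
```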

Re: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn

2017-02-03 Thread Michael Eickenberg
Dear Afarin, scikit-learn is designed for predictive modelling, where evaluation is done out of sample (using train and test sets). You seem to be looking for a package with which you can do classical in-sample statistics and their corresponding evaluations, p-values among them. You are probably

Re: [scikit-learn] Specify boosting percentage using Randomoversampling?

2017-01-10 Thread Michael Eickenberg
Is maybe this contrib what you are looking for? Take a close look to see whether it does what you expect. http://contrib.scikit-learn.org/imbalanced-learn/auto_examples/over-sampling/plot_smote.html On Tue, Jan 10, 2017 at 6:36 PM, Suranga Kasthurirathne < suranga...@gmail.com> wrote: > > Hi

Re: [scikit-learn] NuSVC and ValueError: specified nu is infeasible

2016-12-08 Thread Michael Eickenberg
You have to set a bigger nu. Try:

    nus = 2.0 ** np.arange(-1, 10)  # starting at 0.5 (default), going to 512
    for nu in nus:
        clf = svm.NuSVC(nu=nu)
        try:
            clf.fit(X, y)
        except ValueError:
            print("nu {} not feasible".format(nu))

At some point it should start working. Hope

Re: [scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Michael Eickenberg
Here is a possibly useful comment of larsmans on stackoverflow about exactly this procedure http://stackoverflow.com/questions/26604175/how-to-predict-a-continuous-dependent-variable-that-expresses-target-class-proba/26614131#comment41846816_26614131 On Mon, Oct 10, 2016 at 4:04 PM, Sean

Re: [scikit-learn] Install sklearn into a specific folder to make some changes

2016-08-01 Thread Michael Eickenberg
There are several ways of achieving this. One is to build scikit-learn in place by going into the sklearn clone and typing

    make in

or alternatively

    python setup.py build_ext --inplace  # (i think)

Then you can use the environment variable PYTHONPATH, set to the github clone, and python will
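The PYTHONPATH mechanism itself can be illustrated with a throwaway package; everything below (directory, package name, contents) is made up for the demonstration, and in practice you would point PYTHONPATH at your scikit-learn clone instead.

```python
import os
import subprocess
import sys
import tempfile

# Create a throwaway package in a temporary directory ...
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "mypkg"))
with open(os.path.join(tmp, "mypkg", "__init__.py"), "w") as f:
    f.write("x = 42\n")

# ... and import it in a subprocess whose PYTHONPATH points at that
# directory, just as you would point PYTHONPATH at a sklearn clone.
env = dict(os.environ, PYTHONPATH=tmp)
result = subprocess.run(
    [sys.executable, "-c", "import mypkg; print(mypkg.x)"],
    env=env, capture_output=True, text=True,
)
print(result.stdout.strip())  # 42
```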

Re: [scikit-learn] Install sklearn into a specific folder to make some changes

2016-08-01 Thread Michael Eickenberg
On Monday, August 1, 2016, Andreas Mueller wrote: > Hi. > The best is probably to use a virtual environment or conda environment > specific for this changed version of scikit-learn. > In that environment you could just run an "install" and it would not mess > with your other

Re: [scikit-learn] Using fit_intercept with sparse matrices

2016-07-05 Thread Michael Eickenberg
On Tuesday, July 5, 2016, Joel Nothman wrote: > Jaidev is suggesting that fit_intercept=False makes no sense if the data > is sparse. > +1 > But I think that depends on your target variable. > +1 > > > > On 4 July 2016 at 22:11, Alexandre Gramfort < >