Re: [scikit-learn] Can we say stochastic gradient descent as an ML model?

2019-10-28 Thread federico vaggi
In this case, SGD just means a linear model that is fit using stochastic gradient descent instead of batch gradient methods. If you want more control over the combination of model / loss function / optimization algorithm, http://contrib.scikit-learn.org/lightning/ is better oriented for that.
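
A minimal sketch of the point above (recent scikit-learn assumed; the toy data is made up): SGDClassifier is a plain linear model, and the loss/penalty parameters pick the objective that SGD then optimizes.

    # SGDClassifier = linear model + loss chosen by parameter + SGD as the optimizer
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier

    X, y = make_classification(n_samples=1000, random_state=0)
    clf = SGDClassifier(loss="hinge", penalty="l2", random_state=0)  # hinge ~ linear SVM
    clf.fit(X, y)                 # a log-loss setting would give logistic regression
    print(clf.coef_.shape)        # a single weight matrix: it is a linear model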

Re: [scikit-learn] question

2019-10-19 Thread federico vaggi
Your options are to either pick a clustering algorithm that supports a pre-computed distance matrix, or find some kind of projection from C -> R, embed your data in R, then cluster your embedded data and transfer the labels back to C. On Sat, Oct 19, 2019 at 11:44 AM ahmad qassemi wrote:
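
A sketch of the first option, using DBSCAN and a hypothetical symmetric distance matrix D over the objects in C (any estimator accepting metric="precomputed" works the same way):

    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(0)
    D = rng.random((20, 20))          # stand-in for your pairwise distances
    D = (D + D.T) / 2                 # symmetrize
    np.fill_diagonal(D, 0.0)          # zero self-distance
    labels = DBSCAN(eps=0.3, min_samples=3, metric="precomputed").fit_predict(D)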

Re: [scikit-learn] fit before partial_fit ?

2019-06-06 Thread federico vaggi
k-means isn't a convex problem; unless you freeze the initialization, you are going to get very different solutions (depending on the dataset) with different initializations. On Thu, Jun 6, 2019 at 12:05 AM lampahome wrote: > I tried MiniBatchKMeans with two orders: > fit -> partial_fit > partial_fit -> fit
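
A sketch of what freezing the initialization means here (toy data and made-up centroids): with a fixed init array, fit and a sequence of partial_fit calls at least start from the same point, so remaining differences come from the optimization itself.

    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    X = np.random.RandomState(0).rand(500, 5)
    init = X[:3]                                       # frozen initial centroids
    a = MiniBatchKMeans(n_clusters=3, init=init, n_init=1, random_state=0).fit(X)
    b = MiniBatchKMeans(n_clusters=3, init=init, n_init=1, random_state=0)
    for batch in np.array_split(X, 5):                 # same data, incremental order
        b.partial_fit(batch)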

[scikit-learn] Categorical Encoding of high cardinality variables

2019-04-19 Thread federico vaggi
Hi everyone, I wanted to use the scikit-learn transformer API to clean up some messy data as input to a neural network. One of the steps involves converting categorical variables (of very high cardinality) into integers for use in an embedding layer. Unfortunately, I cannot quite use LabelEncoder
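
One possible workaround (assuming scikit-learn >= 0.24, where these options exist): OrdinalEncoder handles 2-D feature columns, unlike LabelEncoder, which is meant for targets, and it can map categories unseen at fit time to a reserved integer. The column name is made up.

    import pandas as pd
    from sklearn.preprocessing import OrdinalEncoder

    train = pd.DataFrame({"user_id": ["a", "b", "c"]})   # high-cardinality column
    test = pd.DataFrame({"user_id": ["b", "zzz"]})       # "zzz" unseen at fit time
    enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
    enc.fit(train)
    print(enc.transform(test))                           # [[1.], [-1.]]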

Re: [scikit-learn] Applying clustering to cosine distance matrix

2018-02-12 Thread federico vaggi
As a caveat, a lot of clustering algorithms assume that the distance matrix is a proper metric. If your distance is not a proper metric, then the results might be meaningless (the narrative docs do a good job of discussing this). On Mon, 12 Feb 2018 at 13:30 prince gosavi wrote:
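
A quick numerical check of the caveat: cosine distance (1 - cosine similarity) violates the triangle inequality, so it is not a proper metric.

    import numpy as np
    from scipy.spatial.distance import cosine

    x, y, z = np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([0.0, 1.0])
    print(cosine(x, z))                   # 1.0
    print(cosine(x, y) + cosine(y, z))    # ~0.586 < 1.0: triangle inequality fails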

Re: [scikit-learn] custom loss function

2017-09-13 Thread federico vaggi
n loss function. Never mentioned classification or hinge loss. On 13 September 2017 at 23:51, federico vaggi wrote: > You are confusing the kernel with the loss function. SVMs minimize a well-defined hinge loss on a space that's implicitly defined by a kernel mapping

Re: [scikit-learn] custom loss function

2017-09-13 Thread federico vaggi
You are confusing the kernel with the loss function. SVMs minimize a well-defined hinge loss on a space that's implicitly defined by a kernel mapping (or, in feature space if you use a linear kernel). On Wed, 13 Sep 2017 at 14:31 Thomas Evangelidis wrote: > What about the SVM? I use an SVR
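
A small sketch of the distinction (toy data): the hinge loss stays the same, only the space it is minimized in changes with the kernel.

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC, LinearSVC

    X, y = make_classification(random_state=0)
    LinearSVC(loss="hinge").fit(X, y)   # hinge loss in the original feature space
    SVC(kernel="rbf").fit(X, y)         # same loss family, implicit RBF-induced space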

Re: [scikit-learn] Contribution

2017-07-10 Thread federico vaggi
Hey Gurhan, sklearn doesn't really neatly separate optimizers from the models they optimize at the API level (except in a few cases). In order to make the package friendlier to newer users, each model has excellent optimizer defaults that you can use, and only in a few cases does it make sense to change them.
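
An illustration of that API pattern: the optimizer is hidden inside the estimator, and where a choice exists it surfaces as a solver-style argument rather than as a separate object.

    from sklearn.linear_model import LogisticRegression

    clf_default = LogisticRegression()            # sensible built-in solver default
    clf_saga = LogisticRegression(solver="saga")  # explicit choice where it matters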

Re: [scikit-learn] How to dump a model to txt file?

2017-04-13 Thread federico vaggi
If you want to use the model from C++ code, the easiest way is probably to use Boost.Python (http://www.boost.org/doc/libs/1_62_0/libs/python/doc/html/index.html). Alternatively, use another gradient boosting library that has a C++ API (like XGBoost). Keep in mind, if you want to call Python code from C++
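
Not what the reply above suggests, but related: scikit-learn versions newer than this message (>= 0.21) can dump the individual trees of a boosting model as plain text for inspection.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.tree import export_text

    X, y = load_iris(return_X_y=True)
    model = GradientBoostingClassifier(n_estimators=5).fit(X, y)
    print(export_text(model.estimators_[0][0]))   # first tree of the first class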

Re: [scikit-learn] decision trees

2017-03-29 Thread federico vaggi
That's a really good point. Do you know of any systematic studies of the two different encodings? Finally: wasn't there a PR for RF to accept categorical variables as inputs? On Wed, 29 Mar 2017 at 11:57, Olivier Grisel wrote: > Integer coding will indeed make the DT assume an arbitrary ordering
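
The two encodings under discussion, side by side (the sparse_output argument assumes scikit-learn >= 1.2; it was called sparse before that): integer coding imposes an arbitrary order, one-hot does not, at the cost of wider data.

    from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

    colors = [["red"], ["green"], ["blue"]]
    print(OrdinalEncoder().fit_transform(colors))                    # [[2.], [1.], [0.]]
    print(OneHotEncoder(sparse_output=False).fit_transform(colors))  # 3 binary columns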

Re: [scikit-learn] Intermediate results using gridsearchCV?

2017-03-19 Thread federico vaggi
I imagine he is suggesting an iterator that yields results while the search is running, instead of only getting the result at the end of the run. On Sun, 19 Mar 2017 at 11:46 Joel Nothman wrote: > Not sure what you mean. Have you used cv_results_? > On 18 March 2017 at 08:46, Carlton Banks wrote:
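
A hand-rolled version of that idea (the helper name is made up): loop over ParameterGrid yourself and yield each score as it is computed, instead of waiting for GridSearchCV to finish.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import ParameterGrid, cross_val_score
    from sklearn.svm import SVC

    def iter_grid_scores(estimator, grid, X, y):
        for params in ParameterGrid(grid):
            estimator.set_params(**params)
            yield params, cross_val_score(estimator, X, y).mean()

    X, y = load_iris(return_X_y=True)
    for params, score in iter_grid_scores(SVC(), {"C": [0.1, 1, 10]}, X, y):
        print(params, score)                       # results arrive incrementally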

Re: [scikit-learn] Complex variables in Gaussian mixture models?

2017-01-09 Thread federico vaggi
Probably not the most principled way to handle it, but: can't you treat 1-dimensional complex numbers as 2-dimensional real numbers, and then try to cluster those with the GMM? On Mon, 9 Jan 2017 at 20:34 Rory Smith wrote: > Hi All, > > I’d like to set up a GMM using mixture.BayesianGaussianMixture
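
A sketch of the embedding idea (z is a made-up 1-D complex array): view each complex sample as a point in R^2 and fit the mixture there.

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.default_rng(0)
    z = rng.normal(size=200) + 1j * rng.normal(size=200)
    X = np.column_stack([z.real, z.imag])          # (n_samples, 2), real-valued
    gmm = BayesianGaussianMixture(n_components=3).fit(X)
    labels = gmm.predict(X)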

Re: [scikit-learn] KNeighborsClassifier and metric='precomputed'

2017-01-03 Thread federico vaggi
That would be most helpful. Maybe also explain the logic? On Tue, 3 Jan 2017 at 18:19 Andy wrote: > Should probably be called n_samples_train? > > On 01/02/2017 04:10 PM, Joel Nothman wrote: > > n_indexed means the number of samples in the X passed to fit. It needs to > be able to compare each
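
The shape logic under discussion, in code (toy data): fit takes a square train-vs-train matrix, while predict takes a rectangular test-vs-train matrix, which is why the second dimension is the number of samples in the X passed to fit.

    import numpy as np
    from sklearn.metrics import pairwise_distances
    from sklearn.neighbors import KNeighborsClassifier

    X_train, X_test = np.random.rand(50, 3), np.random.rand(10, 3)
    y_train = np.random.randint(0, 2, 50)
    knn = KNeighborsClassifier(n_neighbors=3, metric="precomputed")
    knn.fit(pairwise_distances(X_train, X_train), y_train)   # (n_train, n_train)
    knn.predict(pairwise_distances(X_test, X_train))         # (n_test, n_train)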

Re: [scikit-learn] Fwd: Scikit-learn MLPRegressor Help

2016-12-03 Thread federico vaggi
As long as the feature ordering has a meaningful spatial component (as is almost always the case when you are dealing with raw pixels as features), CNNs will almost always be better. CNNs actually have a lot fewer parameters than MLPs (depending on the architecture, of course) because of weight sharing.
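
A back-of-the-envelope version of the weight-sharing claim, for a single-channel 28x28 image:

    dense_params = (28 * 28) * 100 + 100   # fully connected, 100 units: 78,500
    conv_params = (3 * 3 * 1) * 32 + 32    # one 3x3 conv layer, 32 filters: 320
    print(dense_params, conv_params)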

Re: [scikit-learn] GPR intervals and MCMC

2016-11-08 Thread federico vaggi
Hi, if you want to have the full posterior distribution over the values of the hyperparameters, there is a good example of how to do that with George (another GP package for Python) and emcee (an MCMC sampler). http://dan.iel.fm/george/current/user/hyper/ On Tue, 8 Nov 2016 at 16:10 Quaglino Alessio wrote: > Hello
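
The same idea is possible with scikit-learn's own GP instead of George (a sketch; assumes emcee is installed, and the training data is made up): let emcee sample the log-hyperparameters under the GP marginal likelihood.

    import numpy as np
    import emcee
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    X = np.random.rand(30, 1)
    y = np.sin(X[:, 0])
    gpr = GaussianProcessRegressor(kernel=RBF()).fit(X, y)

    def log_prob(theta):
        try:
            return gpr.log_marginal_likelihood(theta)
        except Exception:                  # numerically bad hyperparameters
            return -np.inf

    ndim = len(gpr.kernel_.theta)
    nwalkers = 8
    p0 = gpr.kernel_.theta + 1e-3 * np.random.randn(nwalkers, ndim)
    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
    sampler.run_mcmc(p0, 200)              # posterior samples over hyperparameters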

Re: [scikit-learn] Scaling model selection on a cluster

2016-08-07 Thread federico vaggi
This might be interesting to you: http://blaze.pydata.org/blog/2015/10/19/dask-learn/ On Sun, 7 Aug 2016 at 10:42 Vlad Ionescu wrote: > Hello, > > I am interested in scaling grid searches on an HPC LSF cluster with about > 60 nodes, each with 20 cores. I thought I could just set n_jobs=1000
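
For reference, the modern descendant of that dask-learn approach (assumes dask.distributed is installed): point joblib at a dask cluster and let GridSearchCV fan its work out across the nodes.

    import joblib
    from dask.distributed import Client
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    client = Client()    # local cluster here; pass your scheduler address on LSF
    X, y = load_iris(return_X_y=True)
    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, n_jobs=-1)
    with joblib.parallel_backend("dask"):
        search.fit(X, y)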

Re: [scikit-learn] Is there any official position on PEP484/mypy?

2016-08-04 Thread federico vaggi
Another point about the dependency: the dependency is not required at run time - it is only required to run the type checker. You could easily put it in a try/except block, and people running scikit-learn wouldn't need it. On Thu, 4 Aug 2016 at 13:41 Daniel Moisset wrote:
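
A minimal sketch of keeping typing a checker-only concern (the modern stdlib equivalent of the try/except idea, assuming Python >= 3.5): guarded imports are resolved by mypy but skipped at run time.

    from typing import TYPE_CHECKING

    if TYPE_CHECKING:
        from scipy import sparse       # seen by mypy, never imported at run time

    def nnz(matrix: "sparse.spmatrix") -> int:
        # the string annotation means nothing is evaluated when the module loads
        return matrix.nnz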

Re: [scikit-learn] Is there any official position on PEP484/mypy?

2016-07-29 Thread federico vaggi
I've been using mypy on a much smaller codebase I've been developing. The main benefits are: 1) a much nicer IDE experience when using something like PyCharm (I expect more text editors to start supporting this in the future), and 2) an additional way to catch some compile-time errors early on.
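
The kind of error point 2 refers to, as a tiny example: mypy flags the commented-out call below without executing anything.

    from typing import List

    def mean(values: List[float]) -> float:
        return sum(values) / len(values)

    # mean("oops")  # mypy: Argument 1 to "mean" has incompatible type "str"
    print(mean([1.0, 2.0, 3.0]))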

Re: [scikit-learn] Declaring numpy and scipy dependencies?

2016-07-28 Thread federico vaggi
My main issue with the upgrade is that if there was a slightly newer version of numpy/scipy, it would try to upgrade my numpy/scipy linked against MKL/BLAS to a vanilla version downloaded from the cheese shop. It was a huge pain. On Thu, 28 Jul 2016 at 21:17 Matthew Brett wrote: > On Thu, Jul 28

[scikit-learn] EuroSciPy 2016 Call for Papers Extended

2016-06-08 Thread federico vaggi
Hi everyone, The call for contributions (talks, posters, sprints) is still open until June 24th. EuroSciPy 2016 takes place in Erlangen, Germany, from the 23rd to the 27th of August and consists of two days of tutorials (beginner and advanced tracks) and two days of conference talks representing many fields

Re: [scikit-learn] Fitting Lognormal Distribution

2016-05-26 Thread federico vaggi
Err, sorry - mu1, mu2, sigma1, sigma2: mu1 and sigma1 are the mean and standard deviation of the first distribution, and mu2 and sigma2 are the mean and standard deviation of the second distribution. On Thu, 26 May 2016 at 09:26 federico vaggi wrote: > If you are talking about finding the values
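
A sketch of the "meeting point" computation with those four parameters: setting the two normal densities equal and taking logs gives a quadratic in x, so the intersections come out in closed form.

    import numpy as np

    def normal_intersections(mu1, sigma1, mu2, sigma2):
        a = 0.5 / sigma1**2 - 0.5 / sigma2**2
        b = mu2 / sigma2**2 - mu1 / sigma1**2
        c = (0.5 * mu1**2 / sigma1**2 - 0.5 * mu2**2 / sigma2**2
             + np.log(sigma1 / sigma2))
        return np.roots([a, b, c])         # 0, 1, or 2 real crossing points

    print(normal_intersections(0.0, 1.0, 2.0, 1.5))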

Re: [scikit-learn] Fitting Lognormal Distribution

2016-05-26 Thread federico vaggi
- Thanks, will do that > > (2) - I am fitting the distribution for 2 different sets of values. I will > find the distribution as mentioned by you in (1). But, now having 2 > curves, how do I find the meeting point(s)? > > Regards, > Sanant > > On Thu, May 26, 2016 at 12:16

Re: [scikit-learn] Fitting Lognormal Distribution

2016-05-25 Thread federico vaggi
1) The normal distribution is parametrized by its mean and standard deviation. Simply take the mean and standard deviation of the log of your values? 2) Which curves? You only mentioned a single lognormal distribution. On Thu, 26 May 2016 at 08:42 Startup Hire wrote: > Hi Michael, > > :)
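
The suggestion in (1), as code (values is a made-up array of positive observations): a lognormal fit is just a normal fit in log space.

    import numpy as np

    values = np.random.lognormal(mean=1.0, sigma=0.5, size=1000)
    log_values = np.log(values)
    mu, sigma = log_values.mean(), log_values.std()
    print(mu, sigma)                       # recovers roughly 1.0 and 0.5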