Re: [Scikit-learn-general] Canonical Correlation Forests

2015-12-10 Thread Artem
Hi Scott. The paper is quite new, and sklearn has a policy about introducing new algorithms. I'd say we need more time for others to test it and prove its usefulness. On Thu, Dec 10, 2015 a

Re: [Scikit-learn-general] Contribution to Scikit

2015-10-23 Thread Artem
Hi Rajlaxmi. There are *many issues* labeled easy with no assignee. On Fri, Oct 23, 2015 at 2:43 PM, Rajlaxmi Sahu wrote: > Hi, > > I would like to contribute to Scikit-learn. I was browsing t

Re: [Scikit-learn-general] passing optional parameters to fit() when using a pipeline

2015-09-20 Thread Artem
alidation_set > = [???]*) > estimator = > > GridSearchCV(pipe, my_params, verbose=5, cv=5) > estimator.fit(x_train, y_train) > > > ? > > On Sun, Sep 20, 2015 at 10:10 AM, Artem wrote: > >> Hi >> >> Don't pass any parameters to fit meth

Re: [Scikit-learn-general] passing optional parameters to fit() when using a pipeline

2015-09-20 Thread Artem
Hi. Don't pass any parameters to the fit method. The current API assumes that you set all the parameters in the estimator's constructor (the __init__ method). It's a bit nasty to set the validation set at construction time, but there's no better approach. On Sun, Sep 20, 2015 at 3:47 PM, okek padokek wrote: >
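A minimal sketch of what "set it in the constructor" means here; the class name and the validation_set parameter are illustrative, not part of scikit-learn:

    from sklearn.base import BaseEstimator, ClassifierMixin

    class ValidatedClassifier(BaseEstimator, ClassifierMixin):
        # Hypothetical estimator: the validation set is passed at construction
        # time rather than to fit(), so it survives cloning inside Pipeline
        # and GridSearchCV.
        def __init__(self, validation_set=None):
            self.validation_set = validation_set

        def fit(self, X, y):
            # ... train here, optionally monitoring self.validation_set ...
            return self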

Re: [Scikit-learn-general] About C50

2015-08-22 Thread Artem
Do you mean C5.0, which is a further development of the C4.5 tree algorithm? If so, then the answer is no, it's not implemented in sklearn. Furthermore, according to Wikipedia, C5.0 is a commercial product and (AFAIK) al

Re: [Scikit-learn-general] AUC really low

2015-08-05 Thread Artem
er! >> >> hmm its possible, I just make a little example: >> >> auc is [0.952710670069, 0.01890450385597026, 0.0059624156214325846, >> 0.05391726570661811] >> expected is [0.0, 1.0, 1.0, 1.0] >> but this is already with changed values, in the test set

Re: [Scikit-learn-general] AUC really low

2015-08-04 Thread Artem
Hi Herbert. The worst value for AUC is actually 0.5. Having a value close to 0 means that you can get a value just as close to 1 by simply flipping your predictions (predict class 1 when you think it's 0 and vice versa). Are you sure you didn't confuse the classes somewhere along the line? (You might have cho
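A minimal sketch of that flipping argument, with made-up scores loosely mirroring the numbers quoted in this thread:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_true = np.array([0, 1, 1, 1])
    scores = np.array([0.95, 0.02, 0.006, 0.05])   # ranks the only negative sample highest

    print(roc_auc_score(y_true, scores))    # 0.0 -- far worse than the chance level of 0.5
    print(roc_auc_score(y_true, -scores))   # 1.0 -- same scores with the predictions flipped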

Re: [Scikit-learn-general] Speed up transformation step with multiple 1 vs rest binary text classifiers.

2015-07-02 Thread Artem
Hi Nikhil. Do you somehow do topic-specific TF-IDF transformations? Could you provide a small (pseudo)code snippet of what you're doing? I may be wrong, but judging from what you wrote, it doesn't look like you use scikit-learn's OneVsRestClassifier

Re: [Scikit-learn-general] RandomForestClassifier with warm_start and n_jobs

2015-06-24 Thread Artem
Hi Dale. Thanks for the code sample! Indeed, warm_start does not disable parallelization; I can confirm this both by running your code and by checking the source. Moreover, the example you mentioned was added on May 2nd, and it doesn't look
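A minimal sketch of the combination being discussed (the data here is made up, not taken from Dale's code):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    X = np.random.rand(100, 5)
    y = np.random.randint(0, 2, size=100)

    # Grow a forest in two batches: warm_start keeps the trees already built,
    # and n_jobs=-1 still fits the newly added ones in parallel.
    clf = RandomForestClassifier(n_estimators=50, warm_start=True, n_jobs=-1)
    clf.fit(X, y)

    clf.set_params(n_estimators=100)    # request 50 additional trees
    clf.fit(X, y)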

Re: [Scikit-learn-general] [GSoC2015 metric learning]

2015-05-31 Thread Artem
mization in semivectorized version, but its speedup isn't that significant. On Sun, May 31, 2015 at 9:29 PM, Michael Eickenberg < michael.eickenb...@gmail.com> wrote: > > > On Sun, May 31, 2015 at 7:25 PM, Artem wrote: > >> I added a simple benchmark >> <https://github.c

Re: [Scikit-learn-general] [GSoC2015 metric learning]

2015-05-31 Thread Artem
earn repo, > just to keep stuff in one place for easy future reference. > > Michael > > > On Fri, May 29, 2015 at 6:24 PM, Artem wrote: > >> So, I created a WIP PR dedicated to NCA: >> https://github.com/scikit-learn/scikit-learn/pull/4789 >> >> As sugge

Re: [Scikit-learn-general] [GSoC2015 metric learning]

2015-05-29 Thread Artem
So, I created a WIP PR dedicated to NCA: https://github.com/scikit-learn/scikit-learn/pull/4789 As suggested by Michael, I refactored "the meat" into a function. I also rewrote it as a first-order oracle, so I can (and do) use scipy's optimizers. I've seen scipy.optimize.minimize (apparently, wi

Re: [Scikit-learn-general] [GSoC2015 metric learning]

2015-05-28 Thread Artem
fixing all cells of the matrix but one)? - Or maybe it's enough to use scipy's conjugate gradient optimizer? On Mon, May 4, 2015 at 2:02 PM, Michael Eickenberg < michael.eickenb...@gmail.com> wrote: > Dear Artem, > > congratulations on the acceptance of your GSoC prop

Re: [Scikit-learn-general] GSoC Community Bonding

2015-05-27 Thread Artem
Hi Gael. My GSoC blog URL is http://barmaley-exe.blogspot.com; as required, there's a relevant tag, gsoc15. On Mon, May 25, 2015 at 3:08 PM, Gael Varoquaux < gael.varoqu...@normalesup.org> wrote: > Hi GSOC students, > > And welcome. I hope that you will have a fun and productive summer. > > To commun

Re: [Scikit-learn-general] use features from a sklearn branch

2015-05-08 Thread Artem
Looks like you have a circular import, and Python doesn't like them. Sorry, I don't have a quick hack for this; all I can propose is to look at the import chain, understand which import breaks it all, and get rid of it. For example, you can move some imports into functions, so they're not call
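A minimal sketch of the "move imports into functions" workaround; the module and function names are purely illustrative:

    # Suppose a.py imports b at module level and b imports a -- a circular import.
    # Deferring one of the imports into the function that needs it breaks the cycle,
    # because the import only runs when the function is called, not at load time.

    def transform(x):
        from mypackage.other_module import helper   # hypothetical lazy import
        return helper(x)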

[Scikit-learn-general] [GSoC] Project Metric Learning

2015-05-02 Thread Artem
Hello Andreas, hello Michael. First, I'm happy to be selected as this year's scikit-learn student, and I hope to do great work. According to my timeline, I'm going to use community

Re: [Scikit-learn-general] error with RFE and gridsearchCV

2015-04-28 Thread Artem
GridSearchCV is not an estimator, but a utility to find one. So you should `fit` the grid search first in order to find the classifier that performs well on the CV splits, and then use it. Like this: gbr = GradientBoostingClassifier() parameters = {'learning_rate': [0.1, 0.01, 0.001],
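A sketch completing the truncated snippet above; the parameter grid beyond learning_rate and the training data are assumed, not taken from the thread:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.grid_search import GridSearchCV   # sklearn.model_selection in later releases

    X_train, y_train = make_classification(n_samples=200, random_state=0)

    gbr = GradientBoostingClassifier()
    parameters = {'learning_rate': [0.1, 0.01, 0.001]}

    search = GridSearchCV(gbr, parameters, cv=5)
    search.fit(X_train, y_train)          # fit the search itself first
    best_clf = search.best_estimator_     # then use the estimator it found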

Re: [Scikit-learn-general] Degree parameter in Nu-Support Vector Classification

2015-04-22 Thread Artem
Looks like a typo, indeed. Libsvm only uses `degree` for polynomial kernels. On Wed, Apr 22, 2015 at 11:39 PM, Sebastian Raschka wrote: > Hi all, > > I am wondering a little bit about this documentation of the degree > parameter on NuSVM and SVR: > > degree : int, optional (default=3) > Degree o

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-26 Thread Artem
is: > > class SimilarityTransformer(TransformerMixin): > def fit(self, X, y): > self.X_ = X; return self > >def transform(self, X): >return -euclidean_distances(X, self.X_) > > On Thu, Mar 26, 2015 at 6:28 PM, Artem wrote: > >> Yes, the only need
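The transformer quoted above, restated with its indentation restored (imports added; they are not shown in the quote):

    from sklearn.base import TransformerMixin
    from sklearn.metrics.pairwise import euclidean_distances

    class SimilarityTransformer(TransformerMixin):
        def fit(self, X, y):
            self.X_ = X                  # remember the training samples
            return self

        def transform(self, X):
            # negated distances to the training samples serve as similarities
            return -euclidean_distances(X, self.X_)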

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-26 Thread Artem
, at the moment only `AgglomerativeClustering` works well with a custom metric, and Spectral Clustering and Affinity Propagation can work with a [n_samples, n_samples] affinity matrix. On Thu, Mar 26, 2015 at 12:08 PM, Mathieu Blondel wrote: > > > On Thu, Mar 26, 2015 at 5:49 PM, Artem wrote: >
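A minimal sketch of the precomputed-affinity usage mentioned here; the data and the Manhattan metric merely stand in for a learned custom metric, and newer scikit-learn releases call the affinity parameter metric:

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering
    from sklearn.metrics.pairwise import pairwise_distances

    X = np.random.rand(20, 3)
    D = pairwise_distances(X, metric='manhattan')   # (n_samples, n_samples) distance matrix

    model = AgglomerativeClustering(n_clusters=3, affinity='precomputed', linkage='average')
    labels = model.fit_predict(D)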

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-26 Thread Artem
nd needs > affinity="precomputed" (otherwise, it assumes that X is [n_samples, > n_features]) > - Instead of duplicating each class, you could create a generic > transformer that outputs a similarity / distance matrix from X. > > M. > > On Thu, Mar 26, 2015 at 4:50 PM, Arte

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-26 Thread Artem
le. > > Please give a code example in your proposal to see how this would look > like. > > M. > > On Thu, Mar 26, 2015 at 5:18 AM, Artem wrote: > >> Ok, so I removed matrix y from the proposal >> <https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2015

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-25 Thread Artem
>> submit an early version. >> >> >> >> On 03/25/2015 04:18 PM, Artem wrote: >> >> Ok, so I removed matrix y from the proposal >> <https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2015-Proposal:-Metric-Learning-module>. >> Therefore I al

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-25 Thread Artem
epend on how many slots we get and how we want to prioritize them. > > M. > > On Wed, Mar 25, 2015 at 10:04 AM, Vlad Niculae wrote: > >> Hi Artem, hi everybody, >> >> There were two API issues and I think both need thought. The first is the >> matrix-lik

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Artem
submitting the last one (ITML). By the end of the 10th week I might still not have the second review completed, but that's okay; there are 2+ more weeks to get it done. On Wed, Mar 25, 2015 at 4:04 AM, Vlad Niculae wrote: > Hi Artem, hi everybody, > > There were two API issues a

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Artem
You mean matrix-like y? Gael said > > FWIW It'll require some changes to cross-validation routines. > I'd rather we try not to add new needs and usecases to these before we > release 1.0. We are already having a hard time covering in a homogeneous > way all the possible options.

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Artem
> > In other words, I would like to get in an "API freeze" state where we add/modify only essential stuff to the API. Ok, then I suppose the easiest way would be to create 2 kinds of transformers for each method: one that transforms the space so that Euclidean distance acts like Mahalanobis'

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Artem
ml's data. One would also be able to pipeline it with KNN: ml = MetricLearner() knn = KNN() pipeline = Pipeline([ ('ml', ml), ('knn', knn) ]) pipeline.fit(X_train, y_train) pipeline.predict(X_test) # ml.transform returns transformed data On Tue, Mar 24, 2015 at 1:43 AM
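The pipelining idea above as a runnable sketch; since MetricLearner is only a proposal, PCA stands in for it here, and the data is made up:

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.decomposition import PCA

    X_train = np.random.rand(50, 4)
    y_train = np.random.randint(0, 2, size=50)
    X_test = np.random.rand(10, 4)

    ml = PCA(n_components=2)            # stand-in for the proposed MetricLearner transformer
    knn = KNeighborsClassifier()

    pipeline = Pipeline([('ml', ml), ('knn', knn)])
    pipeline.fit(X_train, y_train)      # ml.fit_transform feeds transformed data into knn
    pipeline.predict(X_test)            # ml.transform is applied before knn.predict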

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Artem
while waiting for review feedback. I'll elaborate on my proposal later today. On Tue, Mar 24, 2015 at 2:34 PM, Joel Nothman wrote: > Hi Artem, I've taken a look at your proposal. I think this is an > interesting contribution, but I suspect your proposal is far too ambitious: > &g

Re: [Scikit-learn-general] Question regarding the list of topics for GSoC 2015

2015-03-23 Thread Artem
It's worth noting that there was a similar project 2 years ago, but unfortunately it wasn't completed. I did some work on top of that, but I didn't get any feedback. On Tue, Mar 24, 2015 at 3:23 AM, Vlad Niculae wrote: > Hi Vinayak, > > The wi

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-23 Thread Artem
ttp://www-bcf.usc.edu/~feisha/pubs/chi2.pdf) or some approaches based > on deep neural nets. > > Aurélien > > On 3/23/15 11:43 PM, Andreas Mueller wrote: > > Hi Artem. > > I thought that was you, but I wasn't sure. > > Great, I linked to your draft from the

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-23 Thread Artem
dreas Mueller wrote: > Hi Artem. > I thought that was you, but I wasn't sure. > Great, I linked to your draft from the wiki overview page, otherwise it is > hard to find. > I haven't looked at it in detail yet, though. > > 1.1: no, generalizing K-Means is out of sc

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-23 Thread Artem
ty matrix. What do you think? On Tue, Mar 24, 2015 at 1:09 AM, Andreas Mueller wrote: > Hi Artem. > I think the overall feedback on your proposal was positive. > Did you get the chance to write it up yet? > Please submit your proposal on melange https://www.google-melange.co

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-23 Thread Artem
Either way, there is apparently no justification for using kernel approximation with ITML, since even the regular KPCA trick doesn't apply to it. On Mon, Mar 23, 2015 at 5:07 PM, Andreas Mueller wrote: > > > On 03/21/2015 08:54 PM, Artem wrote: > > Are there any objections

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-22 Thread Artem
, 2015 at 6:42 AM, Mathieu Blondel wrote: > I skimmed through this survey: > http://arxiv.org/abs/1306.6709 > > For methods that learn a Mahalanobis distance, as Artem said, we can > indeed compute the Cholesky decomposition of the learned precision matrix > and use it to transform

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-21 Thread Artem
y a pull request on neighborhood component analysis > https://github.com/scikit-learn/scikit-learn/issues/3213 > > A first step of the GSoC could be to complete it. > > Gaël > > > On Wed, Mar 18, 2015 at 11:39 PM, Artem wrote: > > > Hello everyone > > >

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-19 Thread Artem
> > On 19 March 2015 at 08:47, Andreas Mueller wrote: > >> In summary, I think this does look like a good basis for a proposal :) >> >> >> >> On 03/18/2015 05:14 PM, Artem wrote: >> >> >>> Do you think this interface would be useful eno

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-18 Thread Artem
o combine these methods, but overfitting could be a problem, indeed. Not sure how severe it can be. On Wed, Mar 18, 2015 at 10:07 PM, Andreas Mueller wrote: > > On 03/18/2015 02:53 PM, Artem wrote: > > I mean that if we were solving classification, we would have y that > tells us w

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-18 Thread Artem
it > limiting, but I don't see a different way to do > it within the current API. > Can you explain this statement a bit more " We can go with usual y vector > consisting of feature labels" ? > > Thanks, > Andy > > > On 03/18/2015 12:55 PM, Artem wrote:

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-18 Thread Artem
Varoquaux < gael.varoqu...@normalesup.org> wrote: > On Wed, Mar 18, 2015 at 07:21:18PM +0300, Artem wrote: > > As to what y should look like, it depends on what we'd like the > algorithm to > > do. We can go with usual y vector consisting of feature labels. > Actually, L

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-18 Thread Artem
orm" to get the distances to the training data > (is that what one would want?) > But how would the labels for the "fit" look like? > > Cheers, > Andy > > > On 03/18/2015 08:39 AM, Artem wrote: > > Hello everyone > > Recently I mentioned metric learning

[Scikit-learn-general] [GSoC] Metric Learning

2015-03-18 Thread Artem
Hello everyone. Recently I mentioned metric learning as one of the possible projects for this year's GSoC, and I would like to hear your comments. Metric learning, as the name suggests, is about learning distance functions. Usually the metric that is learned is a Mahalanobis metric, thus the problem
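A small numeric illustration (made-up data, not from the email) of why a Mahalanobis metric d_M(x, y) = sqrt((x - y)^T M (x - y)) with M = L^T L is just the Euclidean distance after mapping points through L:

    import numpy as np

    rng = np.random.RandomState(0)
    L = rng.rand(2, 2)        # a linear transform (here random; a metric learner would fit it)
    M = L.T.dot(L)            # the corresponding Mahalanobis matrix

    x, y = rng.rand(2), rng.rand(2)
    d_mahalanobis = np.sqrt((x - y).dot(M).dot(x - y))
    d_euclidean = np.linalg.norm(L.dot(x) - L.dot(y))
    assert np.isclose(d_mahalanobis, d_euclidean)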

Re: [Scikit-learn-general] SVM: Matlab vs sklearn

2015-03-03 Thread Artem
Can't say about Matlab, but sklearn does SVM (unless it's LinearSVC) using libsvm internally (with minor tweaks on top), so you should expect the same results. On Tue, Mar 3, 2015 at 11:53 PM, Pagliari, Roberto wrote: > Has anybody ever compared Matlab SVM vs sklearn or libsvm? > > > > It’d be i

Re: [Scikit-learn-general] GSoC2015 topics

2015-03-03 Thread Artem
There was a discussion on metric learning a while ago, and several people expressed interest in seeing it in sklearn (and contributing to it). But it looks like that attempt didn't get anywhere. What about a project to

Re: [Scikit-learn-general] random forests with njobs>1

2015-02-27 Thread Artem
Do you have joblib installed? Does n_jobs > 1 work with other algorithms? On Sat, Feb 28, 2015 at 12:55 AM, Pagliari, Roberto wrote: > When using random forests with njobs > 1, I see one python process only. > Does RF support using multiprocessor module? > > > > > ---

Re: [Scikit-learn-general] Decision_function SVM returns one class score only

2015-02-26 Thread Artem
Hi Shalu. decision_function returns the (signed) distance to each of the separating hyperplanes. There's one hyperplane for each pair of classes, so in the case of 2 classes there'd be one hyperplane. The Iris dataset contains 3 classes, so there are 3 possible pairs, and thus 3 columns in the result of decision_f
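A minimal sketch of the shapes being described, using the one-vs-one decision function discussed here:

    from sklearn.datasets import load_iris
    from sklearn.svm import SVC

    iris = load_iris()
    clf = SVC(decision_function_shape='ovo').fit(iris.data, iris.target)

    # 3 classes -> 3 class pairs -> 3 columns, one signed distance per pairwise hyperplane
    print(clf.decision_function(iris.data).shape)   # (150, 3)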

Re: [Scikit-learn-general] GSoC2015 topics

2015-02-25 Thread Artem
> > Online low-rank matrix completion: this is from last year and I'm not sure if it is still desirable / don't know the state of the PR > You mean this one? I picked it up

[Scikit-learn-general] Circular import

2015-02-19 Thread Artem
Hello While working on matrix factorization with missing values I faced a circular import issue ( https://travis-ci.org/scikit-learn/scikit-learn/jobs/50276638#L1362). The problem is that I want to add a new imputer, which should reside in

Re: [Scikit-learn-general] Regarding classification with one variable only

2015-02-16 Thread Artem
X needs to be a matrix of shape (n_samples, n_features), not a vector. You need to reshape it into a matrix by doing X_train = X_train.reshape((len(X_train), 1)) On Mon, Feb 16, 2015 at 4:01 PM, shalu jhanwar wrote: > Hi Scikit fans, > > I am facing the following error while performing classifi

Re: [Scikit-learn-general] GSoC2015 topics

2015-02-12 Thread Artem
Andy wrote: > > On 02/12/2015 04:47 AM, Artem wrote: > > There are several packages (spearmint, hyperopt, MOE) offering Bayesian > Optimization to the problem of choosing hyperparameters. Wouldn't it be > nice to add such *Search[CV] to sklearn? > > Yes. I haven'

Re: [Scikit-learn-general] GSoC2015 topics

2015-02-12 Thread Artem
There are several packages (spearmint, hyperopt, MOE) offering Bayesian optimization for the problem of choosing hyperparameters. Wouldn't it be nice to add such a *Search[CV] to sklearn? On Thu, Feb 12, 2015 at 12:33 PM, Mathieu Blondel wrote: > A grid-search related project could be useful: > > -

Re: [Scikit-learn-general] GSoC2015 topics

2015-02-11 Thread Artem
There was an interview with Ilya Sutskever about deep learning ( http://yyue.blogspot.ru/2015/01/a-brief-overview-of-deep-learning.html), where he states that DL's success can be attributed to 3 main breakthroughs: 1. Computing resources. 2. Large datasets. 3. Tricks of the trade, discovered in re

Re: [Scikit-learn-general] regression with one independent variable

2015-02-11 Thread Artem
ctor, new shape and order. What does the line below do? > > > > Thank you, > > > > > > *From:* Artem [mailto:barmaley@gmail.com] > *Sent:* Wednesday, February 11, 2015 1:39 PM > *To:* scikit-learn-general@lists.sourceforge.net > *Subject:* Re: [Scikit-

Re: [Scikit-learn-general] regression with one independent variable

2015-02-11 Thread Artem
fit expects 2-dimensional input, whereas X[:, 0] is one-dimensional. You can either reshape it manually: regr.fit(x_train[:, 0].reshape((x_train.shape[0], 1)), x_train[:, 1]) or use a slice to select a contiguous range of columns: regr.fit(x_train[:, :1], x_train[:, 1]) What exception tells you
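Both options side by side as a runnable sketch (the two-column x_train here is made up):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    x_train = np.random.rand(10, 2)     # first column: feature, second column: target
    regr = LinearRegression()

    # x_train[:, 0] has shape (10,); fit needs X of shape (n_samples, n_features)
    regr.fit(x_train[:, 0].reshape((x_train.shape[0], 1)), x_train[:, 1])

    # equivalent: slicing with :1 keeps the column dimension, giving shape (10, 1)
    regr.fit(x_train[:, :1], x_train[:, 1])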