Re: [Scikit-learn-general] how does sklearn apply pipelines

2015-02-26 Thread Sebastian Raschka
Roberto, both fit_transform and fit/transform should work:

    for name, transform in self.steps[:-1]:
        if hasattr(transform, "fit_transform"):
            Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
        else:
            Xt = transform.fit(Xt, y, **fit_par
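The quoted loop can be sketched in runnable form. The classes below are toy stand-ins (not scikit-learn's actual implementation), kept minimal to show the fallback: fit_transform is preferred when a step defines it, otherwise fit followed by transform.

```python
# Toy sketch of the Pipeline fit loop quoted above: prefer fit_transform,
# fall back to fit().transform(). These classes are illustrative stand-ins,
# not scikit-learn's real ones.

class WithFitTransform:
    def fit_transform(self, X, y=None):
        return [x * 2 for x in X]

class WithoutFitTransform:
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return [x + 1 for x in X]

def fit_steps(steps, X, y=None):
    Xt = X
    for name, transform in steps:
        if hasattr(transform, "fit_transform"):
            Xt = transform.fit_transform(Xt, y)
        else:
            # fallback path: fit first, then transform separately
            Xt = transform.fit(Xt, y).transform(Xt)
    return Xt

print(fit_steps([("a", WithFitTransform()), ("b", WithoutFitTransform())], [1, 2]))
# -> [3, 5]
```

Either way, each step receives the data as transformed by all previous steps, which is why a step missing fit_transform causes no complaints.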

Re: [Scikit-learn-general] how does sklearn apply pipelines

2015-02-26 Thread Pagliari, Roberto
I guess either fit_transform or just transform, because fit_transform is not mandatory, isn't it? For instance, I did not implement fit_transform and did not get complaints. From: Joel Nothman [mailto:joel.noth...@gmail.com] Sent: Thursday, February 26, 2015 9:34 PM To: scikit-learn-general Subje

Re: [Scikit-learn-general] how does sklearn apply pipelines

2015-02-26 Thread Joel Nothman
And when some function f (such as predict) other than fit is called on the pipeline, it invokes transform on all the steps but the last, and on the last step calls f with the transformed data. On 27 February 2015 at 13:31, Sebastian Raschka wrote: > It's actually quite simple: It invokes fit_tra
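The delegation described here can be sketched with toy classes (hypothetical, not scikit-learn's code): every step but the last transforms the data, and only the final estimator receives the call to f, e.g. predict.

```python
# Sketch of how a pipeline delegates a method call such as "predict":
# transform through all steps but the last, then invoke the named method
# on the final estimator. Toy classes for illustration only.

class AddOne:
    def transform(self, X):
        return [x + 1 for x in X]

class ThresholdClassifier:
    def predict(self, X):
        return [1 if x > 2 else 0 for x in X]

def pipeline_call(steps, method, X):
    Xt = X
    for name, step in steps[:-1]:
        Xt = step.transform(Xt)          # transformers only transform here
    final_name, final_estimator = steps[-1]
    return getattr(final_estimator, method)(Xt)  # f runs on transformed data

steps = [("add", AddOne()), ("clf", ThresholdClassifier())]
print(pipeline_call(steps, "predict", [0, 3]))
# -> [0, 1]
```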

Re: [Scikit-learn-general] how does sklearn apply pipelines

2015-02-26 Thread Sebastian Raschka
It's actually quite simple: It invokes fit_transform on all elements in a pipeline but the last. On the last element in the pipeline (the estimator) only fit is invoked. Best, Sebastian > On Feb 26, 2015, at 9:01 PM, Pagliari, Roberto > wrote: > > Given a pipeline with a certain number of tr

[Scikit-learn-general] how does sklearn apply pipelines

2015-02-26 Thread Pagliari, Roberto
Given a pipeline with a certain number of transformers and a classifier, how does sklearn know which method should be invoked from one step to another? Does it list the available methods for each object? -- Dive into the

Re: [Scikit-learn-general] Score function in Extra-Trees

2015-02-26 Thread Andy
PR welcome! On 02/26/2015 01:19 PM, Pierre-Luc Bacon wrote: Thanks! Perhaps the documentation could be updated to make this clear? Pierre-Luc On Tue, Feb 24, 2015 at 5:24 AM, Arnaud Joly > wrote: Hi Pierre-Luc, This is the same criterion, but with a differ

Re: [Scikit-learn-general] Query: Random forest

2015-02-26 Thread Sebastian Raschka
i) I think in practice this scenario is highly unlikely (floating points), but I am pretty sure it would be the class with the lower integer index (due to argmax). ii) A general question: is one class over- or underrepresented? I assume you already did some grid searching and it's the best you co
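The argmax tie-breaking claim in (i) is easy to demonstrate. Below is a pure-Python stand-in for numpy.argmax (not numpy itself), showing that with a perfect 0.5/0.5 tie the first, i.e. lower, index wins.

```python
# Pure-Python stand-in for numpy.argmax: like numpy, Python's max()
# returns the first maximal element among ties, so a tied probability
# vector resolves to the class with the lower index.

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

proba = [0.5, 0.5]       # perfectly tied class probabilities
print(argmax(proba))     # -> 0, the class with the lower integer index
print(argmax([0.2, 0.8]))  # -> 1, the ordinary non-tied case
```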

Re: [Scikit-learn-general] Score function in Extra-Trees

2015-02-26 Thread Pierre-Luc Bacon
Thanks! Perhaps the documentation could be updated to make this clear? Pierre-Luc On Tue, Feb 24, 2015 at 5:24 AM, Arnaud Joly wrote: > Hi Pierre-Luc, > > This is the same criterion, but with a different name. > The maximisation of the reduction of variance at each split > will lead to minimi

Re: [Scikit-learn-general] Decision_function SVM returns one class score only

2015-02-26 Thread shalu jhanwar
Hey Guys! Thanks a lot for explaining the details to me. Could you please explain the following: i) So basically, the greater the distance, the deeper the sample lies on its side of the hyperplane, and the more confident the prediction? If the predicted label is 1, it means the decision function will provide the distance

Re: [Scikit-learn-general] Query: Random forest

2015-02-26 Thread shalu jhanwar
Thanks Sebastian for the reply. I have a training dataset with the negative and positive classes in a 1:1 ratio. Yes, I already performed a grid search for the best parameters (for SVM) and selected the best value of n_estimators based on accuracy (for Random Forest). I even tried with all the combina

Re: [Scikit-learn-general] Decision_function SVM returns one class score only

2015-02-26 Thread Artem
Hi Shalu decision_function returns (signed) distance to each of separating hyperplanes. There's one hyperplane for each pair of classes, so in case of 2 classes there'd be one hyperplane. Iris dataset contains 3 classes, so there are 3 possible pairs, and thus 3 columns in the result of decision_f
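The column counts Artem describes follow from simple combinatorics: one hyperplane per pair of classes means "n classes choose 2" columns. A quick check (illustrative only, using the standard library rather than scikit-learn):

```python
# Number of pairwise (one-vs-one) separating hyperplanes, and hence
# decision_function columns in the setting described above, is C(n, 2).
import math

def n_ovo_columns(n_classes):
    return math.comb(n_classes, 2)

print(n_ovo_columns(2))  # -> 1: a binary problem has a single hyperplane
print(n_ovo_columns(3))  # -> 3: the three pairs in the 3-class iris dataset
```

This is consistent with the one-column result reported for the 2-class problem: it is the expected shape, not a bug.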

Re: [Scikit-learn-general] Decision_function SVM returns one class score only

2015-02-26 Thread abhishek
In SVM, a sample is predicted using the sign of the decision function. For a binary problem, the decision function returns a single real number per sample. SVC uses Platt scaling to determine probability values ( http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.1639) On Thu, Feb 26, 2015 at 5:29 PM shalu jhanwar w
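The sign rule can be sketched as follows. The function name, the class labels, and the scores are made-up illustrative values, not scikit-learn API; the point is only that a positive score maps to one class and a non-positive score to the other.

```python
# Sketch of binary prediction from the sign of a decision function, as
# described above. All names and values here are hypothetical examples.

def predict_from_decision(scores, classes=(0, 1)):
    # positive score -> second class, non-positive score -> first class
    return [classes[1] if s > 0 else classes[0] for s in scores]

print(predict_from_decision([-1.7, 0.4, 2.3]))
# -> [0, 1, 1]
```

The magnitude of each score is the (signed) distance to the hyperplane, which is what the earlier replies interpret as a confidence measure.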

[Scikit-learn-general] Decision_function SVM returns one class score only

2015-02-26 Thread shalu jhanwar
Hi guys, I'm facing a problem when using decision_function in the SVM classifier. I have *2 classes*, but decision_function is returning an array with *one column* only. decision_function works fine with the iris dataset. I'm afraid I am doing something wrong while reading my files/data. Any

[Scikit-learn-general] Query: Random forest

2015-02-26 Thread shalu jhanwar
Hey guys, would you comment on the following, based on your experience? i) If both classes have the same probability (0.5), which class would Random Forest predict? ii) In my classification, I have seen more false predictions corresponding to the positive class by my model. Can you

[Scikit-learn-general] Re: Pull Request: Renyi entropy and Cauchy-Schwarz mutual information

2015-02-26 Thread cécilia
I made a pull request last week with the code. Below is my reply to the last comment in the current conversation on the pull request, which should answer some of your questions. So, we can continue the discussion directly on the GitHub scikit-learn site, on the pull request page. cdamon:MI_R

Re: [Scikit-learn-general] Self Organizing Maps

2015-02-26 Thread Gael Varoquaux
There is also an added difficulty, which is that for SOM to be interesting, they rely on specifying a topology. Most implementations use a somewhat restrictive topology that is useful only for a small number of applications,  for instance a 2D embedding topology. A general implementation is more