Re: [Scikit-learn-general] RidgeClassifier

2013-08-14 Thread Shishir Pandey
On 15-08-2013 04:47, Mathieu Blondel wrote: > On Thu, Aug 15, 2013 at 7:42 AM, Shishir Pandey > wrote: > > I might have conveyed the wrong thing here. I am using version 0.14.1 of > sklearn. I have a multiple output problem. I am using the yeast dataset. >

Re: [Scikit-learn-general] Does LinearSVC support probability/soft outputs out of the box?

2013-08-14 Thread A
Josh Wasserstein writes: > It looks like it doesn't, but I just wanted to make sure. > Josh Unless I am mistaken this might answer the question: http://comments.gmane.org/gmane.comp.python.scikit-learn/4985
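A minimal sketch of the point under discussion, assuming a reasonably recent scikit-learn install. The toy dataset and the variable names are illustrative and do not come from the thread. It checks that a fitted LinearSVC offers decision_function (signed distances to the separating hyperplane) but no predict_proba method:

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Toy binary problem (made up for illustration).
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
clf = LinearSVC(random_state=0).fit(X, y)

# LinearSVC does not define predict_proba.
has_proba = hasattr(clf, "predict_proba")

# Uncalibrated "soft" outputs: one signed margin per sample.
margins = clf.decision_function(X)
```

If calibrated probabilities are needed, wrapping the estimator in sklearn.calibration.CalibratedClassifierCV (available in later releases than the 0.14 series discussed here) is one option.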

Re: [Scikit-learn-general] RidgeClassifier

2013-08-14 Thread Mathieu Blondel
On Thu, Aug 15, 2013 at 7:42 AM, Shishir Pandey wrote: > I might have conveyed the wrong thing here. I am using version 0.14.1 of > sklearn. I have a multiple output problem. I am using the yeast dataset. > The input x is a protein (103 dim vector) and the output are the > different functions it p

[Scikit-learn-general] Does LinearSVC support probability/soft outputs out of the box?

2013-08-14 Thread Josh Wasserstein
It looks like it doesn't, but I just wanted to make sure. Josh

Re: [Scikit-learn-general] RidgeClassifier

2013-08-14 Thread Shishir Pandey
> Message: 4 > Date: Thu, 15 Aug 2013 01:22:31 +0900 > From: Mathieu Blondel > Subject: Re: [Scikit-learn-general] RidgeClassifier > To: scikit-learn-general > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > On Thu, Aug 15, 2013 at 12:54 AM, Shishir Pandey wrote: > >> >

Re: [Scikit-learn-general] Selective multiclass

2013-08-14 Thread A
> That's not even a very big matrix, it's less than 100MB. Does the error occur even with n_jobs=2? Yes.

Re: [Scikit-learn-general] Selective multiclass

2013-08-14 Thread Lars Buitinck
2013/8/14 A <4rk@gmail.com>: > <14271x547546 sparse matrix of type '' > with 8223163 stored elements in Compressed Sparse Row format> That's not even a very big matrix, it's less than 100MB. Does the error occur even with n_jobs=2? -- Lars Buitinck Scientific programmer, ILPS Univers
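The "less than 100MB" figure can be checked with a little arithmetic. This is only a sketch: the dtype is missing from the repr above, so 8-byte float64 values and 4-byte int32 indices are assumptions, not facts from the thread.

```python
# Shape and number of stored elements, taken from the repr in the message.
n_rows, n_cols, nnz = 14271, 547546, 8223163

value_bytes = 8  # ASSUMED float64 values (dtype is elided in the repr)
index_bytes = 4  # ASSUMED int32 column indices and row pointers

# A CSR matrix stores three arrays: data, indices, indptr.
data_bytes = nnz * value_bytes
indices_bytes = nnz * index_bytes
indptr_bytes = (n_rows + 1) * index_bytes

total_bytes = data_bytes + indices_bytes + indptr_bytes
total_mb = total_bytes / 1e6
```

Under these assumptions the total comes to just under 99 MB, consistent with the estimate in the message. Note that the number of columns does not enter the CSR storage cost at all.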

Re: [Scikit-learn-general] Selective multiclass

2013-08-14 Thread A
> This is strange indeed - since you said you're doing text classification I suppose X is sparse? which format (csr, csc) and dtype (float64,32) are you using? <14271x547546 sparse matrix of type '' with 8223163 stored elements in Compressed Sparse Row format> --

Re: [Scikit-learn-general] RidgeClassifier

2013-08-14 Thread Mathieu Blondel
On Thu, Aug 15, 2013 at 12:54 AM, Shishir Pandey wrote: > How does the RidgeClassifier work? How does it decide how many classes > are there. In my problem there are only two classes {-1, 1} but the > Predict() gives 12, 15 and all sorts of classes. How does the > RidgeClassifier decide the thresh

[Scikit-learn-general] RidgeClassifier

2013-08-14 Thread Shishir Pandey
Hi How does the RidgeClassifier work? How does it decide how many classes there are? In my problem there are only two classes {-1, 1}, but predict() gives 12, 15 and all sorts of classes. How does the RidgeClassifier decide the thresholds for each class? Thanks. -- sp ---
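A minimal sketch of the expected behaviour for a plain binary target, assuming a recent scikit-learn and made-up random data (none of this is the poster's yeast dataset). With a 1-D y containing only -1 and 1, the classes are taken from the unique values of y, a ridge regression is fit on the -1/+1 targets, and predict() thresholds decision_function() at zero:

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

# Made-up data: 40 samples, 3 features, labels in {-1, 1}.
rng = np.random.RandomState(0)
X = rng.randn(40, 3)
y = np.where(X[:, 0] > 0, 1, -1)

clf = RidgeClassifier().fit(X, y)

# Classes are inferred from the unique values seen in y.
classes = clf.classes_.tolist()

# Predictions are always drawn from classes_.
preds = set(clf.predict(X).tolist())

# In the binary case the decision threshold is simply zero.
matches_sign = np.array_equal(
    clf.predict(X), np.where(clf.decision_function(X) > 0, 1, -1)
)
```

Predictions such as 12 or 15 would therefore not be expected from a 1-D target of this form, which points towards the shape or content of y as the place to look.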

Re: [Scikit-learn-general] RidgeClassifier

2013-08-14 Thread Andreas Mueller
This sounds like a bug. Which version are you on? And what is (the shape of) your input data and more importantly your input classes? And what is their dtype? On 08/14/2013 05:54 PM, Shishir Pandey wrote: > Hi > > How does the RidgeClassifier work? How does it decide how many classes > are there

[Scikit-learn-general] Unable to test a dummy classifier with a score function that requires a probability estimate

2013-08-14 Thread Josh Wasserstein
Say I define the following scoring function:

    def multi_label_macro_auc(y_gt, y_pred):
        n_labels = y_pred.shape[1]
        auc_scores = [None] * n_labels
        for label in xrange(n_labels):
            auc_scores[label] = roc_auc_score((y_gt == label)*1, y_pred[:,label])
        return np.mean(auc_scores)

ml
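For reference, a self-contained Python 3 sketch of the scoring function quoted above, with its imports and a small usage example. The toy labels and scores are invented for illustration, and y_pred is assumed to be an (n_samples, n_labels) array of per-class scores such as the output of predict_proba:

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def multi_label_macro_auc(y_gt, y_pred):
    """Mean one-vs-rest ROC AUC over the columns of y_pred."""
    y_gt = np.asarray(y_gt)
    y_pred = np.asarray(y_pred)
    auc_scores = []
    for label in range(y_pred.shape[1]):
        # Binary ground truth: 1 where the true class equals `label`.
        is_label = (y_gt == label).astype(int)
        auc_scores.append(roc_auc_score(is_label, y_pred[:, label]))
    return float(np.mean(auc_scores))


# Toy data (made up): 3 classes, the true class always scores highest.
y_true = np.array([0, 1, 2, 0, 1, 2])
scores = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.6, 0.3, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.2, 0.6],
])
macro_auc = multi_label_macro_auc(y_true, scores)
```

Because the function indexes y_pred by column, it needs continuous per-class scores rather than the hard labels returned by predict().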

Re: [Scikit-learn-general] bag of features

2013-08-14 Thread Andreas Mueller
On 08/14/2013 02:00 PM, abhishek wrote: > Hi, > > suppose we have a list of numpy arrays. These numpy arrays are two > dimensional and have equal number of columns but unequal number of > rows. I don't think that scikit classifiers will work on these kind of > features or maybe I'm missing someth

Re: [Scikit-learn-general] bag of features

2013-08-14 Thread Joel Nothman
I think sklearn.feature_extraction.DictVectorizer is designed to handle this sort of case, but produces a single array representing multiple categorical variables, rather than a set of separate arrays. - Joel On Wed, Aug 14, 2013 at 10:00 PM, abhishek wrote: > Hi, > > suppose we have a list of

Re: [Scikit-learn-general] bag of features

2013-08-14 Thread abhishek
Hi, suppose we have a list of numpy arrays. These numpy arrays are two-dimensional and have an equal number of columns but an unequal number of rows. I don't think that scikit classifiers will work on this kind of feature, or maybe I'm missing something? On Wed, Aug 14, 2013 at 12:18 PM, Andreas Muell

Re: [Scikit-learn-general] bag of features

2013-08-14 Thread Andreas Mueller
On 08/14/2013 11:56 AM, abhishek wrote: > hi, > > Is there a classifier in scikit-learn that supports bag of feature vectors? All classifiers work on numpy arrays, and many work on scipy sparse matrices. What is special about bag of feature representations that makes you ask this question? Cheers

[Scikit-learn-general] bag of features

2013-08-14 Thread abhishek
hi, Is there a classifier in scikit-learn that supports bag of feature vectors? -- Regards Abhishek Thakur

Re: [Scikit-learn-general] Selective multiclass

2013-08-14 Thread Lars Buitinck
2013/8/14 Peter Prettenhofer > The coef matrix is allocated before the subprocesses are forked so you will need (n_jobs + 1) * 12 GB just for the coefs. Worse, it turns out these huge matrices may get pickled and sent to the child process over a pipe. > The SystemError is quite strange though..
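The (n_jobs + 1) * 12 GB rule of thumb quoted above can be tabulated quickly. This is a rough sketch only: the helper name is hypothetical, the 12 GB figure is taken from the message, and real memory use also depends on the pickling overhead mentioned in the reply.

```python
def coef_memory_gb(coef_gb, n_jobs):
    """Rough estimate: one coef copy in the parent plus one per worker."""
    return (n_jobs + 1) * coef_gb


# 12 GB per coef matrix, as quoted in the message above.
estimates = {n_jobs: coef_memory_gb(12, n_jobs) for n_jobs in (1, 2, 4)}
```

Even n_jobs=2 already calls for about 36 GB by this estimate, so the coefficient matrix, not the sparse input, dominates the footprint.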