Re: [Scikit-learn-general] for multilabel classification is it necessary to train all combinations of the labels in the training set? Is there any way to do without training for all combinations?

2012-05-15 Thread Andreas Mueller
Hi Bilal. As far as I can see, the OneVsRestClassifier decides whether to do multi-class or multi-label by looking at the training set. This is exactly what you observe: If you only have one label per datapoint in the training set, you will only get one label back. Looking at the OneVsRestClassifier

Re: [Scikit-learn-general] multilayer perceptron questions

2012-05-15 Thread Mathieu Blondel
On Wed, May 16, 2012 at 5:31 AM, Andreas Mueller wrote: > > The SequentialDataset was made for vector x vector operations. Depending > on whether we > do mini-batch or online learning, the MLP needs vector x matrix or > matrix x matrix operations. > In particular matrix x matrix is probably not fe

Re: [Scikit-learn-general] multilayer perceptron questions

2012-05-15 Thread Andreas Mueller
On 05/15/2012 10:06 PM, David Warde-Farley wrote: > On 2012-05-15, at 3:23 PM, Andreas Mueller wrote: > >> I am not sure if we want to support sparse data. I have no experience with >> using MLPs on sparse data. >> Could this be done efficiently? The weight vector would need to be >> represented

Re: [Scikit-learn-general] multilayer perceptron questions

2012-05-15 Thread David Warde-Farley
On 2012-05-15, at 3:23 PM, Andreas Mueller wrote: > I am not sure if we want to support sparse data. I have no experience with > using MLPs on sparse data. > Could this be done efficiently? The weight vector would need to be > represented explicitly and densely, I guess. > > Any ideas? People

Re: [Scikit-learn-general] multilayer perceptron questions

2012-05-15 Thread Andreas Mueller
On 05/15/2012 05:16 PM, Mathieu Blondel wrote: On Tue, May 15, 2012 at 11:59 PM, David Warde-Farley <warde...@iro.umontreal.ca> wrote: I haven't had a look at these classes myself but I think working with raw NumPy arrays is a better idea in terms of efficiency. Since i

Re: [Scikit-learn-general] multilayer perceptron questions

2012-05-15 Thread David Warde-Farley
On Wed, May 16, 2012 at 12:16:21AM +0900, Mathieu Blondel wrote: > On Tue, May 15, 2012 at 11:59 PM, David Warde-Farley < > warde...@iro.umontreal.ca> wrote: > > > > > I haven't had a look at these classes myself but I think working with raw > > NumPy arrays is a better idea in terms of efficiency

Re: [Scikit-learn-general] multilayer perceptron questions

2012-05-15 Thread Mathieu Blondel
On Tue, May 15, 2012 at 11:59 PM, David Warde-Farley < warde...@iro.umontreal.ca> wrote: > > I haven't had a look at these classes myself but I think working with raw > NumPy arrays is a better idea in terms of efficiency. > Since it abstracts away the data representation, SequentialDataset is us

Re: [Scikit-learn-general] multilayer perceptron questions

2012-05-15 Thread David Warde-Farley
On Tue, May 15, 2012 at 12:12:34AM +0200, David Marek wrote: > Hi, > > I have worked on multilayer perceptron and I've got a basic > implementation working. You can see it at > https://github.com/davidmarek/scikit-learn/tree/gsoc_mlp The most > important part is the sgd implementation, which can b

Re: [Scikit-learn-general] linear discriminant analysis on text data

2012-05-15 Thread Lars Buitinck
2012/5/15 Andreas Mueller : > You can find a talk by Olivier on text classification on our tutorial page: > http://scikit-learn.org/dev/presentations.html > > Maybe that is a good place to start. That method won't work directly, though; our TfidfVectorizer (the bag-of-words extractor) will produce

Re: [Scikit-learn-general] Implementation Question: safe_asarray()

2012-05-15 Thread Lars Buitinck
2012/5/14 Daniel Duckworth : > For those familiar with this function in sklearn/utils/validation.py, I was > wondering why sparse matrices are passed through silently without respecting > the `dtype` or `order` arguments.  I can understand why one would want to > ignore `order` due to how sparse ma

Re: [Scikit-learn-general] multilayer perceptron questions

2012-05-15 Thread Andreas Mueller
Hi David. I'll have a look at your code later today. Let me first answer your questions about my code. On 05/15/2012 12:12 AM, David Marek wrote: > Hi, > > 2) I used Andreas' implementation as an inspiration and I am not sure > I understand some parts of it: > * Shouldn't the bias vector be initiali

Re: [Scikit-learn-general] for multilabel classification is it necessary to train all combinations of the labels in the training set? Is there any way to do without training for all combinations?

2012-05-15 Thread Andreas Mueller
Hi Bilal. For multi-label classification, the easiest approach is to train one classifier per label. There is an easy way to do this with sklearn using the OneVsRestClassifier, as described in the user guide: scikit-learn.org/dev/modules/multiclass.html
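
To illustrate why one classifier per label does not need every label combination in the training set, here is a minimal sketch in plain Python. It is not how OneVsRestClassifier is implemented: the nearest-centroid rule, the toy data, and the helper names (centroid, sq_dist, fit_one_per_label, predict_labels) are all assumptions made up for this example. Each label gets its own binary model, so a combination that never appeared during training can still be predicted.

```python
def centroid(points):
    """Mean of a non-empty list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fit_one_per_label(X, label_sets):
    """Fit one independent binary model per label (hypothetical helper).

    Each model is just a pair of centroids: one for the samples that carry
    the label and one for the samples that do not.
    """
    labels = sorted(set().union(*label_sets))
    models = {}
    for label in labels:
        pos = [x for x, s in zip(X, label_sets) if label in s]
        neg = [x for x, s in zip(X, label_sets) if label not in s]
        models[label] = (centroid(pos), centroid(neg))
    return models


def predict_labels(models, x):
    """Return every label whose positive centroid is closer than its negative one."""
    return {
        label
        for label, (pos_c, neg_c) in models.items()
        if sq_dist(x, pos_c) < sq_dist(x, neg_c)
    }


# Toy training set: the combination {"a", "b"} never occurs.
X = [(0.0, 0.0), (0.1, 0.1), (1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
label_sets = [set(), set(), {"a"}, {"a"}, {"b"}, {"b"}]

models = fit_one_per_label(X, label_sets)
both = predict_labels(models, (1.0, 1.0))   # point that looks like "a" and "b"
none = predict_labels(models, (0.0, 0.0))   # point that looks like neither
print(both, none)
```

In this sketch the point (1.0, 1.0) is assigned both labels even though no training sample carried both, because the two per-label decisions are made independently. The price of that independence is that correlations between labels are ignored.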

Re: [Scikit-learn-general] linear discriminant analysis on text data

2012-05-15 Thread Andreas Mueller
Hi Jaganadh. I'm no expert in text classification but I know someone who is ;) You can find a talk by Olivier on text classification on our tutorial page: http://scikit-learn.org/dev/presentations.html Maybe that is a good place to start. Cheers, Andy