Re: [Scikit-learn-general] API for multi-sample "documents"

2012-11-10 Thread Gael Varoquaux
Hi Vlad, This is a problem that I run into often. In my setting, the 'document' would be a subject, and I might have multiple observations (time points) per subject. In practice, I have found that there are 2 efficient ways of solving it, and that both approaches have pros and cons: 1) Concatenate

Re: [Scikit-learn-general] API for multi-sample "documents"

2012-11-02 Thread Vlad Niculae
Here's a quick mockup that I used for my syllables. This e-mail contains a write-up of my observations, followed by the reply to Andy's questions. https://gist.github.com/4005112 I marked the groups using an indicator array. This way if you want to shuffle the dataset, you can just apply the same
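
For readers skimming the archive, here is a minimal sketch of the indicator-array idea described in this message. It is not the code from the linked gist. The toy data, the name `groups`, and the helper `group_means` are all assumptions made up for illustration, and the sketch assumes that "apply the same" refers to applying one shared permutation to both arrays.

```python
import numpy as np

rng = np.random.RandomState(0)

# Six samples (rows of X) belonging to three hypothetical documents.
X = np.arange(12, dtype=float).reshape(6, 2)
groups = np.array([0, 0, 1, 1, 1, 2])  # groups[i] = document that row i belongs to

# Shuffle the rows and the indicator array with one shared permutation,
# so each row keeps its document label.
perm = rng.permutation(len(X))
X_shuf, groups_shuf = X[perm], groups[perm]


def group_means(X, groups):
    """Average the rows of X within each document (hypothetical helper)."""
    labels = np.unique(groups)
    means = np.vstack([X[groups == g].mean(axis=0) for g in labels])
    return labels, means


# Per-document aggregation gives the same result before and after shuffling.
labels, means = group_means(X_shuf, groups_shuf)
```

Because the labels travel with the rows, any per-document score computed from `X_shuf` and `groups_shuf` matches the one computed from the unshuffled arrays.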

Re: [Scikit-learn-general] API for multi-sample "documents"

2012-10-31 Thread Andreas Mueller
Hi Vlad. This is definitely a good question. I have that often when representing an image as bags of keypoints / features. Why is it not a good solution to have X as being a list of arrays / lists? Which algorithms do you want to use such samples in? The text feature extraction sort of deals with
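
To make the two representations in this question concrete, here is a small sketch that converts a list of per-document arrays into a single stacked matrix plus a group indicator, and back again. The names `bags`, `groups`, and `sizes` and the toy values are assumptions for illustration only. Nothing here is an existing scikit-learn API.

```python
import numpy as np

# Hypothetical bags: one array of shape (n_samples_i, n_features) per document.
bags = [np.ones((2, 3)), np.zeros((4, 3)), np.full((1, 3), 5.0)]

# Flat form: stack all rows and record which document each row came from.
X = np.vstack(bags)
sizes = [len(b) for b in bags]
groups = np.repeat(np.arange(len(bags)), sizes)

# Back to a list of arrays by splitting at the document boundaries.
# This relies on the rows of each document being contiguous in X.
boundaries = np.cumsum(sizes)[:-1]
bags_again = np.split(X, boundaries)
```

The list form keeps documents explicit, while the flat form is a plain 2-D array that row-wise estimators can consume. Which one suits a given algorithm is the open question in the thread.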

[Scikit-learn-general] API for multi-sample "documents"

2012-10-31 Thread Vlad Niculae
Hello, It seems I have again run into a need that first became apparent when working with image patches last summer. Sometimes we don't have a 1 to 1 correspondence between samples (rows in X) and actual documents we are interested in scoring over. Instead, each document consists of (a di