Hi Vlad,
This is a problem that I run into often. In my setting, the 'document' would
be a subject, and I might have multiple observations (time points) per
subject.
In practice, I have found that there are two efficient ways of solving it,
and that both approaches have pros and cons:
1) Concatenate
Here's a quick mockup that I used for my syllables. This e-mail contains
a write-up of my observations, followed by the reply to Andy's questions.
https://gist.github.com/4005112
I marked the groups using an indicator array. This way, if you want to shuffle
the dataset, you can just apply the same
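
A minimal sketch of the indicator-array idea described above, not the code from the gist. It assumes NumPy, and it assumes that shuffling means applying one permutation to both the concatenated samples and the indicator array. The names `docs`, `groups` and `perm` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical "documents", each with a different number of rows
# (observations) and the same number of features.
docs = [rng.normal(size=(n, 4)) for n in (3, 5, 2)]

# Concatenate all observations into a single 2-D array.
X = np.vstack(docs)

# Indicator array: groups[i] is the index of the document that row i came from.
groups = np.repeat(np.arange(len(docs)), [len(d) for d in docs])

# Shuffle the rows and the indicator with the same permutation so that
# every row keeps its document label.
perm = rng.permutation(len(X))
X_shuffled, groups_shuffled = X[perm], groups[perm]

# After shuffling, the rows of document 1 can still be selected with a mask.
doc1_rows = X_shuffled[groups_shuffled == 1]
```

The mask `groups_shuffled == 1` selects the same set of rows as `docs[1]`, only in a different order.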
Hi Vlad.
This is definitely a good question. I run into it often when representing
an image as a bag of keypoints / features.
Why is it not a good solution to have X be a list of arrays / lists?
Which algorithms do you want to use such samples with?
The text feature extraction sort of deals with
Hello,
It seems I have once again run into the need for something that became
apparent when working with image patches last summer. Sometimes we
don't have a one-to-one correspondence between samples (rows in X) and
the actual documents we are interested in scoring over. Instead, each
document consists of (a di