Hi Youssef,
Regarding memory usage, you should know that it'll basically blow up if you
increase the number of jobs. With the current implementation, you'll need
roughly 2 * n_jobs * |X| bytes of memory (where |X| is the size of X, in bytes).
That issue stems from the use of joblib which basically forc
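A back-of-the-envelope illustration of that 2 * n_jobs * |X| estimate (the numbers
here are made up for the example, not taken from this thread):

    # a float64 design matrix of 1,000,000 samples x 50 features, 8 bytes per value
    x_bytes = 1000000 * 50 * 8          # |X| = 400 MB
    n_jobs = 4
    peak_bytes = 2 * n_jobs * x_bytes   # ~3.2 GB at peak with 4 workers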
Hi Youssef,
You're trying to do exactly what I did. The first thing to note is that the
Microsoft guys don't precompute the features; rather, they compute them on
the fly. That means that they only need enough memory to store the depth
images, and since they have a 1000 core cluster, computing the feat
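A minimal sketch of what computing such a feature on the fly could look like,
along the lines of the depth-comparison features f(I, x) = d(x + u/d(x)) - d(x + v/d(x))
from the Body Part Recognition paper cited elsewhere in this thread (my
illustration, not the poster's code):

    import numpy as np

    def depth_feature(depth, pixel, u, v, background=1e6):
        """Depth-comparison feature d(x + u/d(x)) - d(x + v/d(x)) for one pixel."""
        h, w = depth.shape
        y, x = pixel
        d = depth[y, x]

        def probe(offset):
            # offsets are normalised by the depth at the reference pixel
            oy = int(round(y + offset[0] / d))
            ox = int(round(x + offset[1] / d))
            if 0 <= oy < h and 0 <= ox < w:
                return depth[oy, ox]
            return background  # probes falling outside the image count as background

        return probe(u) - probe(v)

    # e.g. on a fake 240x320 depth map:
    # depth_feature(np.random.rand(240, 320) + 1.0, (120, 160), (60.0, 0.0), (0.0, 60.0))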
Exactly, I was talking about predict and about the state of the
estimator. It seemed much more difficult before I thought it through :)
On Thu, Apr 25, 2013 at 10:54 AM, Mathieu Blondel wrote:
>
> On Thu, Apr 25, 2013 at 10:26 AM, Vlad Niculae wrote:
>>
>> If we are talking about the same t
On Thu, Apr 25, 2013 at 10:26 AM, Vlad Niculae wrote:
> If we are talking about the same thing, you are returning clusters of
> samples and features together (ie rows and columns). So if in K-means
> we return a 1D array with cluster labels, here the output would be two
> arrays, one of (n_sample
If we are talking about the same thing, you are returning clusters of
samples and features together (ie rows and columns). So if in K-means
we return a 1D array with cluster labels, here the output would be two
arrays, one of (n_samples,) and one of (n_features,). Another
alternative would be a li
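A sketch of that return convention, with purely illustrative names (none of this
is an existing scikit-learn API):

    import numpy as np

    n_samples, n_features = 100, 20
    kmeans_style_labels = np.zeros(n_samples, dtype=int)  # K-means: shape (n_samples,)
    row_labels = np.zeros(n_samples, dtype=int)           # biclustering: shape (n_samples,)
    column_labels = np.zeros(n_features, dtype=int)       # ...plus one of shape (n_features,)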
Hello,
I am trying to reproduce the results of this paper:
http://research.microsoft.com/pubs/145347/BodyPartRecognition.pdf with
different kinds of data (monkey depth maps instead of humans). So I am
generating my depth features and training and classifying data with a
random forest with quite s
Could you elaborate why it would require a new API?
Mathieu
On Apr 25, 2013 9:08 AM, "Vlad Niculae" wrote:
> The Baader-Meinhof phenomenon in action -- only 2 days ago I saw a
> talk about information-theoretic biclustering (aka co-clustering)
> applied to opinion mining of video game reviews a
The Baader-Meinhof phenomenon in action -- only 2 days ago I saw a
talk about information-theoretic biclustering (aka co-clustering)
applied to opinion mining of video game reviews and the method caught
my attention. An efficient implementation would be very nice, but it
will definitely require a
Hi Kemal,
On Thu, Apr 25, 2013 at 6:56 AM, Kemal Eren wrote:
>
> If you are looking for biclustering algorithms I could certainly do that.
> I did my Master's thesis on it and wrote this software:
> http://bmi.osu.edu/hpc/software/bibench/. Its biclustering algorithms are
> wrappers to existing
Hi Mathieu and team,
If you are looking for biclustering algorithms I could certainly do that. I
did my Master's thesis on it and wrote this software:
http://bmi.osu.edu/hpc/software/bibench/. Its biclustering algorithms are
wrappers to existing tools. It would be really nice to have Python/Cython
On Sun, Apr 21, 2013 at 09:36:57PM -0400, Skipper Seabold wrote:
> Does anyone have any code for computing rotations of components after
> PCA or FactorAnalysis, etc. E.g., varimax?
No (apart from ICA that is in scikit-learn), but I would be interested in
a varimax code to play with :).
G
--
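A minimal varimax sketch to play with (the standard Kaiser-style SVD iteration;
my sketch, not anything that exists in scikit-learn; `loadings` is assumed to be
an (n_features, n_components) array such as pca.components_.T):

    import numpy as np

    def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
        """Return (rotated loadings, rotation matrix) for the varimax criterion."""
        p, k = loadings.shape
        rotation = np.eye(k)
        total = 0.0
        for _ in range(max_iter):
            rotated = loadings.dot(rotation)
            # gradient of the varimax objective w.r.t. the rotation
            grad = rotated ** 3 - (gamma / p) * rotated.dot(np.diag((rotated ** 2).sum(axis=0)))
            u, s, vt = np.linalg.svd(loadings.T.dot(grad))
            rotation = u.dot(vt)
            if s.sum() < total * (1 + tol):
                break
            total = s.sum()
        return loadings.dot(rotation), rotation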
Something I would like to see in the scikit, if someone is looking for an
idea, is biclustering:
http://en.wikipedia.org/wiki/Biclustering
Mathieu
hi,
I'd use LeaveOneLabelOut where the label contains the site indices.
Basically the question is "do you generalize well to data acquired
some place else"
Alex
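A small self-contained sketch of that suggestion (toy data; LeaveOneLabelOut is
the name used in scikit-learn at the time, current releases call the same idea
LeaveOneGroupOut with a `groups` argument):

    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.cross_validation import LeaveOneLabelOut, cross_val_score

    # toy stand-in: 60 subjects from 3 sites, 5 features each
    rng = np.random.RandomState(0)
    X = rng.randn(60, 5)
    y = rng.randint(0, 2, 60)
    site = np.repeat([0, 1, 2], 20)   # the "label": which site each subject came from

    cv = LeaveOneLabelOut(site)       # each fold holds out one entire site
    scores = cross_val_score(LinearSVC(), X, y, cv=cv)  # one score per held-out site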
On Wed, Apr 24, 2013 at 5:08 PM, Lars Buitinck wrote:
> 2013/4/23 John Richey :
>> clf.fit(X_train, X_test)
>
> You should fit on X_t
Hi Alex,
If I understand correctly, you are using 2 different kinds of features:
categorical + ngrams.
In a similar situation, but in a classification setting, a trick that worked
reasonably well was to train two different models, one feeding the other.
I.e. build a first model out of ngrams/nlp f
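The message is cut off here, so the details are guesswork, but here is a rough
sketch of one plausible reading of the "one model feeding the other" trick (all
data, shapes and names below are made up):

    import numpy as np
    import scipy.sparse as sp
    from sklearn.linear_model import SGDClassifier

    rng = np.random.RandomState(0)
    X_ngrams = sp.random(200, 1000, density=0.01, format="csr", random_state=0)
    X_categorical = sp.random(200, 50, density=0.1, format="csr", random_state=1)
    y = rng.randint(0, 2, 200)

    # first model: trained on the ngram/NLP features only
    first = SGDClassifier(random_state=0).fit(X_ngrams, y)
    # its decision values become one extra column for the second model,
    # which sees the categorical features plus the first model's output
    extra = sp.csr_matrix(first.decision_function(X_ngrams).reshape(-1, 1))
    second = SGDClassifier(random_state=1).fit(sp.hstack([X_categorical, extra]).tocsr(), y)

(In practice the first model's output would be produced with cross-validation to
avoid leaking the training labels into the second model.)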
2013/4/23 John Richey :
> clf.fit(X_train, X_test)
You should fit on X_train and y_train, not X_test.
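I.e., assuming a y_train array holding the training targets:

    clf.fit(X_train, y_train)      # fit on training features and training targets
    y_pred = clf.predict(X_test)   # then predict on the held-out test features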
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
Hello,
I am having difficulty with a cross validation problem, and any help would be
much appreciated.
I have a large number of research subjects from 15 different data collection
sites. I want to assess whether "site" has any influence on the data.
It occurred to me that one way to do this
2013/4/24 Alex Kopp :
> Thanks, guys.
>
> Perhaps I should explain what I am trying to do and then open it up for
> suggestions.
>
> I have 203k training examples each with 457k features. The features are
> composed of one-hot encoded categorical values as well as stemmed, TFIDF
> weighted unigrams
Have you tried tuning the hyper-parameters of the SGDRegressor? You really
need to tune the learning rate for SGDRegressor (SGDClassifier has a pretty
decent default). E.g. set up a grid search w/ a constant learning rate and
try different values of eta0 ([0.1, 0.01, 0.001, 0.0001]). You can also s
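A rough sketch of that kind of search (toy data; GridSearchCV lived in
sklearn.grid_search at the time and is in sklearn.model_selection in current
releases):

    import numpy as np
    from sklearn.linear_model import SGDRegressor
    from sklearn.grid_search import GridSearchCV

    rng = np.random.RandomState(0)
    X = rng.randn(500, 20)
    y = rng.randn(500)

    param_grid = {
        "learning_rate": ["constant"],
        "eta0": [0.1, 0.01, 0.001, 0.0001],
    }
    search = GridSearchCV(SGDRegressor(), param_grid, cv=3)
    search.fit(X, y)
    print(search.best_params_)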
Thanks, guys.
Perhaps I should explain what I am trying to do and then open it up for
suggestions.
I have 203k training examples each with 457k features. The features are
composed of one-hot encoded categorical values as well as stemmed, TFIDF
weighted unigrams and bigrams (NLP). As you can proba
Thank you,
Do you have some references prepared? It would be useful.
I am not sure if what I have in mind is correct, but I think association
rule learning is interesting and the kind of method that I would like to
see in scikit-learn, as well as finding frequent itemsets. I hope I'm
thinking of the
2013/4/24 Olivier Grisel
> 2013/4/24 Peter Prettenhofer :
> > I totally agree with Brian - although I'd suggest you drop option 3)
> because
> > it will be a lot of work.
> >
> > I'd suggest you instead do a) feature extraction or b) feature
> > selection.
> >
> > Personally, I think decisi
2013/4/24 Peter Prettenhofer :
> I totally agree with Brian - although I'd suggest you drop option 3) because
> it will be a lot of work.
>
> I'd suggest you instead do a) feature extraction or b) feature
> selection.
>
> Personally, I think decision trees in general and random forest in
> pa
I totally agree with Brian - although I'd suggest you drop option 3)
because it will be a lot of work.
I'd suggest you instead do a) feature extraction or b) feature
selection.
Personally, I think decision trees in general and random forest in
particular are not a good fit for sparse datase
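A hedged sketch of option b), feature selection, to get a sparse matrix down to
something a forest can digest (toy shapes; chi2 shown for a classification
target, f_regression would be the regression counterpart):

    import numpy as np
    import scipy.sparse as sp
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectKBest, chi2

    rng = np.random.RandomState(0)
    X = sp.random(1000, 2000, density=0.02, format="csr", random_state=0)  # sparse, non-negative
    y = rng.randint(0, 2, 1000)

    selector = SelectKBest(chi2, k=100)      # keep the 100 highest-scoring columns
    X_small = selector.fit_transform(X, y)   # still sparse, but only 100 columns wide
    forest = RandomForestClassifier(n_estimators=10, random_state=0)
    forest.fit(X_small.toarray(), y)         # densify only the reduced matrix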
On Wed, Apr 24, 2013 at 12:46 PM, Vlad Niculae wrote:
> However, I think it would be nice to have some proposals that focus on
> internals: consistency, clean up, refactoring of modules that need it
> or documentation improvements. As long as the task is measurable,
> closed-ended and well-defin
Hi Vlad,
It sounds good to me to focus on the proposal now and look into mentors
later.
I am considering collaborative filtering with *user similarity* and *item
similarity*, and also *association rule learning* for finding out the general
behaviour of a user-item group.
I think those two would be