Re: [Scikit-learn-general] A Few Questions About decomposition.nmf

2013-08-31 Thread Will Buckner
Hey Lars,

So, my model built successfully last night with your new code, thanks! I'll see if I can build larger models as well this week and put memory usage to the test a bit, but this is a *massive* improvement. Our Scientist is making sure the output is sane, but I'm assuming you already did ba

2013-08-30 Thread Will Buckner
Thanks so much for spending some time on this. I'll give it a try first thing tomorrow and report back.

Thanks Lars!
-Will

On Fri, Aug 30, 2013 at 1:32 AM, Lars Buitinck wrote:
> 2013/8/30 Will Buckner :
> > Damn, hmm. This just seems so so heavy to calculate reconstruction_err,
> > which isn'

2013-08-30 Thread Lars Buitinck
2013/8/30 Will Buckner :
> Damn, hmm. This just seems so so heavy to calculate reconstruction_err,
> which isn't even used inside the algorithm. I don't even use it in the
> pipeline. My current best idea is just to subclass ProjectedGradientNMF()
> and overload fit_transform(), not computing recon

2013-08-29 Thread Will Buckner
Damn, hmm. This just seems so so heavy to calculate reconstruction_err, which isn't even used inside the algorithm. I don't even use it in the pipeline. My current best idea is just to subclass ProjectedGradientNMF() and overload fit_transform(), not computing reconstruction_err at all.

On Thu

2013-08-29 Thread Will Buckner
This looks great, and I'll talk to my lead scientist about incorporating it and evaluating, thanks! I must warn you all, I'm not an algorithms guy; I'm on the software/performance/"make this shit work" side of things. For the task at hand, we're using NMF for a reason, and I've gotta make this work

2013-08-29 Thread Lars Buitinck
2013/8/29 Lars Buitinck :
> W, H = csr_matrix(W), csr_matrix(H)
> reconstruction_err = euclidean_distances(X, W * H).sum()

Never mind. Even after fixing the formula, W * H is actually too dense for this to be any good, regardless of the initialization and sparseness parameter.

-- Lars Buitinck S
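Lars's point about density can be illustrated numerically. The sketch below uses random nonnegative sparse factors with hypothetical sizes and a 20% fill rate (not the data from this thread). An entry of W·H is nonzero whenever any of the k products W[i, l]·H[l, j] is nonzero, so with independent random patterns its density is roughly 1 - (1 - d^2)^k, about 0.56 for d = 0.2 and k = 20, far above the 0.2 of either factor. Because NMF factors are nonnegative, no cancellation brings that back down.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical sizes and density, chosen only for illustration.
n_samples, n_components, n_features = 200, 20, 300
density = 0.2

# Random sparse factors with nonnegative values (uniform on [0, 1)).
W = sp.random(n_samples, n_components, density=density, format="csr", random_state=0)
H = sp.random(n_components, n_features, density=density, format="csr", random_state=1)

WH = W @ H  # still a sparse *type*, but most entries are now stored


def fill(M):
    """Fraction of entries that are stored (nonzero) in a sparse matrix."""
    return M.nnz / (M.shape[0] * M.shape[1])


print(f"W: {fill(W):.2f}  H: {fill(H):.2f}  W @ H: {fill(WH):.2f}")
```

Storing a mostly full matrix in CSR costs more than a dense array, since each value also carries a column index.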

2013-08-29 Thread Lars Buitinck
2013/8/29 Will Buckner :
> Er, it looks like safe_sparse_dot() returns sparse unless dense_output=True.

No, it returns dense output when one of its two arguments is dense. dense_output only exists to force dense output in the sparse-sparse case. safe_sparse_norm(X - safe_sparse_dot(W, H)) would n
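The rule Lars describes can be checked with plain SciPy types. `dot_sketch` below is a hypothetical stand-in that mirrors that description (multiply, then densify only when asked and only if the result is sparse); it is not scikit-learn's source for `safe_sparse_dot`. In the quoted NMF code W and H are dense arrays, so their product is dense no matter how sparse X is.

```python
import numpy as np
import scipy.sparse as sp


def dot_sketch(a, b, dense_output=False):
    """Hypothetical helper mirroring the rule described for safe_sparse_dot.

    Not scikit-learn's implementation: the result is dense whenever either
    operand is dense; dense_output only matters when both are sparse.
    """
    ret = a @ b
    if dense_output and sp.issparse(ret):
        ret = ret.toarray()
    return ret


X = sp.random(4, 6, density=0.5, format="csr", random_state=0)  # sparse
W = np.ones((6, 3))                                              # dense
S = sp.identity(6, format="csr")                                 # sparse

print(sp.issparse(dot_sketch(X, W)))                     # False: one operand is dense
print(sp.issparse(dot_sketch(X, S)))                     # True: sparse @ sparse
print(sp.issparse(dot_sketch(X, S, dense_output=True)))  # False: densified on request
```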

2013-08-29 Thread Alexandre Gramfort
> Er, it looks like safe_sparse_dot() returns sparse unless dense_output=True.
> And, I'm confused as to how this would result in more memory. Aren't we
> allocating more in the lines above for the issparse(X) case? I'm stuck right
> now because my 40k x 220k CSR matrix can't make it past computing
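The concern raised in this thread is that np.dot(W, H) has the full shape of X but is dense. For the 40k x 220k matrix mentioned here, a back-of-the-envelope estimate, assuming float64 values, gives about 70 GB for that single array, before counting any temporaries built from it such as X - np.dot(W, H).

```python
# Back-of-the-envelope: a dense float64 array with the shape of X.
n_samples, n_features = 40_000, 220_000  # shape quoted in the thread
bytes_per_value = 8                       # float64 (assumed dtype)

dense_bytes = n_samples * n_features * bytes_per_value
print(dense_bytes)        # 70400000000
print(dense_bytes / 1e9)  # 70.4 (gigabytes)
```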

2013-08-29 Thread Olivier Grisel
2013/8/29 Will Buckner :
>> the motivation for these lines is that even if X is sparse
>> safe_sparse_dot(W, H)
> will not be. So you will allocate a matrix of size X but dense which is
> unacceptable in many cases.
>
> Er, it looks like safe_sparse_dot() returns sparse unless dense_output=True.
>

2013-08-29 Thread Will Buckner
> the motivation for these lines is that even if X is sparse safe_sparse_dot(W, H) will not be. So you will allocate a matrix of size X but dense which is unacceptable in many cases.

Er, it looks like safe_sparse_dot() returns sparse unless dense_output=True. And, I'm confused as to how this would

2013-08-28 Thread Alexandre Gramfort
hi Will,

> if not sp.issparse(X):
>     self.reconstruction_err_ = norm(X - np.dot(W, H))
> else:
>     norm2X = np.sum(X.data ** 2)  # Ok because X is CSR
>     normWHT = np.trace(np.dot(np.dot(H.T, np.dot(W.T, W)), H))
>
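The sparse branch quoted above relies on the expansion ||X - WH||_F^2 = ||X||_F^2 - 2<X, WH>_F + ||WH||_F^2. As written, though, the normWHT line builds H.T @ (W.T @ W) @ H, which is an n_features x n_features dense array (220k x 220k for the matrix in this thread). The sketch below is a minimal alternative, assuming X is a SciPy sparse matrix and W, H are dense NumPy arrays of shapes (n_samples, k) and (k, n_features). It evaluates the cross term only at the stored entries of X and reorders the trace as tr((W^T W)(H H^T)), so no intermediate is larger than k x k or nnz x k. The helper name `sparse_frobenius_error` is hypothetical, and the code is not taken from scikit-learn.

```python
import numpy as np
import scipy.sparse as sp


def sparse_frobenius_error(X, W, H):
    """Hypothetical helper: ||X - W @ H||_F without forming W @ H.

    X: SciPy sparse matrix, shape (n_samples, n_features)
    W: dense array, shape (n_samples, k)
    H: dense array, shape (k, n_features)
    """
    X = sp.coo_matrix(X, copy=True)
    X.sum_duplicates()  # make each stored (i, j) appear exactly once

    # ||X||_F^2 only involves the stored entries.
    norm2_X = np.dot(X.data, X.data)

    # <X, WH>_F needs (WH)_ij only where X_ij is stored: the dot product of
    # W[i, :] and H[:, j] for each stored (i, j). Uses O(nnz * k) memory.
    wh_at_stored = np.einsum("ij,ij->i", W[X.row], H[:, X.col].T)
    cross = np.dot(X.data, wh_at_stored)

    # ||WH||_F^2 = tr(H^T W^T W H) = tr((W^T W)(H H^T)). Both factors are
    # k x k and symmetric, so the trace is the sum of their elementwise product.
    norm2_WH = np.sum(np.dot(W.T, W) * np.dot(H, H.T))

    # Floating-point cancellation can leave a tiny negative value; clip it.
    return np.sqrt(max(norm2_X - 2.0 * cross + norm2_WH, 0.0))
```

On a small X the result can be compared with np.linalg.norm(X.toarray() - W @ H). For very large nnz, the cross term could also be accumulated over chunks of the stored entries to cap the nnz x k temporaries.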