Re: Issues with ALS positive definite

2014-10-16 Thread Xiangrui Meng
Do not use lambda=0.0. Use a small number instead. Cholesky factorization doesn't work on positive semidefinite systems with zero eigenvalues. -Xiangrui On Wed, Oct 15, 2014 at 5:05 PM, Debasish Das debasish.da...@gmail.com wrote: But do you expect the mllib code to fail if I run with 0.0 regularization?
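
A minimal sketch of the suggested fix against the Spark 1.x MLlib API; the input path and the exact lambda value are illustrative, and sc is assumed to be an existing SparkContext:

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    // hypothetical input: "user,item,score" lines
    val ratings = sc.textFile("ratings.csv").map { line =>
      val Array(user, item, score) = line.split(',')
      Rating(user.toInt, item.toInt, score.toDouble)
    }
    // lambda = 1e-6 instead of 0.0 keeps A^T A + lambda*I strictly positive
    // definite, so the Cholesky solve never sees a zero eigenvalue
    val model = ALS.train(ratings, rank = 10, iterations = 10, lambda = 1e-6)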

Re: Issues with ALS positive definite

2014-10-16 Thread Sean Owen
The Gramian is at least positive semidefinite, and will be positive definite if the matrix is non-singular, yes. That's usually, but not always, true. The lambda*I matrix is positive definite, well, when lambda is positive. Adding that makes the sum definite. At least, lambda=0 could be rejected as invalid. But
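
A one-line check of Sean's point: for the Gramian A^T A and any nonzero vector x,

    x^T (A^T A + lambda*I) x = ||Ax||^2 + lambda*||x||^2 >= lambda*||x||^2 > 0   (for lambda > 0)

so the regularized system is strictly positive definite for any positive lambda, and only semidefinite (where Cholesky can fail) when lambda = 0 and A is rank-deficient.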

Re: Issues with ALS positive definite

2014-10-16 Thread Debasish Das
@xiangrui should we add this epsilon inside the ALS code itself, so that if a user mistakenly puts 0.0 as the regularization, LAPACK failures do not show up? @sean For the proximal algorithms I am using Cholesky for L1 and LU for equality and bound constraints (since the matrix is quasi-definite)...I
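
A hypothetical guard illustrating the suggestion (this is not the actual MLlib code; effectiveLambda and minLambda are invented names and the floor value is illustrative):

    // clamp user-supplied regularization to a small positive floor so the
    // normal-equation matrix never becomes exactly singular
    val minLambda = 1e-9
    def effectiveLambda(userLambda: Double): Double =
      math.max(userLambda, minLambda)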

Re: Issues with ALS positive definite

2014-10-16 Thread Debasish Das
Just checked: QR is exposed by netlib: import org.netlib.lapack.Dgeqrf. For the equality and bound version, I will use QR... it will be faster than the LU that I am using through jblas.solveSymmetric... On Thu, Oct 16, 2014 at 8:34 AM, Debasish Das debasish.da...@gmail.com wrote: @xiangrui
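
A sketch of driving that routine through netlib-java's LAPACK wrapper (com.github.fommil.netlib), which dispatches to the same f2j Dgeqrf the email names; the 3x2 matrix is a toy example:

    import com.github.fommil.netlib.LAPACK
    import org.netlib.util.intW

    val m = 3; val n = 2
    val a = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0) // 3x2, column-major per LAPACK
    val tau = new Array[Double](math.min(m, n))
    val info = new intW(0)

    // workspace query: lwork = -1 asks dgeqrf to report the optimal work size
    val query = new Array[Double](1)
    LAPACK.getInstance.dgeqrf(m, n, a, m, tau, query, -1, info)
    val work = new Array[Double](query(0).toInt)
    LAPACK.getInstance.dgeqrf(m, n, a, m, tau, work, work.length, info)
    // on exit: R sits in the upper triangle of a; Q is encoded as Householder
    // reflectors below the diagonal together with tau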

accumulators

2014-10-16 Thread Sean McNamara
Accumulators on the stage info page show the rolling lifetime value of accumulators as well as per-task values, which is handy. I think it would be useful to add another field to the “Accumulators” table that also shows the total for the stage you are looking at (basically just a merge of the
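
For context, a minimal named accumulator against the Spark 1.x API; a named accumulator is what appears in the “Accumulators” table on the stage page (the app name, accumulator name, and counts here are made up):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("acc-example"))
    // naming the accumulator makes it visible on the web UI's stage info page
    val evens = sc.accumulator(0, "even inputs")
    sc.parallelize(1 to 1000).foreach { i =>
      if (i % 2 == 0) evens += 1 // updated per task, merged on the driver
    }
    println(evens.value) // 500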

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread shane knapp
the bad news is that we've had a couple more failures due to timeouts, but the good news is that the frequency at which these happen has decreased significantly (3 in the past ~18hr). seems like the git plugin downgrade has helped relieve the problem, but hasn't fixed it. i'll be looking into this

Re: Unit testing Master-Worker Message Passing

2014-10-16 Thread Josh Rosen
Hi Matt, I’m not sure whether those tests will actually find this specific issue. The tests that I linked to exercise Spark’s ZooKeeper-based multi-master mode, whereas it sounds like you’re seeing this issue in a regular standalone cluster. In those tests, the workers disconnect from the master

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread Nicholas Chammas
Thanks for continuing to look into this, Shane. One suggestion that Patrick brought up, if we have trouble getting to the bottom of this, is doing the git checkout ourselves in the run-tests-jenkins script and cutting out the Jenkins git plugin entirely. That way we can script retries and post
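
The actual run-tests-jenkins script is bash, but here is a hedged Scala sketch of the scripted-retry idea (the function name, attempt count, and backoff are invented):

    import scala.sys.process._

    // retry the fetch a few times with a growing pause between attempts
    def fetchWithRetries(maxAttempts: Int = 3): Boolean =
      (1 to maxAttempts).exists { attempt =>
        val exit = Seq("git", "fetch", "origin").! // exit code of the child process
        if (exit != 0) Thread.sleep(5000L * attempt)
        exit == 0
      }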

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread shane knapp
yeah, at this point it might be worth trying. :) the absolutely irritating thing is that i am not seeing this happen w/any other jobs other than the spark prb, nor does it seem to correlate w/time of day, network or system load, or which slave it runs on. nor are we hitting our limit of

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread Nicholas Chammas
On Thu, Oct 16, 2014 at 3:55 PM, shane knapp skn...@berkeley.edu wrote: i really, truly hate non-deterministic failures. Amen bruddah.