Re: Tests running time

2011-12-12 Thread Lance Norskog
The Map/Reduce SGD patch includes a very nice trick which I did not know about. Here is an example: https://reviews.apache.org/r/3072/diff/2/?file=63195#file63195line36 It uses DummyRecordWriter to ship key/value pairs from mapper to reducer. On Mon, Dec 12, 2011 at 12:43 PM, Isabel Drost wrote

Re: Tests running time

2011-12-12 Thread Isabel Drost
On 08.12.2011 Grant Ingersoll wrote: > great, b/c these really are mainstream tests. I suspect most of our > overhead is simply due to running map reduce jobs. Is there anything to be gained by checking the code itself with mrunit (I know it does have limitations, but if those tests really only

Re: Tests running time

2011-12-12 Thread Sean Owen
On Mon, Dec 12, 2011 at 1:15 PM, Grant Ingersoll wrote: > I'm not sure if it is completely valid, but it seems to me that if our > tests can't run concurrently, it also raises a doubt as to whether some of > our classes can be run concurrently. > I don't think there's a cause for concern; it's re

Re: Tests running time

2011-12-12 Thread Grant Ingersoll
On Dec 12, 2011, at 7:11 AM, Sean Owen wrote: > On Mon, Dec 12, 2011 at 11:59 AM, Grant Ingersoll wrote: > >> In Lucene, we simply print out what the seed is if the tests fail and then >> you can rerun that test by saying ant -Dtestseed= test >> > > I like that -- it's a separate thing but

Re: Tests running time

2011-12-12 Thread Sean Owen
On Mon, Dec 12, 2011 at 11:59 AM, Grant Ingersoll wrote: > In Lucene, we simply print out what the seed is if the tests fail and then > you can rerun that test by saying ant -Dtestseed= test > I like that -- it's a separate thing but it's a fine idea too. It lets you at least try different se
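Editorial note: the "print the seed on failure, rerun with that seed" idea quoted above could be sketched roughly as follows. This is only an illustration, not Lucene's or Mahout's code. The property name `testseed` is taken from the quoted `ant -Dtestseed= test` command; the class `SeedReporting` and its methods are hypothetical.

```java
import java.util.Random;
import java.util.function.Consumer;

/** Hypothetical helper (not a Lucene or Mahout API): report the seed when a test body fails. */
final class SeedReporting {
    private SeedReporting() {}

    /** Uses -Dtestseed=<long> if given, otherwise picks a fresh seed. */
    static long resolveSeed() {
        String prop = System.getProperty("testseed");
        if (prop == null || prop.isEmpty()) {
            return new Random().nextLong();
        }
        return Long.parseLong(prop);
    }

    /** Runs the body with a generator built from the seed; prints the seed if the body fails. */
    static void runWithSeed(long seed, Consumer<Random> body) {
        try {
            body.accept(new Random(seed));
        } catch (AssertionError | RuntimeException e) {
            System.err.println("Test failed; rerun with -Dtestseed=" + seed);
            throw e;
        }
    }
}
```

A test would call `SeedReporting.runWithSeed(SeedReporting.resolveSeed(), random -> { ... })`, so a failing run always tells you which seed to replay.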

Re: Tests running time

2011-12-12 Thread Grant Ingersoll
On Dec 12, 2011, at 2:05 AM, Sean Owen wrote: > > It *seems* so much more like a test issue to me, solvable in the test > harness, and in a clear way: just split tests n ways across n JVMs instead > of 1 JVM with n threads. No (further) reliance on code being exactly well > behaved. It's just we

Re: Tests running time

2011-12-12 Thread Grant Ingersoll
On Dec 11, 2011, at 2:42 PM, Sean Owen wrote: > On Sun, Dec 11, 2011 at 7:35 PM, Ted Dunning wrote: > >> The right way to handle this is to have instances get a random number >> generator that works like it should. Magic resets in the middle of >> operation are not a good idea. >> > > Why wo

Re: Tests running time

2011-12-11 Thread Sean Owen
On Mon, Dec 12, 2011 at 2:49 AM, Ted Dunning wrote: > > > Why would the caller care? It's all random numbers, whether "reset" or > > not. > > > > The care is about determinism. > Completely agree, it's the tests that care, not the caller itself. > If the RandUtils notes that the current threa

Re: Tests running time

2011-12-11 Thread Ted Dunning
On Sun, Dec 11, 2011 at 6:48 PM, Lance Norskog wrote: > What about using ThreadLocal generators? > > On Sun, Dec 11, 2011 at 11:42 AM, Sean Owen wrote: > > On Sun, Dec 11, 2011 at 7:35 PM, Ted Dunning > wrote: > > > >> The right way to handle this is to have instances get a random number > >> g

Re: Tests running time

2011-12-11 Thread Lance Norskog
What about using ThreadLocal generators? On Sun, Dec 11, 2011 at 11:42 AM, Sean Owen wrote: > On Sun, Dec 11, 2011 at 7:35 PM, Ted Dunning wrote: > >> The right way to handle this is to have instances get a random number >> generator that works like it should.  Magic resets in the middle of >> o
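Editorial note: a minimal sketch of the ThreadLocal suggestion, assuming each test thread should see the same seeded sequence no matter what other threads are doing. `ThreadLocalRandoms` and `TEST_SEED` are hypothetical names; this is not Mahout's RandomUtils.

```java
import java.util.Random;

/** Hypothetical helper (not Mahout's RandomUtils): one seeded generator per thread. */
final class ThreadLocalRandoms {
    /** Assumed fixed test seed; the value is arbitrary. */
    static final long TEST_SEED = 42L;

    private static final ThreadLocal<Random> LOCAL =
        ThreadLocal.withInitial(() -> new Random(TEST_SEED));

    private ThreadLocalRandoms() {}

    /** Returns the calling thread's own generator. */
    static Random get() {
        return LOCAL.get();
    }

    /** Resets only the calling thread's generator; other threads are unaffected. */
    static void resetForCurrentThread() {
        LOCAL.set(new Random(TEST_SEED));
    }
}
```

With this shape, a reset at the start of one test method cannot disturb a test running concurrently on another thread. It does not help code that hands a generator from one thread to another.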

Re: Tests running time

2011-12-11 Thread Sean Owen
On Sun, Dec 11, 2011 at 7:35 PM, Ted Dunning wrote: > The right way to handle this is to have instances get a random number > generator that works like it should. Magic resets in the middle of > operation are not a good idea. > Why would the caller care? It's all random numbers, whether "reset"

Re: Tests running time

2011-12-11 Thread Ted Dunning
The right way to handle this is to have instances get a random number generator that works like it should. Magic resets in the middle of operation are not a good idea. I think we need a better way to inject generators that doesn't involve statics. On Sun, Dec 11, 2011 at 6:24 AM, Sean Owen wrot
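Editorial note: one way to read "inject generators without statics" is plain constructor injection, sketched below. `InjectedSampler` is a hypothetical class used only for illustration; the thread does not say what form the injection should take.

```java
import java.util.Random;

/** Hypothetical component that receives its generator instead of reaching for a static one. */
final class InjectedSampler {
    private final Random random;

    InjectedSampler(Random random) {
        this.random = random;
    }

    /** Draws n Gaussian values from this instance's own generator. */
    double[] sample(int n) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = random.nextGaussian();
        }
        return out;
    }
}
```

A test constructs `new InjectedSampler(new Random(seed))` and gets a sequence that no other instance can advance or reset, which is the determinism property being asked for.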

Re: Tests running time

2011-12-11 Thread Sean Owen
Yes that's exactly what's happening -- not why the tests aren't running fast, but why running them in parallel in one JVM results in non-deterministic results. If by "not use statics" you mean hold a static reference to a Random in client code, yes, that could help, except that you'd also have to

Re: Tests running time

2011-12-11 Thread Grant Ingersoll
As a point of reference, if I comment out the reset() code in useTestSeed for the math package, all tests pass w/ parallel execution and fork once. Of course, that's just one piece. I guess I don't understand why we need to do all that reset stuff there anyway. If you are using the test see

Re: Tests running time

2011-12-11 Thread Grant Ingersoll
In working through what I _think_ will be the primary viable way to make this stuff faster (parallel execution, fork once) it appears to me that the primary concurrency issue is due to how we initialize the test seed and the fact that we loop over all RandomWrapper objects and reset them. So, i

Re: Tests running time

2011-12-10 Thread Lance Norskog
On Thu, Dec 8, 2011 at 3:52 PM, Sean Owen wrote: > > On Thu, Dec 8, 2011 at 11:34 PM, Lance Norskog wrote: > > > A lot of tests are dependent on fixed random numbers. > > > > (Which is not on purpose, I assume.) > They are coded that way- a great many unit tests of stochastic things do this:

Re: Tests running time

2011-12-09 Thread Dmitriy Lyubimov
which brings me to the thought, we could probably employ some heuristics to deduce that parameter by the time ABt job runs. or just set it sufficiently high there just for the case of ABt. Some danger is if we set it significantly >> s, then we may overallocate some memory we will never use. but it

Re: Tests running time

2011-12-09 Thread Dmitriy Lyubimov
I don't think it would make sense for SSVD to remove MR. I mean, sure, we can test something like Givens solver independently, but it would not be testing much really. I will reduce the dimensionality there. Also, there are a lot of tests (sparse/dense, sparse with power iteration and without,

Re: Tests running time

2011-12-09 Thread Grant Ingersoll
On Dec 8, 2011, at 12:55 PM, Sean Owen wrote: > This could well be it. While every Random everywhere gets initialized to a > known initial state, at the start of every @Test method, you could get > different sequences if other tests are in progress in parallel in the same > JVM. > > Ideally test

Re: Tests running time

2011-12-08 Thread Sean Owen
On Thu, Dec 8, 2011 at 11:34 PM, Lance Norskog wrote: > A lot of tests are dependent on fixed random numbers. > (Which is not on purpose, I assume.) > > Starting a new JVM per test is a non-starter on smaller machines. Can > single-process mode be a default with a parameter for running paralle

Re: Tests running time

2011-12-08 Thread Lance Norskog
A lot of tests are dependent on fixed random numbers. Starting a new JVM per test is a non-starter on smaller machines. Can single-process mode be a default with a parameter for running parallel? Or could the unit test framework have a Random factory object for each test? On Thu, Dec 8, 2011 at

Re: Tests running time

2011-12-08 Thread Grant Ingersoll
Progress! I had configured the surefire plugin in the wrong place. On Dec 8, 2011, at 2:55 PM, Sean Owen wrote: > This could well be it. While every Random everywhere gets initialized to a > known initial state, at the start of every @Test method, you could get > different sequences if other tes

Re: Tests running time

2011-12-08 Thread Sean Owen
I think that just means the test is dependent on the particular sequence of random numbers. Ideally it should loosen its definition of correct a bit. On Thu, Dec 8, 2011 at 7:59 PM, Grant Ingersoll wrote: > > On Dec 8, 2011, at 2:55 PM, Sean Owen wrote: > > > This could well be it. While every R

Re: Tests running time

2011-12-08 Thread Grant Ingersoll
On Dec 8, 2011, at 2:55 PM, Sean Owen wrote: > This could well be it. While every Random everywhere gets initialized to a > known initial state, at the start of every @Test method, you could get > different sequences if other tests are in progress in parallel in the same > JVM. I'm also trying u

Re: Tests running time

2011-12-08 Thread Sean Owen
This could well be it. While every Random everywhere gets initialized to a known initial state, at the start of every @Test method, you could get different sequences if other tests are in progress in parallel in the same JVM. Ideally tests aren't that sensitive to the sequence of random numbers --
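Editorial note: the effect described here can be reproduced without any threads. In the sketch below, a "test" that draws three values from a shared, freshly seeded generator sees a different sequence as soon as something else draws from the same generator in between. `SharedGeneratorDemo` is a hypothetical illustration, not Mahout code.

```java
import java.util.Random;

/** Hypothetical demo: a shared seeded generator gives a test different values if others draw from it. */
final class SharedGeneratorDemo {
    private SharedGeneratorDemo() {}

    /** The test draws three values with nothing else touching the generator. */
    static long[] drawAlone(long seed) {
        Random shared = new Random(seed);
        return new long[] {shared.nextLong(), shared.nextLong(), shared.nextLong()};
    }

    /** Same test, but one value is consumed by "another test" after the first draw. */
    static long[] drawInterleaved(long seed) {
        Random shared = new Random(seed);
        long first = shared.nextLong();
        shared.nextLong(); // consumed elsewhere, e.g. by a test running in parallel
        long second = shared.nextLong();
        long third = shared.nextLong();
        return new long[] {first, second, third};
    }
}
```

Both methods start from the same known state, yet the values the test sees diverge after the first draw. A test that asserts on exact outputs would pass in one case and fail in the other.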

Re: Tests running time

2011-12-08 Thread Grant Ingersoll
On Dec 8, 2011, at 2:39 PM, Grant Ingersoll wrote: > > On Dec 8, 2011, at 2:23 PM, Grant Ingersoll wrote: > >> If I add parallel, fork always to the main surefire config, I get failures >> all over the place for things like: >> Failed tests: >> testHebbianSolver(org.apache.mahout.math.decom

Re: Tests running time

2011-12-08 Thread Grant Ingersoll
On Dec 8, 2011, at 2:23 PM, Grant Ingersoll wrote: > If I add parallel, fork always to the main surefire config, I get failures > all over the place for things like: > Failed tests: > testHebbianSolver(org.apache.mahout.math.decomposer.hebbian.TestHebbianSolver): > Error: {0.0614604997488015

Re: Tests running time

2011-12-08 Thread Grant Ingersoll
If I add parallel, fork always to the main surefire config, I get failures all over the place for things like: Failed tests: testHebbianSolver(org.apache.mahout.math.decomposer.hebbian.TestHebbianSolver): Error: {0.06146049974880152 too high! (for eigen 3) consistency(org.apache.mahout.math.

Re: Tests running time

2011-12-08 Thread Grant Ingersoll
I'm working on parallel at the moment, which should help if we can get it to work. We could certainly set up nightly, but I don't really think that is great, b/c these really are mainstream tests. I suspect most of our overhead is simply due to running map reduce jobs. On Dec 8, 2011, at 8:42 AM

Re: Tests running time

2011-12-08 Thread Dmitriy Lyubimov
SSVD actually runs a rather small test but it is a MR job in local mode, there's nothing to cut down there in terms of size (not much anyway). It's just what it takes to initialize and run all jobs (and since it is local, it is also single threaded, so it actually runs V and U jobs sequentially ins

Re: Tests running time

2011-12-08 Thread David Murgatroyd
On Dec 8, 2011, at 8:36 AM, Grant Ingersoll wrote: > MAHOUT-916 and 917 are attempts to address the running time of our tests. As > Sean rightfully pointed out, there are probably opportunities to simply cut > down the sizes of some of these tests w/o affecting their correctness. To > th

Re: Tests running time

2011-12-08 Thread Sebastian Schelter
4. and 5. already run on toy data. I have some rather excessive 'integration'-like tests that execute Hadoop in a local JVM. The tests take very long but are also very helpful in finding subtle bugs. Maybe there is a way to execute these tests only once a day or so? --sebastian On 08.12.2011 14

Tests running time

2011-12-08 Thread Grant Ingersoll
MAHOUT-916 and 917 are attempts to address the running time of our tests. As Sean rightfully pointed out, there are probably opportunities to simply cut down the sizes of some of these tests w/o affecting their correctness. To that end, if people can take a look at: https://builds.apache.org/j