The Map/Reduce SGD patch includes a very nice trick which I did not
know about. Here is an example:
https://reviews.apache.org/r/3072/diff/2/?file=63195#file63195line36
It uses DummyRecordWriter to ship key/value pairs from mapper to reducer.
On Mon, Dec 12, 2011 at 12:43 PM, Isabel Drost wrote:
On 08.12.2011 Grant Ingersoll wrote:
> great, b/c these really are mainstream tests. I suspect most of our
> overhead is simply due to running map reduce jobs.
Is there anything to be gained by checking the code itself with mrunit (I know
it does have limitations, but if those tests really only
On Mon, Dec 12, 2011 at 1:15 PM, Grant Ingersoll wrote:
> I'm not sure if it is completely valid, but it seems to me that if our
> tests can't run concurrently, it also raises a doubt as to whether some of
> our classes can be run concurrently.
>
I don't think there's a cause for concern; it's re
On Dec 12, 2011, at 7:11 AM, Sean Owen wrote:
> On Mon, Dec 12, 2011 at 11:59 AM, Grant Ingersoll wrote:
>
>> In Lucene, we simply print out what the seed is if the tests fail and then
>> you can rerun that test by saying ant -Dtestseed= test
>>
>
> I like that -- it's a separate thing but
On Mon, Dec 12, 2011 at 11:59 AM, Grant Ingersoll wrote:
> In Lucene, we simply print out what the seed is if the tests fail and then
> you can rerun that test by saying ant -Dtestseed= test
>
I like that -- it's a separate thing but it's a fine idea too. It lets you
at least try different se
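
The seed-reporting idea described above can be sketched roughly as follows. This is only an illustration, not the Lucene or Mahout implementation: the class `SeededTestSupport` and its methods are hypothetical, and the property name `testseed` is simply borrowed from the `-Dtestseed=` flag mentioned in the message.

```java
import java.util.Random;
import java.util.function.Consumer;

/** Hypothetical helper: picks a test seed and reports it so a failing run can be repeated. */
final class SeededTestSupport {
  // Assumed property name, mirroring the -Dtestseed= flag quoted in the thread.
  static final String SEED_PROPERTY = "testseed";

  /** Returns the seed from the system property if it is set, otherwise a fresh one. */
  static long chooseSeed() {
    String fixed = System.getProperty(SEED_PROPERTY);
    return (fixed == null || fixed.isEmpty()) ? System.nanoTime() : Long.parseLong(fixed);
  }

  /** Runs the test body with a seeded Random and prints the seed if the body fails. */
  static void runWithSeed(long seed, Consumer<Random> body) {
    try {
      body.accept(new Random(seed));
    } catch (RuntimeException | Error e) {
      System.err.println("Test failed; rerun with -D" + SEED_PROPERTY + "=" + seed);
      throw e;
    }
  }
}
```

A test would call `SeededTestSupport.runWithSeed(SeededTestSupport.chooseSeed(), random -> ...)`, so an ordinary run explores a new seed each time while a failure can still be replayed exactly.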
On Dec 12, 2011, at 2:05 AM, Sean Owen wrote:
>
> It *seems* so much more like a test issue to me, solvable in the test
> harness, and in a clear way: just split tests n ways across n JVMs instead
> of 1 JVM with n threads. No (further) reliance on code being exactly well
> behaved. It's just we
On Dec 11, 2011, at 2:42 PM, Sean Owen wrote:
> On Sun, Dec 11, 2011 at 7:35 PM, Ted Dunning wrote:
>
>> The right way to handle this is to have instances get a random number
>> generator that works like it should. Magic resets in the middle of
>> operation are not a good idea.
>>
>
> Why wo
On Mon, Dec 12, 2011 at 2:49 AM, Ted Dunning wrote:
> > > Why would the caller care? It's all random numbers, whether "reset" or
> > not.
> >
>
> The care is about determinism.
>
Completely agree, it's the tests that care, not the caller itself.
> If the RandUtils notes that the current threa
On Sun, Dec 11, 2011 at 6:48 PM, Lance Norskog wrote:
> What about using ThreadLocal generators?
>
> On Sun, Dec 11, 2011 at 11:42 AM, Sean Owen wrote:
> > On Sun, Dec 11, 2011 at 7:35 PM, Ted Dunning
> wrote:
> >
> >> The right way to handle this is to have instances get a random number
> >> g
What about using ThreadLocal generators?
On Sun, Dec 11, 2011 at 11:42 AM, Sean Owen wrote:
> On Sun, Dec 11, 2011 at 7:35 PM, Ted Dunning wrote:
>
>> The right way to handle this is to have instances get a random number
>> generator that works like it should. Magic resets in the middle of
>> o
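
For reference, a minimal sketch of what ThreadLocal generators could look like. The class name `ThreadLocalRandoms` and the fixed seed are assumptions made for illustration; this is not how RandomUtils is currently written.

```java
import java.util.Random;

/** Hypothetical sketch: one seeded generator per thread, so threads never share state. */
final class ThreadLocalRandoms {
  // Assumed fixed seed, for illustration only.
  private static final long TEST_SEED = 0xCAFEBABEL;

  // Each thread lazily gets its own Random the first time it calls get().
  private static final ThreadLocal<Random> GENERATOR =
      ThreadLocal.withInitial(() -> new Random(TEST_SEED));

  static Random get() {
    return GENERATOR.get();
  }

  /** Resets only the calling thread's generator; other threads are unaffected. */
  static void resetForCurrentThread() {
    GENERATOR.set(new Random(TEST_SEED));
  }
}
```

With this shape, a reset at the start of a test touches only the generator of the thread running that test. Note that a test runner that reuses pooled threads would still need to call `resetForCurrentThread()` before each test to get the same sequence every time.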
On Sun, Dec 11, 2011 at 7:35 PM, Ted Dunning wrote:
> The right way to handle this is to have instances get a random number
> generator that works like it should. Magic resets in the middle of
> operation are not a good idea.
>
Why would the caller care? It's all random numbers, whether "reset"
The right way to handle this is to have instances get a random number
generator that works like it should. Magic resets in the middle of
operation are not a good idea.
I think we need a better way to inject generators that doesn't involve
statics.
On Sun, Dec 11, 2011 at 6:24 AM, Sean Owen wrot
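
One way to read "inject generators without statics" is plain constructor injection, sketched below. `NoisySampler` is a hypothetical class invented for this example, not existing Mahout code.

```java
import java.util.Random;

/** Hypothetical class that receives its generator instead of reaching for a static. */
final class NoisySampler {
  private final Random random;

  NoisySampler(Random random) {
    this.random = random;
  }

  /** Returns the mean of n Gaussian draws taken from the injected generator. */
  double sampleMean(int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
      sum += random.nextGaussian();
    }
    return sum / n;
  }
}
```

A test can then pass `new Random(seed)` and get a sequence that no other test, thread, or global reset can disturb, while production code passes whatever generator it prefers.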
Yes that's exactly what's happening -- not why the tests aren't running
fast, but why running them in parallel in one JVM results in
non-deterministic results.
If by "not use statics" you mean hold a static reference to a Random in
client code, yes, that could help, except that you'd also have to
As a point of reference, if I comment out the reset() code in useTestSeed for
the math package, all tests pass w/ parallel execution and fork once. Of
course, that's just one piece.
I guess I don't understand why we need to do all that reset stuff there anyway.
If you are using the test see
In working through what I _think_ will be the primary viable way to make this
stuff faster (parallel execution, fork once) it appears to me that the primary
concurrency issue is due to how we initialize the test seed and the fact that
we loop over all RandomWrapper objects and reset them. So, i
On Thu, Dec 8, 2011 at 3:52 PM, Sean Owen wrote:
>
> On Thu, Dec 8, 2011 at 11:34 PM, Lance Norskog wrote:
>
> > A lot of tests are dependent on fixed random numbers.
> >
>
> (Which is not on purpose, I assume.)
>
They are coded that way- a great many unit tests of stochastic things do this:
which brings me to the thought, we could probably employ some
heuristics to deduce that parameter by the time ABt job runs. or just
set it sufficiently high there just for the case of ABt. Some danger
is if we set it significantly >> s, then we may overallocate some
memory we will never use. but it
I don't think it would make sense for SSVD to remove MR. I mean, sure,
we can test something like Givens solver independently, but it would
not be testing much really.
I will reduce the dimensionality there.
Also, there are a lot of tests (sparse/dense, sparse with power
iteration and without ,
On Dec 8, 2011, at 12:55 PM, Sean Owen wrote:
> This could well be it. While every Random everywhere gets initialized to a
> known initial state, at the start of every @Test method, you could get
> different sequences if other tests are in progress in parallel in the same
> JVM.
>
> Ideally test
On Thu, Dec 8, 2011 at 11:34 PM, Lance Norskog wrote:
> A lot of tests are dependent on fixed random numbers.
>
(Which is not on purpose, I assume.)
>
> Starting a new JVM per test is a non-starter on smaller machines. Can
> single-process mode be a default with a parameter for running paralle
A lot of tests are dependent on fixed random numbers.
Starting a new JVM per test is a non-starter on smaller machines. Can
single-process mode be a default with a parameter for running parallel?
Or could the unit test framework have a Random factory object for each test?
On Thu, Dec 8, 2011 at
Progress! I had configured the surefire plugin in the wrong place
On Dec 8, 2011, at 2:55 PM, Sean Owen wrote:
> This could well be it. While every Random everywhere gets initialized to a
> known initial state, at the start of every @Test method, you could get
> different sequences if other tes
I think that just means the test is dependent on the particular sequence of
random numbers. Ideally it should loosen its definition of correct a bit.
On Thu, Dec 8, 2011 at 7:59 PM, Grant Ingersoll wrote:
>
> On Dec 8, 2011, at 2:55 PM, Sean Owen wrote:
>
> > This could well be it. While every R
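
As an illustration of "loosening the definition of correct", a test can assert a statistical bound rather than a value tied to one particular random sequence. The sketch below is an assumed example (the class `ToleranceCheck` is hypothetical); it checks that a sample mean of uniform draws falls within a chosen number of standard errors of the true mean.

```java
import java.util.Random;

/** Hypothetical check that accepts any random sequence whose sample mean is statistically plausible. */
final class ToleranceCheck {
  /** True if the mean of n uniform [0,1) draws lies within `sigmas` standard errors of 0.5. */
  static boolean meanIsPlausible(Random random, int n, double sigmas) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
      sum += random.nextDouble();
    }
    double mean = sum / n;
    // The variance of U(0,1) is 1/12, so the standard error of the mean is sqrt(1 / (12 n)).
    double standardError = Math.sqrt(1.0 / (12.0 * n));
    return Math.abs(mean - 0.5) <= sigmas * standardError;
  }
}
```

With a generous bound such as six standard errors, the check should pass for essentially any seed, so it no longer matters which sequence a given test happens to see.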
On Dec 8, 2011, at 2:55 PM, Sean Owen wrote:
> This could well be it. While every Random everywhere gets initialized to a
> known initial state, at the start of every @Test method, you could get
> different sequences if other tests are in progress in parallel in the same
> JVM.
I'm also trying u
This could well be it. While every Random everywhere gets initialized to a
known initial state, at the start of every @Test method, you could get
different sequences if other tests are in progress in parallel in the same
JVM.
Ideally tests aren't that sensitive to the sequence of random numbers --
On Dec 8, 2011, at 2:39 PM, Grant Ingersoll wrote:
>
> On Dec 8, 2011, at 2:23 PM, Grant Ingersoll wrote:
>
>> If I add parallel, fork always to the main surefire config, I get failures
>> all over the place for things like:
>> Failed tests:
>> testHebbianSolver(org.apache.mahout.math.decom
On Dec 8, 2011, at 2:23 PM, Grant Ingersoll wrote:
> If I add parallel, fork always to the main surefire config, I get failures
> all over the place for things like:
> Failed tests:
> testHebbianSolver(org.apache.mahout.math.decomposer.hebbian.TestHebbianSolver):
> Error: {0.0614604997488015
If I add parallel, fork always to the main surefire config, I get failures all
over the place for things like:
Failed tests:
testHebbianSolver(org.apache.mahout.math.decomposer.hebbian.TestHebbianSolver):
Error: {0.06146049974880152 too high! (for eigen 3)
consistency(org.apache.mahout.math.
I'm working on parallel at the moment, which should help if we can get it to
work. We could certainly set up nightly, but I don't really think that is
great, b/c these really are mainstream tests. I suspect most of our overhead
is simply due to running map reduce jobs.
On Dec 8, 2011, at 8:42 AM
SSVD actually runs a rather small test but it is a MR job in local
mode, there's nothing to cut down there in terms of size (not much
anyway). It's just what it takes to initialize and run all jobs (and
since it is local, it is also single threaded, so it actually runs V
and U jobs sequentially ins
On Dec 8, 2011, at 8:36 AM, Grant Ingersoll wrote:
> MAHOUT-916 and 917 are attempts to address the running time of our tests. As
> Sean rightfully pointed out, there are probably opportunities to simply cut
> down the sizes of some of these tests w/o affecting their correctness. To
> th
4. and 5. already run on toy data.
I have some rather excessive 'integration'-like tests that execute
Hadoop in a local JVM. The tests take very long but are also very
helpful in finding subtle bugs.
Maybe there is a way to execute these tests only once a day or so?
--sebastian
On 08.12.2011 14
MAHOUT-916 and 917 are attempts to address the running time of our tests. As
Sean rightfully pointed out, there are probably opportunities to simply cut
down the sizes of some of these tests w/o affecting their correctness. To that
end, if people can take a look at:
https://builds.apache.org/j