Re: Intermittent Test Failure: testTranspose(org.apache.mahout.math.hadoop.TestDistributedRowMatrix)

2010-04-29 Thread Grant Ingersoll
On Apr 29, 2010, at 6:36 PM, Jeff Eastman wrote: > right at the end of the 15 min core tests which makes it especially annoying. Lucene just put in parallel JUnit tests and they've gotten a lot faster.

Re: Intermittent Test Failure: testTranspose(org.apache.mahout.math.hadoop.TestDistributedRowMatrix)

2010-04-29 Thread Sean Owen
I had taken on MAHOUT-302 which is basically about overhauling how temp data is handled for tests. I think we can indeed handle it more cleanly and in a way such that collisions never happen. I'm still in the middle of it. On Thu, Apr 29, 2010 at 11:38 PM, Ted Dunning wrote: > Any chance to use a
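One generic way to get the property Sean describes (temp data that can never collide) is to give each test its own freshly created directory instead of a fixed path. This is only an illustrative sketch, not the MAHOUT-302 change. It assumes Java 7+ `java.nio.file` and the local filesystem (HDFS paths would need Hadoop's own FileSystem API, which is not shown), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: one fresh, uniquely named directory per test. */
final class UniqueTestDirs {

  private UniqueTestDirs() {}

  /**
   * Creates a new directory under the JVM's default temp location.
   * Files.createTempDirectory always returns a directory that did not
   * exist before the call, so repeated or concurrent tests never share
   * an output location, even when they pass the same test name.
   */
  public static Path newTestDir(String testName) {
    try {
      return Files.createTempDirectory(testName + "-");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path first = newTestDir("testTranspose");
    Path second = newTestDir("testTranspose");
    System.out.println(first);
    System.out.println(second);
    // Same test name, two distinct directories.
    System.out.println(!first.equals(second));
  }
}
```

In a JUnit test this would typically be called from setUp(), with the directory deleted again in tearDown().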

Re: Negative LLR Score

2010-04-29 Thread Sean Owen
I could sure be wrong about this (or perhaps out of date). It makes sense in theory. But I can't find it in the JLS, and in the bytecode I still see it calling Math.log(), which in turn calls StrictMath.log(), FWIW. I would actually believe a JIT would do something with this. But I still find myself always prog

Re: Intermittent Test Failure: testTranspose(org.apache.mahout.math.hadoop.TestDistributedRowMatrix)

2010-04-29 Thread Ted Dunning
Any chance to use a new name? On Thu, Apr 29, 2010 at 3:36 PM, Jeff Eastman wrote: > The Surefire report seems to indicate this might be a timing problem with > HDFS being lazy. Sometimes it passes and sometimes it fails, but of course, > right at the end of the 15 min core tests which makes it es

Re: Negative LLR Score

2010-04-29 Thread Ted Dunning
On Thu, Apr 29, 2010 at 3:29 PM, Sean Owen wrote: > You mean "sum * Math.log(sum)"? Of course. Sorry. > javac definitely isn't allowed to do that kind of transformation -- it > actually can't do much of anything. Javac and the JIT both know that log is a pure function. They also definit

Intermittent Test Failure: testTranspose(org.apache.mahout.math.hadoop.TestDistributedRowMatrix)

2010-04-29 Thread Jeff Eastman
The Surefire report seems to indicate this might be a timing problem with HDFS being lazy. Sometimes it passes and sometimes it fails, but of course, right at the end of the 15 min core tests which makes it especially annoying. Any resolution possible?

Re: Negative LLR Score

2010-04-29 Thread Sean Owen
You mean "sum * Math.log(sum)"? That's nice, I'll go with that. javac definitely isn't allowed to do that kind of transformation -- it actually can't do much of anything. ProGuard might -- it's actually a dynamite byte code optimizer and I've been itching to get it re-integrated into the build for

Re: Negative LLR Score

2010-04-29 Thread Ted Dunning
I think a cap is a good thing if the error is relatively small (< 1e-4 or so). Betraying my age, I usually rewrite this as: for (int element : elements) { if (element > 0) { result += element * (Math.log(element)); } } result -= elements.size() * Math.log(sum); But
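The rewrite works because of the following identity, where N is the total count (sum in the code). It is also why the subtracted term has to be N log N, i.e. sum * Math.log(sum) as Sean's reply in this thread points out, rather than elements.size() * Math.log(sum):

$$\sum_{i} k_i \log\frac{k_i}{N} \;=\; \sum_{i} k_i \log k_i \;-\; \Big(\sum_{i} k_i\Big)\log N \;=\; \sum_{i} k_i \log k_i \;-\; N\log N, \qquad N = \sum_{i} k_i$$

Taking 0 log 0 = 0, zero counts contribute nothing on either side, which is why the loop can skip elements equal to 0.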

Re: Negative LLR Score

2010-04-29 Thread Sean Owen
FWIW I had rewritten the entropy() loop to be: for (int element : elements) { if (element > 0) { result += element * (Math.log(element / sum)); } } and then further to double logSum = Math.log(sum); for (int element : elements) { if (element > 0) {
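Sean's hoisted version is cut off above, so the following is only a sketch of the same idea, not his code: log(N) is computed once outside the loop, since (as discussed in this thread) javac won't do that and the JIT may or may not. It also computes the k log k - N log N form from Ted's rewrite so the two can be compared. Assumptions: counts are non-negative ints, and sum is accumulated as a double. The mail doesn't show how sum is declared; if it were an int, element / sum would be integer division and truncate to 0 for every element smaller than the total. Class and method names are hypothetical.

```java
/**
 * Hypothetical sketch: two algebraically equal ways to compute the
 * unnormalized entropy  -sum_i k_i * log(k_i / N),  with N = sum_i k_i.
 */
final class EntropySketch {

  private EntropySketch() {}

  /** Per-element form, with log(N) hoisted out of the loop by hand. */
  public static double entropyHoisted(int... counts) {
    double sum = 0.0; // double, so no integer division can sneak in
    for (int k : counts) {
      sum += k;
    }
    double logSum = Math.log(sum); // computed once, not once per element
    double result = 0.0;
    for (int k : counts) {
      if (k > 0) { // 0 * log(0) is taken as 0, so zero counts are skipped
        result += k * (Math.log(k) - logSum); // log(k / N) = log k - log N
      }
    }
    return -result;
  }

  private static double xLogX(double x) {
    return x == 0.0 ? 0.0 : x * Math.log(x);
  }

  /** The same quantity written as N log N - sum_i k_i log k_i. */
  public static double entropyXLogX(int... counts) {
    double sum = 0.0;
    double sumXLogX = 0.0;
    for (int k : counts) {
      sum += k;
      sumXLogX += xLogX(k);
    }
    return xLogX(sum) - sumXLogX;
  }

  public static void main(String[] args) {
    System.out.println(entropyHoisted(6, 7567, 1924, 2426487));
    System.out.println(entropyXLogX(6, 7567, 1924, 2426487));
  }
}
```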

Re: Negative LLR Score

2010-04-29 Thread Sean Owen
Ah yeah that's it. So... is the better change to cap the result of logLikelihoodRatio() at 0.0? On Thu, Apr 29, 2010 at 5:11 PM, Ted Dunning wrote: > I suspect round-off error.  In R I get this for the raw LLR: > >> llr(matrix(c(6,7567, 1924, 2426487), nrow=2)) > [1] 3.380607e-11 > > A slightly

Re: Negative LLR Score

2010-04-29 Thread Ted Dunning
I suspect round-off error. In R I get this for the raw LLR: > llr(matrix(c(6,7567, 1924, 2426487), nrow=2)) [1] 3.380607e-11 A slightly different implementation might well have gotten a small negative number here. On Thu, Apr 29, 2010 at 8:56 AM, Sean Owen wrote: > What about Shashikant's exa
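The cap Sean asks about can be sketched as follows. This assumes the usual G² definition, LLR = 2 * (rowEntropy + columnEntropy - matrixEntropy) with unnormalized entropies; it is not necessarily how LogLikelihood is written in Mahout, and the class and method names are hypothetical. Because the exact value is mathematically never negative, clamping at 0.0 only discards round-off of the kind Ted's R output shows. Whether the raw double comes out slightly positive or slightly negative for Shashikant's table depends on the order of floating-point operations, so no particular sign is claimed here.

```java
/**
 * Hypothetical sketch of a G^2 log-likelihood ratio for a 2x2 contingency
 * table, with the result capped at 0.0 to absorb round-off error.
 */
final class LlrSketch {

  private LlrSketch() {}

  private static double xLogX(long x) {
    return x == 0L ? 0.0 : x * Math.log(x);
  }

  /** Unnormalized entropy: N log N - sum_i k_i log k_i. */
  private static double entropy(long... counts) {
    long sum = 0L;
    double sumXLogX = 0.0;
    for (long k : counts) {
      sum += k;
      sumXLogX += xLogX(k);
    }
    return xLogX(sum) - sumXLogX;
  }

  /** Uncapped value; may be a tiny negative number due to round-off. */
  public static double rawLogLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double columnEntropy = entropy(k11 + k21, k12 + k22);
    double matrixEntropy = entropy(k11, k12, k21, k22);
    return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
  }

  /** The exact LLR is never negative, so anything below 0.0 is round-off. */
  public static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
    return Math.max(0.0, rawLogLikelihoodRatio(k11, k12, k21, k22));
  }

  public static void main(String[] args) {
    // Shashikant's table is very close to independent, so the true LLR is ~0.
    System.out.println(rawLogLikelihoodRatio(6, 7567, 1924, 2426487));
    System.out.println(logLikelihoodRatio(6, 7567, 1924, 2426487));
  }
}
```

Capping inside logLikelihoodRatio() rather than only in the root version means every caller sees a non-negative score.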

Re: Negative LLR Score

2010-04-29 Thread Sean Owen
What about Shashikant's example? Unless my brain's not in gear, that seems like a legit example, but it does indeed produce a negative LLR.

Re: Negative LLR Score

2010-04-29 Thread Drew Farris
I was under the (perhaps incorrect) impression that negative LLR indicates some sort of problem with the input. On Thu, Apr 29, 2010 at 7:44 AM, Shashikant Kore wrote: > Root LLR calculation has a minor bug. When LLR score is negative, > square root is undefined. You can see the result for the fo

Re: Negative LLR Score

2010-04-29 Thread Sean Owen
(I can easily make the fix and add a test, but is the right thing to return 0, or instead proceed in the method with the value -sqrt(-llr) when llr is negative?) On Thu, Apr 29, 2010 at 12:44 PM, Shashikant Kore wrote: > Root LLR calculation has a minor bug. When LLR score is negative, > square r

Negative LLR Score

2010-04-29 Thread Shashikant Kore
Root LLR calculation has a minor bug. When the LLR score is negative, its square root is undefined. You can see that the result of the following is NaN: org.apache.mahout.math.stats.LogLikelihood.rootLogLikelihoodRatio(6, 7567, 1924, 2426487) A minor fix would be to return zero if LLR is less than zero as
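The NaN comes from Math.sqrt, which returns NaN for any negative argument. The two guards discussed in this thread (return 0, as proposed here, or proceed with -sqrt(-llr), as Sean asks about) can be sketched side by side. This sketch takes an already computed LLR value as input; the class and method names are hypothetical, not Mahout's API.

```java
/** Hypothetical comparison of two ways to take the root of a possibly negative LLR. */
final class RootLlrOptions {

  private RootLlrOptions() {}

  /** Option 1: treat any negative LLR as 0 before taking the root. */
  public static double rootClampedToZero(double llr) {
    return llr < 0.0 ? 0.0 : Math.sqrt(llr);
  }

  /** Option 2: keep the sign, using -sqrt(-llr) when llr is negative. */
  public static double rootSignPreserving(double llr) {
    return llr < 0.0 ? -Math.sqrt(-llr) : Math.sqrt(llr);
  }

  public static void main(String[] args) {
    double tiny = -1e-12; // hypothetical round-off-sized negative LLR
    System.out.println(Math.sqrt(tiny)); // NaN, the reported bug
    System.out.println(rootClampedToZero(tiny));
    System.out.println(rootSignPreserving(tiny));
  }
}
```

One thing worth weighing: the square root magnifies tiny errors, so a round-off of about 1e-12 in the LLR becomes a root LLR of magnitude about 1e-6. With the sign-preserving option, the sign of such a near-zero score says nothing about the data, which may favor clamping.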

Re: Similarity Tests Failing since 939074?

2010-04-29 Thread Sean Owen
Sorry that's essentially an elaborate typo, which made something that Can't Possibly Change Behavior, Change Behavior. On Thu, Apr 29, 2010 at 4:12 AM, Jeff Eastman wrote: > Failed tests: >  testSimple(org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarityTest) >  testSimpleItem(