Mahout 0.9 snapshot
RowSimilarityJob.java, sampleDown method
line 291 or 300
double rowSampleRate = Math.min(maxObservationsPerRow, observationsPerRow)
/ observationsPerRow;
Returns either 0.0 or 1.0, not a fraction. Needs a (double) cast.
BR
Sam
Why do you think this?
Say column a has 1000 entries and maxPref = 700.
rowSampleRate = Math.min(maxObservationsPerRow, observationsPerRow) /
observationsPerRow;
We get rowSampleRate = 0.0 (not 0.7).
Do we totally skip this column, or sample its entries with 0.7 probability
(roughly getting 700 entries)?
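For readers following along, the arithmetic in this example can be reproduced with a small standalone sketch. It assumes both operands are declared as int, which the thread implies but does not show; the class and method names (SampleRateDemo, truncatedRate, fractionalRate, countKept) are hypothetical and are not taken from RowSimilarityJob. The last helper only illustrates what sampling entries with a given probability would look like, not how Mahout does it.

```java
import java.util.Random;

public class SampleRateDemo {

    // Mirrors the expression quoted in the thread, assuming int operands:
    // the division truncates first, and only then is the result widened to double.
    static double truncatedRate(int maxObservationsPerRow, int observationsPerRow) {
        return Math.min(maxObservationsPerRow, observationsPerRow) / observationsPerRow;
    }

    // Casting one operand to double forces floating-point division.
    static double fractionalRate(int maxObservationsPerRow, int observationsPerRow) {
        return (double) Math.min(maxObservationsPerRow, observationsPerRow) / observationsPerRow;
    }

    // Hypothetical helper: keep each entry independently with probability `rate`
    // and report how many entries survive.
    static int countKept(int entries, double rate, long seed) {
        Random random = new Random(seed);
        int kept = 0;
        for (int i = 0; i < entries; i++) {
            if (random.nextDouble() < rate) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(truncatedRate(700, 1000));   // prints 0.0
        System.out.println(fractionalRate(700, 1000));  // prints 0.7
        // With the fractional rate, roughly 700 of the 1000 entries are kept.
        System.out.println(countKept(1000, fractionalRate(700, 1000), 42L));
    }
}
```

With the truncated rate of 0.0, a sampler like countKept keeps nothing, so the column would be dropped entirely rather than down-sampled.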
Ouch.
Sorry... your original posting made it sound like you *wanted* it to be 0.0
or 1.0.
This is a bug. Can you file a JIRA?
Sorry for the phrasing.
I'll file a JIRA
Sam
Findbugs was reporting it the whole time (see the Warnings tab on
https://builds.apache.org/job/Mahout-Quality/2194/findbugsResult/ and the
ICAST_IDIV_CAST_TO_DOUBLE
bug).
We should get the Findbugs warning count to 0.