RowSimilarityJob, sampleDown method problem

2013-08-13 Thread sam wu
In the Mahout 0.9 snapshot, RowSimilarityJob.java, sampleDown method (line 291 or 300):
double rowSampleRate = Math.min(maxObservationsPerRow, observationsPerRow) / observationsPerRow;
returns either 0.0 or 1.0, not a fraction. It needs a (double) cast.
BR
Sam
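A minimal, self-contained Java sketch of the problem reported above. It assumes both maxObservationsPerRow and observationsPerRow are int, which is what makes the division integral. The class and method names are hypothetical; this is not the Mahout source.

```java
public class SampleRateDemo {

    // Buggy form (assumes int operands): the division truncates toward zero
    // before the int result is widened to double, so only 0.0 or 1.0 come out.
    public static double buggyRate(int maxObservationsPerRow, int observationsPerRow) {
        double rowSampleRate = Math.min(maxObservationsPerRow, observationsPerRow) / observationsPerRow;
        return rowSampleRate;
    }

    // Fixed form: casting one operand to double forces floating-point division.
    public static double fixedRate(int maxObservationsPerRow, int observationsPerRow) {
        return (double) Math.min(maxObservationsPerRow, observationsPerRow) / observationsPerRow;
    }

    public static void main(String[] args) {
        int maxObservationsPerRow = 700;
        for (int observationsPerRow : new int[] {500, 700, 1000, 2000}) {
            System.out.printf("observations=%d buggy=%.3f fixed=%.3f%n",
                    observationsPerRow,
                    buggyRate(maxObservationsPerRow, observationsPerRow),
                    fixedRate(maxObservationsPerRow, observationsPerRow));
        }
    }
}
```

For 500 and 700 observations both forms give 1.0; for 1000 and 2000 observations the buggy form gives 0.0, while the fixed form gives 0.7 and 0.35.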

Re: RowSimilarityJob, sampleDown method problem

2013-08-13 Thread Ted Dunning
Why do you think this?

Re: RowSimilarityJob, sampleDown method problem

2013-08-13 Thread sam wu
Say column a has 1000 entries and maxPref = 700:
rowSampleRate = Math.min(maxObservationsPerRow, observationsPerRow) / observationsPerRow;
We get rowSampleRate = 0.0 (not 0.7). Do we skip this column entirely, or sample the column's entries with probability 0.7 (keeping roughly 700 entries)?
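The second reading (sample each entry with probability 0.7) can be sketched as independent per-entry sampling. This is a hypothetical helper for illustration only, not Mahout's actual sampleDown; the names sampleDown, maxObservations, and the use of java.util.Random are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class BernoulliSampleDemo {

    // Hypothetical helper (not Mahout's code): keeps each entry independently
    // with probability min(maxObservations, n) / n, so on average about
    // maxObservations entries survive when n > maxObservations.
    public static List<Integer> sampleDown(List<Integer> entries, int maxObservations, Random random) {
        int observations = entries.size();
        List<Integer> kept = new ArrayList<>();
        if (observations == 0) {
            return kept; // avoid 0 / 0.0, which would be NaN
        }
        // Floating-point division, i.e. the cast fix discussed in this thread.
        double sampleRate = (double) Math.min(maxObservations, observations) / observations;
        for (Integer entry : entries) {
            // nextDouble() is in [0, 1), so a rate of 1.0 keeps every entry.
            if (random.nextDouble() < sampleRate) {
                kept.add(entry);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> column = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            column.add(i);
        }
        // Fixed seed so the run is repeatable; the exact count depends on the seed.
        List<Integer> kept = sampleDown(column, 700, new Random(42));
        System.out.println("kept " + kept.size() + " of " + column.size());
    }
}
```

With 1000 entries and a cap of 700 the kept count is binomial with mean 700 and standard deviation of about 14.5, so it lands near 700 rather than exactly on it. With the truncated rate of 0.0, the same loop would keep nothing, which is the "skip the whole column" outcome.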

Re: RowSimilarityJob, sampleDown method problem

2013-08-13 Thread Ted Dunning
Ouch. Sorry... your original posting made it sound like you *wanted* it to be 0.0 or 1.0. This is a bug. Can you file a JIRA?

Re: RowSimilarityJob, sampleDown method problem

2013-08-13 Thread sam wu
Sorry for the phrasing. I'll file a JIRA.
Sam

Re: RowSimilarityJob, sampleDown method problem

2013-08-13 Thread Stevo Slavić
FindBugs has been reporting this the whole time (see the Warnings tab at https://builds.apache.org/job/Mahout-Quality/2194/findbugsResult/ and the ICAST_IDIV_CAST_TO_DOUBLE bug). We should get the FindBugs warning count down to 0.
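ICAST_IDIV_CAST_TO_DOUBLE refers to an integer division whose result is then converted to double. Because the conversion happens after the truncating division, wrapping the whole quotient in a cast does not help; the cast has to apply to an operand. The implicit widening in the original assignment compiles to the same integer-divide-then-convert sequence as an explicit cast of the quotient. A small sketch with hypothetical names contrasting the two placements:

```java
public class CastPlacementDemo {

    // Integer division happens first (700 / 1000 == 0), then the int is
    // widened: still 0.0. This is the shape the FindBugs pattern describes.
    public static double castAfter(int kept, int total) {
        return (double) (kept / total);
    }

    // The cast binds tighter than '/', so 'kept' becomes a double and the
    // division is done in floating point.
    public static double castBefore(int kept, int total) {
        return (double) kept / total;
    }

    public static void main(String[] args) {
        System.out.println("cast after division:  " + castAfter(700, 1000));
        System.out.println("cast before division: " + castBefore(700, 1000));
    }
}
```

Running main prints 0.0 for the cast-after form and 0.7 for the cast-before form.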