[jira] Commented: (MAHOUT-300) Solve performance issues with Vector Implementations

2010-02-19 Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836053#action_12836053 ] Ted Dunning commented on MAHOUT-300: I think that the min and max functions need to che

2010-02-20 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836163#action_12836163 ] Sean Owen commented on MAHOUT-300: Tiny stuff -- in things like dotSelf(), you don't need
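
Sean's remark is cut off, but for reference: dotSelf() is the dot product of a vector with itself, i.e. the sum of its squared entries. A minimal sketch, assuming a hypothetical sparse layout where only the stored (non-zero) values are kept in an array; SparseSelfDot is illustrative, not Mahout's code. Because both operands are the same vector, no index matching or lookups into a second vector are needed.

```java
/** Hypothetical helper: self dot product over a sparse vector's stored values. */
final class SparseSelfDot {
    private SparseSelfDot() {}

    /**
     * Returns v . v given the first numNonZero stored values of v.
     * Unstored positions are zeros and contribute nothing, so they are skipped.
     */
    static double dotSelf(double[] values, int numNonZero) {
        double sum = 0.0;
        for (int i = 0; i < numNonZero; i++) {
            double v = values[i];
            sum += v * v; // no second vector, so no index matching is required
        }
        return sum;
    }
}
```

Passing numNonZero separately mirrors a backing array that may have spare capacity beyond the stored entries.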

2010-02-20 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836167#action_12836167 ] Robin Anil commented on MAHOUT-300: I removed the hasNoElements check as per Sean's and Ted's

2010-02-20 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836169#action_12836169 ] Robin Anil commented on MAHOUT-300: An issue I found here was for empty dense vectors I

2010-02-20 Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836238#action_12836238 ] Ted Dunning commented on MAHOUT-300: {quote} I don't know what to do in the edge case of

2010-02-20 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836245#action_12836245 ] Robin Anil commented on MAHOUT-300: bq. It may be that someday we will need maxNonZero,

2010-02-20 Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836249#action_12836249 ] Ted Dunning commented on MAHOUT-300: {quote} ted: It may be that someday we will nee

2010-02-21 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836370#action_12836370 ] Robin Anil commented on MAHOUT-300: Ok. Made maxValue and maxValueIndex as per your comm
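
The open question in this part of the thread is what maxValue and maxValueIndex should return for an empty vector. The snippets do not show which convention was adopted, so the sketch below assumes one: Double.NEGATIVE_INFINITY and -1 for an empty dense vector. DenseMax and its array-based signatures are hypothetical. For a sparse vector, unstored positions are implicit zeros and would also have to be compared, which is presumably the kind of distinction a separate maxNonZero would capture.

```java
/** Hypothetical dense-vector helpers; the empty-vector results are an assumed convention. */
final class DenseMax {
    private DenseMax() {}

    /** Largest entry, or Double.NEGATIVE_INFINITY for an empty vector (assumption). */
    static double maxValue(double[] values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            if (v > max) {
                max = v;
            }
        }
        return max;
    }

    /** Index of the first largest entry, or -1 for an empty vector (assumption). */
    static int maxValueIndex(double[] values) {
        if (values.length == 0) {
            return -1;
        }
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i; // strict '>' keeps the first index on ties
            }
        }
        return best;
    }
}
```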

2010-02-21 Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836385#action_12836385 ] Jake Mannix commented on MAHOUT-300: This output is on the Reuters collection again, or

2010-02-21 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836411#action_12836411 ] Robin Anil commented on MAHOUT-300: It's on the artificial VectorBenchmarks. On Reuters,

2010-02-21 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836412#action_12836412 ] Robin Anil commented on MAHOUT-300: Also, please review this and confirm it's fit to commi

2010-02-21 Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836433#action_12836433 ] Ted Dunning commented on MAHOUT-300: I think that this is a cleaner style for the merge
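
Ted's proposed loop is not visible in the snippet. As a generic illustration of the technique only, here is a two-pointer merge computing the dot product of two sparse vectors whose indices are stored in ascending order; the parallel-array layout and the MergeDot name are assumptions. Each step advances whichever side has the smaller index, so the loop costs O(nnz(a) + nnz(b)) and only indices present in both vectors contribute.

```java
/** Hypothetical merge-style dot product for sparse vectors with sorted indices. */
final class MergeDot {
    private MergeDot() {}

    /**
     * ia/va hold the first na stored entries of a, ib/vb the first nb of b.
     * Both index arrays must be strictly increasing.
     */
    static double dot(int[] ia, double[] va, int na, int[] ib, double[] vb, int nb) {
        double sum = 0.0;
        int i = 0;
        int j = 0;
        while (i < na && j < nb) {
            int a = ia[i];
            int b = ib[j];
            if (a == b) {
                sum += va[i] * vb[j]; // only shared indices contribute
                i++;
                j++;
            } else if (a < b) {
                i++; // a has a non-zero where b is zero
            } else {
                j++; // b has a non-zero where a is zero
            }
        }
        return sum;
    }
}
```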

2010-02-21 Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836449#action_12836449 ] Jake Mannix commented on MAHOUT-300: Running this on my laptop, with numNonzeroElements

2010-02-21 Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836452#action_12836452 ] Ted Dunning commented on MAHOUT-300: Huh, some of those times are a little surprisin

2010-02-21 Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836456#action_12836456 ] Jake Mannix commented on MAHOUT-300: Well, I've got Robin's most recent changes in ther

2010-02-21 Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836457#action_12836457 ] Jake Mannix commented on MAHOUT-300: Interestingly, for SquaredEuclideanDistanceMeasure
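
The SquaredEuclideanDistanceMeasure observation is truncated. One identity often used to speed up this measure on sparse data is |a - b|^2 = |a|^2 + |b|^2 - 2 a·b, which reduces the distance to a single dot product when squared lengths are cached. Whether this patch does that is not visible here; the sketch below (hypothetical SquaredDistance class, dense arrays for brevity) just shows that both forms give the same value.

```java
/** Hypothetical comparison of two ways to compute squared Euclidean distance. */
final class SquaredDistance {
    private SquaredDistance() {}

    /** Direct definition: sum over i of (a_i - b_i)^2; assumes equal lengths. */
    static double direct(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Same quantity from cached squared lengths and one dot product:
     * |a - b|^2 = |a|^2 + |b|^2 - 2 a.b
     * Clamped at 0 because rounding can push the result slightly negative.
     */
    static double fromNorms(double lengthSquaredA, double lengthSquaredB, double dot) {
        return Math.max(0.0, lengthSquaredA + lengthSquaredB - 2.0 * dot);
    }
}
```

For a = (1, 2, 0) and b = (0, 2, 3): |a|^2 = 5, |b|^2 = 13, a·b = 4, so both forms give 10.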

2010-02-21 Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836460#action_12836460 ] Jake Mannix commented on MAHOUT-300: Another run, even more sparse: cardinality: 500,00

2010-02-21 Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836466#action_12836466 ] Jake Mannix commented on MAHOUT-300: I'm a little concerned about correctness though:

2010-02-21 Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836468#action_12836468 ] Jake Mannix commented on MAHOUT-300: Ok, went away... probably a case of pebkac {code

2010-02-22 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836598#action_12836598 ] Robin Anil commented on MAHOUT-300: We should be multiplying using sparsity instead of c

2010-02-22 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836624#action_12836624 ] Robin Anil commented on MAHOUT-300: I think the irregularity is due to the sparse vector

2010-02-22 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836630#action_12836630 ] Robin Anil commented on MAHOUT-300: Ted, your loop structure seems to be slower by about

2010-02-22 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836633#action_12836633 ] Sean Owen commented on MAHOUT-300: Tiny comment -- it will probably be wise to use BitSet ra
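
Sean's suggestion is cut off after "use BitSet ra", so what BitSet was meant to replace is not recoverable from the snippet. As a hedged illustration only, java.util.BitSet can act as a compact set of indices, for example to mark which positions of a vector are non-zero; a boolean[] typically spends a byte per element where BitSet uses a bit. NonZeroMask and its methods are hypothetical.

```java
import java.util.BitSet;

/** Hypothetical helpers using java.util.BitSet as a compact index set. */
final class NonZeroMask {
    private NonZeroMask() {}

    /** Sets bit i for every position i whose value is non-zero. */
    static BitSet nonZeroMask(double[] values) {
        BitSet mask = new BitSet(values.length);
        for (int i = 0; i < values.length; i++) {
            if (values[i] != 0.0) {
                mask.set(i);
            }
        }
        return mask;
    }

    /** Positions that are non-zero in both masks; the inputs are left unchanged. */
    static BitSet sharedNonZeros(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b); // bitwise intersection, a word at a time
        return both;
    }
}
```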

2010-02-22 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836647#action_12836647 ] Robin Anil commented on MAHOUT-300: {code} public double dot(Vector x) { if (size(
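
The dot(Vector x) code in this comment is truncated right after the size check. A sketch of the general idea, assuming random-access sparse vectors modeled as hypothetical index-to-value maps (this is not Mahout's Vector API): check that cardinalities match, then iterate over the operand with fewer stored entries and look each index up in the other, so the work scales with the sparser vector rather than with the cardinality.

```java
import java.util.Map;

/** Hypothetical dot product over two random-access sparse vectors (index -> value maps). */
final class SmallerSideDot {
    private SmallerSideDot() {}

    /**
     * sizeA and sizeB are the cardinalities (logical lengths), not the number of stored entries.
     * Expected cost is O(min(nnz(a), nnz(b))) hash lookups.
     */
    static double dot(Map<Integer, Double> a, int sizeA, Map<Integer, Double> b, int sizeB) {
        if (sizeA != sizeB) {
            throw new IllegalArgumentException("cardinality mismatch: " + sizeA + " vs " + sizeB);
        }
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = small == a ? b : a;
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) { // a missing key is an implicit zero
                sum += e.getValue() * other;
            }
        }
        return sum;
    }
}
```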

2010-02-22 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836649#action_12836649 ] Robin Anil commented on MAHOUT-300: On dense data 1000, 1000 {noformat} BenchMarks

2010-02-22 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836679#action_12836679 ] Robin Anil commented on MAHOUT-300: I found the anomaly Jake was talking about. It was d

2010-02-22 Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836706#action_12836706 ] Jake Mannix commented on MAHOUT-300: The sparse data is odd... (-vs 50 -sp 5000) (r

2010-02-22 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836713#action_12836713 ] Robin Anil commented on MAHOUT-300: Can I commit the latest? If you don't have any change

2010-02-22 Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836815#action_12836815 ] Jake Mannix commented on MAHOUT-300: With these opts: -vs 50 -sp 500 -nv 50 -l 500

2010-02-22 Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836817#action_12836817 ] Ted Dunning commented on MAHOUT-300: These are getting respectable! As a quick hack,

2010-02-22 Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836818#action_12836818 ] Jake Mannix commented on MAHOUT-300: Agreed, Ted. I'm liking that we're getting 60-7

2010-02-22 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836819#action_12836819 ] Robin Anil commented on MAHOUT-300: Seq.rand and rand.seq should get the same perf level
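
Reading the benchmark labels as operand pairs (seq = sequential-access sparse, rand = random-access sparse; this interpretation is an assumption), one way to make seq.rand and rand.seq perform alike is to route both argument orders through the same loop: walk the sequential operand's sorted entries and probe the random-access one. The array and map layouts and the MixedDot name are hypothetical.

```java
import java.util.Map;

/** Hypothetical mixed-representation dot products. */
final class MixedDot {
    private MixedDot() {}

    /** seq . rand: iterate the sequential operand's stored entries, look each index up in the map. */
    static double seqDotRand(int[] indices, double[] values, int numNonZero, Map<Integer, Double> rand) {
        double sum = 0.0;
        for (int k = 0; k < numNonZero; k++) {
            Double other = rand.get(indices[k]);
            if (other != null) { // absent key means an implicit zero
                sum += values[k] * other;
            }
        }
        return sum;
    }

    /** rand . seq: the dot product is symmetric, so reuse the same loop rather than iterating the map. */
    static double randDotSeq(Map<Integer, Double> rand, int[] indices, double[] values, int numNonZero) {
        return seqDotRand(indices, values, numNonZero, rand);
    }
}
```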

2010-02-22 Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836826#action_12836826 ] Jake Mannix commented on MAHOUT-300: And now that my run (of three comments ago) is fin

2010-02-22 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836839#action_12836839 ] Robin Anil commented on MAHOUT-300: {noformat} seq.seq = 46,855 rand.seq = 37,397 s

2010-02-22 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836848#action_12836848 ] Robin Anil commented on MAHOUT-300: {noformat} rand.rand = 14,435 dense.rand = 9,172 ra
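
Reading dense.rand as a dense vector dotted with a random-access sparse one (an assumption about what the label means), the dense side needs no iteration of its own: visit only the sparse operand's stored entries and index straight into the dense array. A minimal sketch with a hypothetical DenseSparseDot class, using a map as the random-access sparse vector.

```java
import java.util.Map;

/** Hypothetical dense . sparse dot product. */
final class DenseSparseDot {
    private DenseSparseDot() {}

    /** Cost is O(nnz(sparse)); each sparse key must be a valid index into dense. */
    static double dot(double[] dense, Map<Integer, Double> sparse) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : sparse.entrySet()) {
            sum += dense[e.getKey()] * e.getValue(); // direct array access, no search
        }
        return sum;
    }
}
```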

2010-02-22 Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836909#action_12836909 ] Jake Mannix commented on MAHOUT-300: New benchmark additions: {code}INFO: BenchMarks