[GitHub] incubator-spark pull request: [Proposal] Adding sparse data suppor...

2014-02-24 Thread dlwh
Github user dlwh commented on the pull request: https://github.com/apache/incubator-spark/pull/575#issuecomment-35906687 @fommil fine by me. I'll get on it. On Feb 24, 2014 4:07 AM, "Sam Halliday" wrote: > Hi all, > > The discussions with

[GitHub] incubator-spark pull request: [Proposal] Adding sparse data suppor...

2014-02-23 Thread dlwh
Github user dlwh commented on the pull request: https://github.com/apache/incubator-spark/pull/575#issuecomment-35842233 @srowen @fommil Breeze is flexible enough that we can swap out different back ends quickly (and let users decide at runtime). So if need be, I can do the work to

[GitHub] incubator-spark pull request: [Proposal] Adding sparse data suppor...

2014-02-18 Thread dlwh
Github user dlwh commented on the pull request: https://github.com/apache/incubator-spark/pull/575#issuecomment-35450646 @mengxr thanks for doing all this! It's nice to see that the overhead in Breeze is largely negligible as compared to MTJ (and maybe even slightly b

[GitHub] incubator-spark pull request: [Proposal] Adding sparse data suppor...

2014-02-16 Thread dlwh
Github user dlwh commented on the pull request: https://github.com/apache/incubator-spark/pull/575#issuecomment-35220185 @martinjaggi I've often found that minibatching makes things converge much more quickly, since you get a nice variance reduction in the estimate of

[GitHub] incubator-spark pull request: [Proposal] Adding sparse data suppor...

2014-02-16 Thread dlwh
Github user dlwh commented on the pull request: https://github.com/apache/incubator-spark/pull/575#issuecomment-35218872 @martinjaggi For how it's usually implemented, that's right. But you can quite likely get better performance doing minibatches with dense vector/CSC

[GitHub] incubator-spark pull request: [Proposal] Adding sparse data suppor...

2014-02-14 Thread dlwh
Github user dlwh commented on the pull request: https://github.com/apache/incubator-spark/pull/575#issuecomment-35105330 Just to follow up on Breeze performance: in the latest snapshot, we are consistently faster than JBlas and Mahout in @mengxr's benchmarks. O

[GitHub] incubator-spark pull request: [Proposal] Adding sparse data suppor...

2014-02-13 Thread dlwh
Github user dlwh commented on the pull request: https://github.com/apache/incubator-spark/pull/575#issuecomment-35021161 @fommil :-) Sorry to undersell. Breeze also has CSCMatrix support, but that's not entirely finished. On Thu, Feb 13, 2014 at

[GitHub] incubator-spark pull request: [Proposal] Adding sparse data suppor...

2014-02-13 Thread dlwh
Github user dlwh commented on the pull request: https://github.com/apache/incubator-spark/pull/575#issuecomment-35020275 I can cut a release this weekend. We wrap @fommil's netlib-java (https://github.com/fommil/netlib-java), whose performance tracks with C pretty well.