[GitHub] spark pull request: [SPARK-7055][SQL]Use correct ClassLoader for J...

2015-04-22 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/5633#issuecomment-95428960 Do the MySQL and Postgres integration tests both pass with this change? It's not a security issue per se. The trouble is that JDBC's security st

[GitHub] spark pull request: [ML] SPARK-2426: Integrate Breeze NNLS with ML...

2015-04-08 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/5005#issuecomment-90963126 OK. I haven't made a serious attempt to write a solver for general L1-constrained least squares problems. I don't see anything wrong with impl

[GitHub] spark pull request: [ML] SPARK-2426: Integrate Breeze NNLS with ML...

2015-04-08 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/5005#issuecomment-90947409 Not at home right now, so I don't have everything in front of me. If you have a "projection onto tangent cone" operator and you keep explicit track of

[GitHub] spark pull request: [ML] SPARK-2426: Integrate Breeze NNLS with ML...

2015-03-19 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/5005#issuecomment-83743835 @dlwh: Intermediate states do not matter in ANNLS. In ANNLS, we allow ourselves to do a crappy job solving the least squares problems at each iteration because the

[GitHub] spark pull request: [ML] SPARK-2426: Integrate Breeze NNLS with ML...

2015-03-18 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/5005#issuecomment-83239328 The console output on the link provided says, among other things, this: Could not find Apache license headers in the following files: !? /home

[GitHub] spark pull request: Avoid deprecation warnings in JDBCSuite.

2015-02-17 Thread tmyklebu
GitHub user tmyklebu opened a pull request: https://github.com/apache/spark/pull/4668 Avoid deprecation warnings in JDBCSuite. This pull request replaces calls to deprecated methods from `java.util.Date` with near-equivalents in `java.util.Calendar`. You can merge this pull
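To illustrate the kind of replacement this pull request describes, here is a minimal sketch that builds and inspects a date through `java.util.Calendar` rather than the deprecated `java.util.Date` constructor and getters. The class and method names (`CalendarDates`, `dateOf`, `yearOf`, `monthOf`) are hypothetical and are not taken from `JDBCSuite`; the actual change in the pull request may look different.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

// Hypothetical helper, not code from the pull request.
final class CalendarDates {
    // Replaces the deprecated `new Date(year - 1900, month - 1, day)`.
    // Calendar months are 0-based, so a 1-based month is shifted down by one.
    static Date dateOf(int year, int month, int day) {
        Calendar cal = new GregorianCalendar(year, month - 1, day);
        return cal.getTime();
    }

    // Replaces the deprecated `date.getYear() + 1900`.
    static int yearOf(Date date) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(date);
        return cal.get(Calendar.YEAR);
    }

    // Replaces the deprecated `date.getMonth() + 1`.
    static int monthOf(Date date) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(date);
        return cal.get(Calendar.MONTH) + 1;
    }
}
```

The results are only "near-equivalents" in the sense the description uses: both approaches interpret the fields in the JVM's default time zone, but `Calendar` uses real year numbers while still using 0-based months, so the offsets have to be adjusted by hand.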

[GitHub] spark pull request: [SPARK-5472][SQL] A JDBC data source for Spark...

2015-02-01 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/4261#discussion_r23904561 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala --- @@ -0,0 +1,417 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-5472][SQL] A JDBC data source for Spark...

2015-02-01 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/4261#discussion_r23904154 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala --- @@ -0,0 +1,417 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-5472][SQL] A JDBC data source for Spark...

2015-02-01 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/4261#discussion_r23901190 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala --- @@ -0,0 +1,417 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-5472][SQL] A JDBC data source for Spark...

2015-01-29 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/4261#issuecomment-72037926 I don't think these test failures are my fault, unless I need to handle SparkContext lifetimes differently. One thing that I see in the test failure log is

[GitHub] spark pull request: [SPARK-5472][SQL] A JDBC data source for Spark...

2015-01-29 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/4261#discussion_r23772508 --- Diff: project/SparkBuild.scala --- @@ -397,6 +397,11 @@ object TestSettings { testOptions += Tests.Argument(TestFrameworks.JUnit, "-v"

[GitHub] spark pull request: A JDBC driver for Spark SQL.

2015-01-28 Thread tmyklebu
GitHub user tmyklebu opened a pull request: https://github.com/apache/spark/pull/4261 A JDBC driver for Spark SQL. This pull request contains a Spark SQL data source that can pull data from, and can put data into, a JDBC database. I have tested both read and write support

[GitHub] spark pull request: [SPARK-1580] Estimate ALS communication and co...

2014-05-01 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/493#issuecomment-41982396 Point being that Jenkins tests have failed repeatedly, apparently for reasons that have nothing to do with this change. --- If your project is set up for it, you can

[GitHub] spark pull request: [SPARK-1580] Estimate ALS communication and co...

2014-05-01 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/493#issuecomment-41973860 StreamingContextSuite's "stop gracefully" test failed here.

[GitHub] spark pull request: [SPARK-1672][WIP] Separate partitioning in ALS

2014-05-01 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/593#discussion_r12214297 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -381,14 +399,15 @@ class ALS private ( * the users (or (blockId

[GitHub] spark pull request: [SPARK-1580] Estimate ALS communication and co...

2014-05-01 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/493#discussion_r12213255 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -708,6 +709,113 @@ object ALS { trainImplicit(ratings, rank

[GitHub] spark pull request: [SPARK-1580] Estimate ALS communication and co...

2014-05-01 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/493#discussion_r12213102 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -708,6 +709,113 @@ object ALS { trainImplicit(ratings, rank

[GitHub] spark pull request: [SPARK-1580] Estimate ALS communication and co...

2014-05-01 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/493#issuecomment-41946542 One of these is sitting in the console output: 10:35:57.717 WARN org.eclipse.jetty.util.component.AbstractLifeCycle: FAILED SelectChannelConnector@0.0.0.0:4040

[GitHub] spark pull request: [SPARK-1672][WIP] Separate partitioning in ALS

2014-05-01 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/593#discussion_r12195324 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -381,14 +399,15 @@ class ALS private ( * the users (or (blockId

[GitHub] spark pull request: [SPARK-1672][WIP] Separate partitioning in ALS

2014-05-01 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/593#discussion_r12195265 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -163,14 +165,27 @@ class ALS private ( def run(ratings: RDD

[GitHub] spark pull request: [SPARK-1672][WIP] Separate partitioning in ALS

2014-05-01 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/593#discussion_r12195151 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -163,14 +165,27 @@ class ALS private ( def run(ratings: RDD

[GitHub] spark pull request: [SPARK-1580] Estimate ALS communication and co...

2014-04-30 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/493#issuecomment-41883891 Something wrong with Jenkins? Looks like it hit some sort of OOM condition?

[GitHub] spark pull request: [SPARK-1580] Estimate ALS communication and co...

2014-04-30 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/493#issuecomment-41877934 Note that the new code doesn't actually work if you supply user and product partitioners that have different numbers of partitions. However, it can be straightforw

[GitHub] spark pull request: [SPARK-1672][WIP] Separate partitioning in ALS

2014-04-29 Thread tmyklebu
GitHub user tmyklebu opened a pull request: https://github.com/apache/spark/pull/593 [SPARK-1672][WIP] Separate partitioning in ALS This is a work-in-progress. At present, the numbers of user and product blocks are not exposed to the user and no tests are present. As such, this

[GitHub] spark pull request: [SPARK-1580] Estimate ALS communication and co...

2014-04-28 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/493#discussion_r12081159 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -708,6 +709,123 @@ object ALS { trainImplicit(ratings, rank

[GitHub] spark pull request: Micro-optimisation of ALS

2014-04-26 Thread tmyklebu
GitHub user tmyklebu opened a pull request: https://github.com/apache/spark/pull/568 Micro-optimisation of ALS This change replaces some Scala `for` and `foreach` constructs with `while` constructs. There may be a slight performance gain on the order of 1-2% when training an ALS

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

2014-04-25 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/460#issuecomment-41453017 Same output.

[GitHub] spark pull request: [SPARK-1580] Estimate ALS communication and co...

2014-04-25 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/493#discussion_r12019270 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -17,7 +17,7 @@ package org.apache.spark.mllib.recommendation

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

2014-04-25 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/460#issuecomment-41439353 "Build was aborted."

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

2014-04-25 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/460#issuecomment-41422324 Manually aborted, it seems?

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

2014-04-25 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/460#discussion_r12005233 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala --- @@ -0,0 +1,183 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

2014-04-25 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/460#discussion_r12005168 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala --- @@ -0,0 +1,183 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

2014-04-25 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/460#discussion_r12005355 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -537,6 +566,34 @@ object ALS { * in the form of (userID

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

2014-04-25 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/460#discussion_r12005322 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala --- @@ -0,0 +1,183 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1580] Estimate ALS communication and co...

2014-04-23 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/493#discussion_r11919464 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -708,6 +708,59 @@ object ALS { trainImplicit(ratings, rank

[GitHub] spark pull request: [SPARK-1580] Estimate ALS communication and co...

2014-04-22 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/493#issuecomment-41118446 I also want to start a conversation here on how we want to test this. Certainly I should add test cases here, but I'm unsure of what exactly to test here.

[GitHub] spark pull request: [SPARK-1580] Estimate ALS communication and co...

2014-04-22 Thread tmyklebu
GitHub user tmyklebu opened a pull request: https://github.com/apache/spark/pull/493 [SPARK-1580] Estimate ALS communication and computation costs. This pull request adds a function `evaluatePartitioner` to the `ALS` object. This function takes a dataset, a rank, and user and

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

2014-04-21 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/460#discussion_r11813502 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala --- @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1553] Alternating nonnegative least-squ...

2014-04-21 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/460#discussion_r11813421 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala --- @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Alternating nonnegative least-squares

2014-04-20 Thread tmyklebu
GitHub user tmyklebu opened a pull request: https://github.com/apache/spark/pull/460 Alternating nonnegative least-squares This pull request includes a nonnegative least-squares solver (NNLS) tailored to the kinds of small-scale problems that come up when training matrix

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-20 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/407#issuecomment-40910803 @markhamstra The NNLS-related ones. I think I made them disappear, though, by git push --force'ing an old copy of my master branch (that happened to contain exactl

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-20 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/407#issuecomment-40909034 Ugh. Why are those commits part of this PR now? How do I undo this?

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-19 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/407#issuecomment-40883661 Handled most of your comments. I didn't change the assert in the "negative ids" test and I left the partitioner null handling as is. I don't see a

[GitHub] spark pull request: [SPARK-1535] ALS: Avoid the garbage-creating c...

2014-04-19 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/442#issuecomment-40879543 @srowen: Does Hotspot actually generate code for the allocation and the dead store with the bad ctor? I haven't picked through it yet.

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-19 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/407#discussion_r11795406 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala --- @@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-19 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/407#discussion_r11795347 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -114,6 +116,14 @@ class ALS private ( this

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-19 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/407#discussion_r11795344 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala --- @@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-19 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/407#discussion_r11795330 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -114,6 +116,14 @@ class ALS private ( this

[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

2014-04-18 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/442#issuecomment-40826930 [SPARK-1535] describes the issue and the form of the fix.

[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

2014-04-17 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/442#issuecomment-40787028 This appears to be a PySpark error unrelated to my change.

[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

2014-04-17 Thread tmyklebu
GitHub user tmyklebu opened a pull request: https://github.com/apache/spark/pull/442 ALS: Avoid the garbage-creating ctor of DoubleMatrix `new DoubleMatrix(double[])` creates a garbage `double[]` of the same length as its argument and immediately throws it away. This pull request

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-15 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/407#discussion_r11653978 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -167,11 +169,24 @@ class ALS private ( this.numBlocks

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-15 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/407#discussion_r11631405 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -96,6 +97,7 @@ class ALS private ( private var lambda: Double

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-15 Thread tmyklebu
Github user tmyklebu commented on a diff in the pull request: https://github.com/apache/spark/pull/407#discussion_r11631197 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -167,11 +169,24 @@ class ALS private ( this.numBlocks

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-14 Thread tmyklebu
Github user tmyklebu commented on the pull request: https://github.com/apache/spark/pull/407#issuecomment-40425865 Build failure. Looks like a config issue in Jenkins?

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-14 Thread tmyklebu
GitHub user tmyklebu opened a pull request: https://github.com/apache/spark/pull/407 [SPARK-1281] Improve partitioning in ALS ALS was using HashPartitioner and explicit uses of `%` together. Further, the naked use of `%` meant that, if the number of partitions corresponded with