Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/5633#issuecomment-95428960
Do the MySQL and Postgres integration tests both pass with this change?
It's not a security issue per se. The trouble is that JDBC's security
st
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/5005#issuecomment-90963126
OK. I haven't made a serious attempt to write a solver for general
L1-constrained least squares problems. I don't see anything wrong with
impl
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/5005#issuecomment-90947409
Not at home right now, so I don't have everything in front of me. If you
have a "projection onto tangent cone" operator and you keep explicit track of
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/5005#issuecomment-83743835
@dlwh: Intermediate states do not matter in ANNLS. In ANNLS, we allow
ourselves to do a crappy job solving the least squares problems at each
iteration because the
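For context, the standard form of that argument (restated here, not quoted from the thread): alternating NNLS minimizes

    \min_{U \ge 0,\ V \ge 0} \; \lVert R - U V^\top \rVert_F^2

by holding one factor fixed and solving a nonnegative least-squares subproblem for each row of the other, e.g. \min_{u_i \ge 0} \lVert r_i - V u_i \rVert_2^2 where r_i is the i-th row of R. Because every subproblem is revisited on the next sweep, each solve only needs to be approximate; as long as an inexact solve does not increase its own subproblem's cost, the outer objective still decreases.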
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/5005#issuecomment-83239328
The console output on the link provided says, among other things, this:
Could not find Apache license headers in the following files:
!?
/home
GitHub user tmyklebu opened a pull request:
https://github.com/apache/spark/pull/4668
Avoid deprecation warnings in JDBCSuite.
This pull request replaces calls to deprecated methods from
`java.util.Date` with near-equivalents in `java.util.Calendar`.
You can merge this pull
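The flavour of the replacement (a minimal sketch of the pattern, not the actual diff): the deprecated `java.util.Date` constructor takes a year offset from 1900 and a zero-based month, and the `Calendar`-based near-equivalent looks like this in Scala.

    import java.util.{Calendar, Date}

    // Deprecated constructor: year is an offset from 1900 and month is zero-based.
    val legacy: Date = new Date(96, 0, 1)            // 1996-01-01

    // Near-equivalent via Calendar (month is still zero-based, hence the constant).
    val cal = Calendar.getInstance()
    cal.clear()
    cal.set(1996, Calendar.JANUARY, 1)
    val replacement: Date = cal.getTime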
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/4261#discussion_r23904561
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
---
@@ -0,0 +1,417 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/4261#discussion_r23904154
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
---
@@ -0,0 +1,417 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/4261#discussion_r23901190
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
---
@@ -0,0 +1,417 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/4261#issuecomment-72037926
I don't think these test failures are my fault, unless I need to handle
SparkContext lifetimes differently. One thing that I see in the test failure
log is
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/4261#discussion_r23772508
--- Diff: project/SparkBuild.scala ---
@@ -397,6 +397,11 @@ object TestSettings {
testOptions += Tests.Argument(TestFrameworks.JUnit, "-v"
GitHub user tmyklebu opened a pull request:
https://github.com/apache/spark/pull/4261
A JDBC driver for Spark SQL.
This pull request contains a Spark SQL data source that can pull data from,
and can put data into, a JDBC database.
I have tested both read and write support
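For a sense of how such a data source is exercised from the SQL side, here is a minimal read/write sketch using the present-day DataFrameReader/DataFrameWriter JDBC interface; the API surface introduced in this PR may have looked different, and the URL, table, and credentials below are placeholders.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("jdbc-sketch").getOrCreate()

    // Read a table from a JDBC source into a DataFrame.
    val people = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://localhost/testdb")   // placeholder URL
      .option("dbtable", "people")
      .option("user", "test")
      .option("password", "test")
      .load()

    // Write the result back out to another table with the same connection settings.
    people.write
      .format("jdbc")
      .option("url", "jdbc:postgresql://localhost/testdb")
      .option("dbtable", "people_copy")
      .option("user", "test")
      .option("password", "test")
      .save()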
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/493#issuecomment-41982396
Point being that Jenkins tests have failed repeatedly, apparently for
reasons that have nothing to do with this change.
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/493#issuecomment-41973860
StreamingContextSuite's "stop gracefully" test failed here.
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/593#discussion_r12214297
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -381,14 +399,15 @@ class ALS private (
* the users (or (blockId
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/493#discussion_r12213255
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -708,6 +709,113 @@ object ALS {
trainImplicit(ratings, rank
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/493#discussion_r12213102
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -708,6 +709,113 @@ object ALS {
trainImplicit(ratings, rank
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/493#issuecomment-41946542
One of these is sitting in the console output:
10:35:57.717 WARN org.eclipse.jetty.util.component.AbstractLifeCycle:
FAILED SelectChannelConnector@0.0.0.0:4040
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/593#discussion_r12195324
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -381,14 +399,15 @@ class ALS private (
* the users (or (blockId
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/593#discussion_r12195265
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -163,14 +165,27 @@ class ALS private (
def run(ratings: RDD
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/593#discussion_r12195151
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -163,14 +165,27 @@ class ALS private (
def run(ratings: RDD
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/493#issuecomment-41883891
Something wrong with Jenkins? Looks like it hit some sort of OOM condition?
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/493#issuecomment-41877934
Note that the new code doesn't actually work if you supply user and product
partitioners that have different numbers of partitions. However, it can be
straightforw
GitHub user tmyklebu opened a pull request:
https://github.com/apache/spark/pull/593
[SPARK-1672][WIP] Separate partitioning in ALS
This is a work-in-progress. At present, the numbers of user and product
blocks are not exposed to the user and no tests are present. As such, this
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/493#discussion_r12081159
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -708,6 +709,123 @@ object ALS {
trainImplicit(ratings, rank
GitHub user tmyklebu opened a pull request:
https://github.com/apache/spark/pull/568
Micro-optimisation of ALS
This change replaces some Scala `for` and `foreach` constructs with `while`
constructs. There may be a slight performance gain on the order of 1-2% when
training an ALS
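The flavour of the rewrite (illustrative only, not the actual diff): a `Range`-based `for` goes through `Range.foreach` and a closure per element, while a hand-rolled `while` loop does not.

    val xs = Array.fill(1000)(math.random)

    // Before: idiomatic for comprehension over a Range, via Range.foreach and a closure.
    var sumFor = 0.0
    for (i <- 0 until xs.length) {
      sumFor += xs(i)
    }

    // After: plain while loop, which typically JITs down to a tight counting loop.
    var sumWhile = 0.0
    var i = 0
    while (i < xs.length) {
      sumWhile += xs(i)
      i += 1
    }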
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/460#issuecomment-41453017
Same output.
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/493#discussion_r12019270
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -17,7 +17,7 @@
package org.apache.spark.mllib.recommendation
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/460#issuecomment-41439353
"Build was aborted."
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/460#issuecomment-41422324
Manually aborted, it seems?
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/460#discussion_r12005233
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/460#discussion_r12005168
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/460#discussion_r12005355
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -537,6 +566,34 @@ object ALS {
* in the form of (userID
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/460#discussion_r12005322
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/493#discussion_r11919464
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -708,6 +708,59 @@ object ALS {
trainImplicit(ratings, rank
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/493#issuecomment-41118446
I also want to start a conversation here on how we want to test this.
Certainly I should add test cases, but I'm unsure of what exactly to test.
GitHub user tmyklebu opened a pull request:
https://github.com/apache/spark/pull/493
[SPARK-1580] Estimate ALS communication and computation costs.
This pull request adds a function `evaluatePartitioner` to the `ALS`
object. This function takes a dataset, a rank, and user and
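The general idea, as a rough sketch only (the signature and accounting in the actual `evaluatePartitioner` may differ, and `estimateShippedDoubles` below is a hypothetical helper): for a candidate partitioning, count how many block-to-block links the ratings induce, since each link forces a factor vector of `rank` doubles to be shipped.

    import org.apache.spark.mllib.recommendation.Rating
    import org.apache.spark.rdd.RDD

    // Rough communication estimate for the user-update half of an ALS iteration,
    // assuming simple modulo assignment of users to blocks: each product factor
    // (rank doubles) must be shipped once to every user block containing a rater
    // of that product. Illustration only, not the PR's code.
    def estimateShippedDoubles(ratings: RDD[Rating], rank: Int, userBlocks: Int): Long = {
      def mod(x: Int, n: Int): Int = ((x % n) + n) % n
      val productToUserBlockLinks = ratings
        .map(r => (r.product, mod(r.user, userBlocks)))
        .distinct()
        .count()
      productToUserBlockLinks * rank
    }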
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/460#discussion_r11813502
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/460#discussion_r11813421
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLSbyPCG.scala ---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software
GitHub user tmyklebu opened a pull request:
https://github.com/apache/spark/pull/460
Alternating nonnegative least-squares
This pull request includes a nonnegative least-squares solver (NNLS)
tailored to the kinds of small-scale problems that come up when training matrix
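For orientation, the problem being solved is min ||Ax - b||^2 subject to x >= 0, handed to the solver as the Gram matrix A^T A and the vector A^T b. The sketch below is a plain projected-gradient loop that only illustrates the problem shape; the solver in this PR (NNLSbyPCG.scala) uses a projected conjugate-gradient method, and the fixed step size here assumes it is chosen small relative to the largest eigenvalue of A^T A.

    // Projected-gradient sketch for min ||Ax - b||^2 subject to x >= 0,
    // given ata = A^T A and atb = A^T b.
    def nnlsProjectedGradient(ata: Array[Array[Double]], atb: Array[Double],
                              iters: Int = 500, step: Double = 1e-3): Array[Double] = {
      val n = atb.length
      val x = Array.fill(n)(0.0)
      for (_ <- 0 until iters) {
        // Gradient of 0.5 * x^T (A^T A) x - (A^T b)^T x is (A^T A) x - A^T b.
        val grad = Array.tabulate(n) { i =>
          var g = -atb(i)
          var j = 0
          while (j < n) { g += ata(i)(j) * x(j); j += 1 }
          g
        }
        // Take a small gradient step and project back onto the nonnegative orthant.
        var i = 0
        while (i < n) {
          x(i) = math.max(0.0, x(i) - step * grad(i))
          i += 1
        }
      }
      x
    }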
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40910803
@markhamstra The NNLS-related ones. I think I made them disappear, though,
by git push --force'ing an old copy of my master branch (that happened to
contain exactl
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40909034
Ugh. Why are those commits part of this PR now? How do I undo this?
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40883661
Handled most of your comments. I didn't change the assert in the "negative
ids" test and I left the partitioner null handling as is. I don't see a
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/442#issuecomment-40879543
@srowen: Does Hotspot actually generate code for the allocation and the
dead store with the bad ctor? I haven't picked through it yet.
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11795406
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
@@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11795347
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -114,6 +116,14 @@ class ALS private (
this
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11795344
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
@@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11795330
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -114,6 +116,14 @@ class ALS private (
this
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/442#issuecomment-40826930
[SPARK-1535] describes the issue and the form of the fix.
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/442#issuecomment-40787028
This appears to be a PySpark error unrelated to my change.
GitHub user tmyklebu opened a pull request:
https://github.com/apache/spark/pull/442
ALS: Avoid the garbage-creating ctor of DoubleMatrix
`new DoubleMatrix(double[])` creates a garbage `double[]` of the same
length as its argument and immediately throws it away. This pull request
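A sketch of the pattern being avoided, assuming jblas's `(rows, columns, data...)` constructor adopts the passed array rather than copying it (worth checking against the jblas source for your version; this is not necessarily the exact fix in the PR):

    import org.jblas.DoubleMatrix

    val xs: Array[Double] = Array(1.0, 2.0, 3.0)

    // Wasteful: DoubleMatrix(double[]) first allocates a fresh double[xs.length]
    // through its size-only constructor, then discards it in favour of xs.
    val wasteful = new DoubleMatrix(xs)

    // Cheaper: the (rows, columns, data...) constructor can adopt the backing
    // array of a column vector directly, with no throwaway allocation.
    val frugal = new DoubleMatrix(xs.length, 1, xs: _*)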
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11653978
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -167,11 +169,24 @@ class ALS private (
this.numBlocks
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11631405
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -96,6 +97,7 @@ class ALS private (
private var lambda: Double
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11631197
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -167,11 +169,24 @@ class ALS private (
this.numBlocks
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40425865
Build failure. Looks like a config issue in Jenkins?
GitHub user tmyklebu opened a pull request:
https://github.com/apache/spark/pull/407
[SPARK-1281] Improve partitioning in ALS
ALS was using HashPartitioner and explicit uses of `%` together. Further,
the naked use of `%` meant that, if the number of partitions corresponded with
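The pitfall with a naked `%` is that Scala/Java `%` keeps the sign of its left operand, so a negative id (or a negative hashCode) maps to a negative "partition". A sketch of the usual remedy (illustrative; the PR and Spark's own utilities may do it differently):

    // Shift the remainder back into [0, n) when the left operand is negative.
    def nonNegativeMod(x: Int, n: Int): Int = {
      val raw = x % n
      if (raw < 0) raw + n else raw
    }

    nonNegativeMod(7, 4)    // 3
    nonNegativeMod(-7, 4)   // 1, whereas -7 % 4 == -3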