[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...

2018-09-03 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/22320 (This is a test comment to test a GitHub Integration; please ignore) --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #21559: [SPARK-24525][SS] Provide an option to limit number of r...

2018-06-13 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21559 jenkins add to whitelist

[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...

2018-06-10 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21481 Let's merge this as-is and do the build improvements in a separate PR. That's important because we may want to backport the overflow fix to maintenance branches and may want to do so i

[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...

2018-06-02 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21346#discussion_r192566116 --- Diff: common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java --- @@ -141,26 +141,14 @@ public void fetchChunk

[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...

2018-06-02 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21346#discussion_r192565980 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/RpcHandler.java --- @@ -38,15 +38,24 @@ * * This method

[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...

2018-06-02 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21346#discussion_r192565530 --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/UploadStream.java --- @@ -0,0 +1,107 @@ +/* + * Licensed to the

[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...

2018-06-02 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21481 Hey @kiszk, thanks for tracking this down. This change looks good to me. I have a couple of questions, mostly aimed towards figuring out how we can categorically solve this problem

[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...

2018-06-02 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21390 Feel free to do the TTL in a followup. My feeling is that it won't be super useful in practice, though: 1. Cleanup of non-shuffle disk block manager files following executor exit

[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

2018-05-30 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21346 Summary of key changes (WIP; notes to self): > Summary of changes: > > - Introduce a new `UploadStream` RPC which is sent to push a large payload as a stream (in

[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...

2018-05-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21346#discussion_r191964329 --- Diff: project/MimaExcludes.scala --- @@ -36,6 +36,9 @@ object MimaExcludes { // Exclude rules for 2.4.x lazy val v24excludes

[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...

2018-05-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21346#discussion_r191941962 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/StreamData.java --- @@ -0,0 +1,96 @@ +/* + * Licensed to the

[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...

2018-05-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21346#discussion_r191941503 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/RpcHandler.java --- @@ -38,15 +38,24 @@ * * This method

[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...

2018-05-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21346#discussion_r191940304 --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/UploadStream.java --- @@ -0,0 +1,107 @@ +/* + * Licensed to the

[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...

2018-05-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21346#discussion_r191939431 --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/UploadStream.java --- @@ -0,0 +1,107 @@ +/* + * Licensed to the

[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...

2018-05-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21346#discussion_r191938203 --- Diff: common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java --- @@ -141,26 +141,14 @@ public void fetchChunk

[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...

2018-05-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21346#discussion_r191935821 --- Diff: common/network-common/src/main/java/org/apache/spark/network/client/StreamInterceptor.java --- @@ -50,16 +52,22 @@ @Override

[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

2018-05-30 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21346 I'm going to be starting a more detailed review pass on this now and will be getting caught back up with the discussion that's happened so far. One high-level point I'd

[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191881226 --- Diff: python/pyspark/taskcontext.py --- @@ -88,3 +89,9 @@ def taskAttemptId(self): TaskAttemptID. """

[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191877900 --- Diff: python/pyspark/taskcontext.py --- @@ -88,3 +89,9 @@ def taskAttemptId(self): TaskAttemptID. """

[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...

2018-05-25 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21390 Yeah, this is only concerned with non-shuffle files which are located in the block manager temp directories (e.g. large sorter spill files). There is a related issue where shuffle files

[GitHub] spark pull request #21390: [SPARK-24340][Core] Clean up non-shuffle disk blo...

2018-05-24 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21390#discussion_r190705466 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -97,6 +99,10 @@ private[deploy] class Worker( private val

[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-23 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190438434 --- Diff: python/pyspark/shuffle.py --- @@ -67,6 +67,19 @@ def get_used_memory(): return 0 +def safe_iter(f

[GitHub] spark issue #21342: [SPARK-24294] Throw SparkException when OOM in Broadcast...

2018-05-23 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21342 Updated changes LGTM. Thanks for working on this!

[GitHub] spark pull request #21390: [SPARK-24340][Core] Clean up non-shuffle disk blo...

2018-05-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21390#discussion_r190105740 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -97,6 +97,10 @@ private[deploy] class Worker( private val

[GitHub] spark pull request #21390: [SPARK-24340][Core] Clean up non-shuffle disk blo...

2018-05-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21390#discussion_r190104118 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -732,6 +736,9 @@ private[deploy] class Worker

[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r189967162 --- Diff: python/pyspark/shuffle.py --- @@ -67,6 +67,19 @@ def get_used_memory(): return 0 +def safe_iter(f

[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

2018-05-22 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21311 @cxzl25, to clarify: > Some data is lost and the data read out is dirty To clarify, is this a potential cause of a wrong-answer correctness bug? If so, we should be sure

[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-22 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21383 jenkins this is ok to test

[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r189818845 --- Diff: python/pyspark/shuffle.py --- @@ -67,6 +67,19 @@ def get_used_memory(): return 0 +def safe_iter(f

[GitHub] spark pull request #21390: [SPARK-24340][Core] Clean up non-shuffle disk blo...

2018-05-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21390#discussion_r189816085 --- Diff: common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/NonShuffleFilesCleanupSuite.java --- @@ -0,0 +1,222

[GitHub] spark pull request #21390: [SPARK-24340][Core] Clean up non-shuffle disk blo...

2018-05-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21390#discussion_r189813626 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -97,6 +97,10 @@ private[deploy] class Worker( private val

[GitHub] spark pull request #21390: [SPARK-24340][Core] Clean up non-shuffle disk blo...

2018-05-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21390#discussion_r189813156 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -732,6 +736,9 @@ private[deploy] class Worker

[GitHub] spark pull request #21390: [SPARK-24340][Core] Clean up non-shuffle disk blo...

2018-05-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21390#discussion_r189809797 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -97,6 +97,10 @@ private[deploy] class Worker( private val

[GitHub] spark pull request #21390: [SPARK-24340][Core] Clean up non-shuffle disk blo...

2018-05-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21390#discussion_r189809055 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java --- @@ -226,6 +246,29 @@ private void

[GitHub] spark pull request #21390: [SPARK-24340][Core] Clean up non-shuffle disk blo...

2018-05-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21390#discussion_r189808705 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java --- @@ -211,6 +211,26 @@ public void

[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...

2018-05-22 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21390 Context for other reviewers: the issue addressed by this patch is actually a real issue in practice, especially for long-lived Spark clusters; I have seen this specific problem play a large

[GitHub] spark issue #21342: [SPARK-24294] Throw SparkException when OOM in Broadcast...

2018-05-21 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21342 Thanks for the updates. The net change / scope of changes have been significantly reduced here, so I feel that this change is a lot less risky now. I left only one nitpicky comment at

[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...

2018-05-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21342#discussion_r189754203 --- Diff: core/src/main/scala/org/apache/spark/util/SparkFatalException.scala --- @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...

2018-05-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21342#discussion_r189754010 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -111,12 +112,18 @@ case class

[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...

2018-05-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21342#discussion_r189753564 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -111,12 +112,18 @@ case class

[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...

2018-05-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21342#discussion_r189753488 --- Diff: core/src/main/scala/org/apache/spark/util/SparkFatalException.scala --- @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #21329: [SPARK-24277][SQL] Code clean up in SQL module: HadoopMa...

2018-05-18 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21329 I'd also like to note that commit protocols have historically been a very high risk area of the code, so I think we should have a much higher bar for explaining changes to that comp

[GitHub] spark issue #21329: [SPARK-24277][SQL] Code clean up in SQL module: HadoopMa...

2018-05-18 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21329 In general, I'm very wary of cleanup changes like this: unless we have a _need_ to do this (i.e. it causes negative side effects, breaks workloads, prevents specific concrete improvements

[GitHub] spark issue #21342: [SPARK-24294] Throw SparkException when OOM in Broadcast...

2018-05-16 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21342 I'm also in favor of delaying for a couple of days for more detailed review because historically I think these types of changes have been high risk. The risk calculus might be a bit differe

[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

2018-05-16 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21346 It's been a little while since I've thought about this issue, so I have a few clarifying questions to help me understand the high-level changes: 1. I recall that the problem

[GitHub] spark issue #21327: [SPARK-24107][CORE][followup] ChunkedByteBuffer.writeFul...

2018-05-16 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21327 LGTM. Thanks for these changes; they really help to clarify this tricky piece of code for readers.

[GitHub] spark issue #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully method ...

2018-05-14 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21175 No, I mean that the code here can simply follow the write call as straight through code. We don't need to guard against exceptions here because the duplicate of the buffer is used only

[GitHub] spark pull request #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully ...

2018-05-14 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21175#discussion_r188154900 --- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala --- @@ -63,10 +63,15 @@ private[spark] class ChunkedByteBuffer(var chunks

[GitHub] spark pull request #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully ...

2018-05-14 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21175#discussion_r188154716 --- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala --- @@ -63,10 +63,15 @@ private[spark] class ChunkedByteBuffer(var chunks

[GitHub] spark pull request #21212: [SPARK-24143] filter empty blocks when convert ma...

2018-05-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21212#discussion_r186315362 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -267,28 +269,30 @@ final class

[GitHub] spark pull request #18801: SPARK-10878 Fix race condition when multiple clie...

2018-05-04 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/18801#discussion_r186160092 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala --- @@ -255,4 +256,20 @@ class SparkSubmitUtilsSuite extends

[GitHub] spark pull request #21219: [SPARK-24160] ShuffleBlockFetcherIterator should ...

2018-05-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21219#discussion_r185949405 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -407,6 +407,25 @@ final class

[GitHub] spark issue #21219: [SPARK-24160] ShuffleBlockFetcherIterator should fail if...

2018-05-03 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21219 jenkins retest this please

[GitHub] spark issue #21219: [SPARK-24160] ShuffleBlockFetcherIterator should fail if...

2018-05-02 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21219 jenkins retest this please

[GitHub] spark pull request #21219: [SPARK-24160] ShuffleBlockFetcherIterator should ...

2018-05-02 Thread JoshRosen
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/21219 [SPARK-24160] ShuffleBlockFetcherIterator should fail if it receives zero-size blocks ## What changes were proposed in this pull request? This patch modifies

[GitHub] spark pull request #21101: [SPARK-23989][SQL] exchange should copy data befo...

2018-04-18 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21101#discussion_r182609675 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -167,22 +164,24 @@ object ShuffleExchangeExec

[GitHub] spark pull request #20310: revert [SPARK-10030] Use tags to control which te...

2018-01-17 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/20310#discussion_r162268244 --- Diff: common/tags/src/test/java/org/apache/spark/tags/DockerTest.java --- @@ -1,26 +0,0 @@ -/* - * Licensed to the Apache Software Foundation

[GitHub] spark issue #20310: revert [SPARK-10030] Use tags to control which tests to ...

2018-01-17 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/20310 Are you sure that we want to blanket revert this entire patch? Is there a more surgical short-term fix we can make in `dev/sparktestsupport/modules.py` to just always unconditionally enable the

[GitHub] spark pull request #20264: [SPARK-23070] Bump previousSparkVersion in MimaBu...

2018-01-14 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/20264#discussion_r161418395 --- Diff: project/MimaBuild.scala --- @@ -88,7 +88,7 @@ object MimaBuild { def mimaSettings(sparkHome: File, projectRef: ProjectRef

[GitHub] spark pull request #20222: [SPARK-23028] Bump master branch version to 2.4.0...

2018-01-10 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/20222#discussion_r160741170 --- Diff: project/MimaExcludes.scala --- @@ -34,6 +34,10 @@ import com.typesafe.tools.mima.core.ProblemFilters._ */ object MimaExcludes

[GitHub] spark pull request #20222: [SPARK-23028] Bump master branch version to 2.4.0...

2018-01-10 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/20222#discussion_r160740904 --- Diff: project/MimaExcludes.scala --- @@ -34,6 +34,10 @@ import com.typesafe.tools.mima.core.ProblemFilters._ */ object MimaExcludes

[GitHub] spark issue #20222: [SPARK-23028] Bump master branch version to 2.4.0-SNAPSH...

2018-01-10 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/20222 We should already be set up for 2.3.x builds in AMPLab Jenkins. For example: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.3-test-maven-hadoop-2.6

[GitHub] spark issue #20191: [SPARK-22997] Add additional defenses against use of fre...

2018-01-10 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/20191 I'm merging this into master and branch-2.3.

[GitHub] spark issue #20191: [SPARK-22997] Add additional defenses against use of fre...

2018-01-09 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/20191 jenkins retest this please

[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...

2018-01-09 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/20179#discussion_r160547545 --- Diff: core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala --- @@ -376,18 +374,13 @@ private[netty] class NettyRpcEnv( def

[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...

2018-01-09 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/20179#discussion_r160489460 --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala --- @@ -196,11 +196,24 @@ private[spark] class

[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...

2018-01-09 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/20179#discussion_r160481843 --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala --- @@ -196,11 +196,24 @@ private[spark] class

[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...

2018-01-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/20179#discussion_r160314055 --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala --- @@ -196,11 +196,24 @@ private[spark] class

[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...

2018-01-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/20179#discussion_r160313840 --- Diff: core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala --- @@ -376,18 +374,13 @@ private[netty] class NettyRpcEnv( def

[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...

2018-01-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/20179#discussion_r160312699 --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala --- @@ -196,11 +196,24 @@ private[spark] class

[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...

2018-01-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/20179#discussion_r160312383 --- Diff: core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala --- @@ -332,15 +332,17 @@ private[netty] class NettyRpcEnv( val

[GitHub] spark issue #20191: [SPARK-22997] Add additional defenses against use of fre...

2018-01-08 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/20191 jenkins retest this please

[GitHub] spark pull request #20191: [SPARK-22997] Add additional defenses against use...

2018-01-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/20191#discussion_r160297024 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java --- @@ -38,9 +38,20 @@ public MemoryBlock allocate(long

[GitHub] spark pull request #20191: [SPARK-22997] Add additional defenses against use...

2018-01-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/20191#discussion_r160296722 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java --- @@ -38,9 +38,20 @@ public MemoryBlock allocate(long

[GitHub] spark pull request #20191: [SPARK-22997] Add additional defenses against use...

2018-01-08 Thread JoshRosen
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/20191 [SPARK-22997] Add additional defenses against use of freed MemoryBlocks ## What changes were proposed in this pull request? This patch modifies Spark's `MemoryAllocator` implementa

[GitHub] spark pull request #20182: [SPARK-22985] Fix argument escaping bug in from_u...

2018-01-07 Thread JoshRosen
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/20182 [SPARK-22985] Fix argument escaping bug in from_utc_timestamp / to_utc_timestamp codegen ## What changes were proposed in this pull request? This patch adds additional escaping in

[GitHub] spark pull request #20181: [SPARK-22984] Fix incorrect bitmap copying and of...

2018-01-07 Thread JoshRosen
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/20181 [SPARK-22984] Fix incorrect bitmap copying and offset adjustment in GenerateUnsafeRowJoiner ## What changes were proposed in this pull request? This PR fixes a longstanding correctness

[GitHub] spark pull request #20180: [SPARK-22983] Don't push filters beneath aggregat...

2018-01-07 Thread JoshRosen
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/20180 [SPARK-22983] Don't push filters beneath aggregates with empty grouping expressions ## What changes were proposed in this pull request? The following SQL query should return zero

[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...

2018-01-07 Thread JoshRosen
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/20179 [SPARK-22982] Remove unsafe asynchronous close() call from FileDownloadChannel ## What changes were proposed in this pull request? This patch fixes a severe asynchronous IO bug in

[GitHub] spark issue #19971: [SPARK-22774][SQL][Test] Add compilation check into TPCD...

2017-12-13 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/19971 Wait, nevermind: I see that this isn't actually running the queries, so setting `spark.sql.codegen.fallback=false` wouldn't be sufficient. Separate from this PR, we might w

[GitHub] spark issue #19971: [SPARK-22774][SQL][Test] Add compilation check into TPCD...

2017-12-13 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/19971 Could we address this more broadly by ensuring that whole stage codegen fallback is disabled in tests?

[GitHub] spark issue #19959: [SPARK-22766] Install R linter package in spark lib dire...

2017-12-12 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/19959 Because `LOCAL_LIB_LOC` seems to default to `$SPARK_HOME/R/lib`, will installing to this folder create the potential for `lintr` to be installed into a `lib` folder which gets packaged up in

[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2017-11-28 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/19788 Is there an implicit assumption here that contiguous partitions' data can be decompressed / deserialized in a single stream? If the shuffled data is written with a non-relocatable seria

[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...

2017-11-05 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/19433 jenkins retest this please

[GitHub] spark pull request #19077: [SPARK-21860][core]Improve memory reuse for heap ...

2017-08-31 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/19077#discussion_r136487281 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java --- @@ -47,23 +47,29 @@ private boolean shouldPool(long

[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...

2017-08-31 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/19077 Just curious: do you know where we are allocating these close-in-size chunks of memory? I understand the motivation, but just curious to know what's causing this pattern. I think the ori

[GitHub] spark issue #19056: [SPARK-21765] Check that optimization doesn't affect isS...

2017-08-28 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/19056 (Dummy comment to test JIRA linkage) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18752: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...

2017-08-08 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/18752 Jenkins, this is ok to test.

[GitHub] spark issue #18752: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...

2017-08-08 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/18752 This seems fine to me, especially since it's plausible that you might have a few-second GC pause in some situations. Let me go ahead and have Jenkins test this, then I'll merge it if

[GitHub] spark issue #18814: [SPARK-21608][SPARK-9221][SQL] Window rangeBetween() API...

2017-08-03 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/18814 (test comment to test PR dashboard linking)

[GitHub] spark issue #18814: [SPARK-21608][SPARK-9221][SQL] Window rangeBetween() API...

2017-08-03 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/18814 (test comment to test PR dashboard linking)

[GitHub] spark issue #17180: [SPARK-19839][Core]release longArray in BytesToBytesMap

2017-07-30 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/17180 This seems fine to me. That said, the updated test case is a bit confusing, but I don't think the original test was too clear to begin with. The original test was using the `ite

[GitHub] spark issue #18645: [SPARK-14280][BUILD][WIP] Update change-version.sh and p...

2017-07-23 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/18645 Looking at the source compatibility changes made here, I was a little confused about why we didn't need to make many more changes. In principle, it seemed like the ambiguous overload

[GitHub] spark pull request #18645: [SPARK-14280][BUILD][WIP] Update change-version.s...

2017-07-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/18645#discussion_r128892346 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -353,7 +353,7 @@ class DatasetSuite extends QueryTest with

[GitHub] spark pull request #18645: [SPARK-14280][BUILD][WIP] Update change-version.s...

2017-07-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/18645#discussion_r128891202 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala --- @@ -54,7 +54,10 @@ class TaskContextSuite extends SparkFunSuite with

[GitHub] spark issue #18662: [SPARK-21444] Be more defensive when removing broadcasts...

2017-07-17 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/18662 Merged to master. Thanks for the quick reviews.

[GitHub] spark pull request #18662: [SPARK-21444] Be more defensive when removing bro...

2017-07-17 Thread JoshRosen
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/18662 [SPARK-21444] Be more defensive when removing broadcasts in MapOutputTracker ## What changes were proposed in this pull request? In SPARK-21444, @sitalkedia reported an issue where the

[GitHub] spark issue #17150: [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10

2017-07-13 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/17150 @srowen, I'll disable those master branch 2.10 jobs and will update scripts to prevent them from being redeployed by accident.

[GitHub] spark pull request #18476: [SPARK-20858][DOC][MINOR] Document ListenerBus ev...

2017-06-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/18476#discussion_r125149417 --- Diff: docs/configuration.md --- @@ -1398,6 +1398,15 @@ Apart from these, the following properties are also available, and may be useful

[GitHub] spark pull request #18467: [SPARK-21253][Core]Disable spark.reducer.maxReqSi...

2017-06-29 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/18467#discussion_r124945147 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -326,7 +326,7 @@ package object config { .doc("The b
