[GitHub] spark pull request #17276: [SPARK-19937] Collect metrics of block sizes when...

2017-03-25 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17276#discussion_r108052849 --- Diff: core/src/main/scala/org/apache/spark/executor/ShuffleReadMetrics.scala --- @@ -80,13 +86,15 @@ class ShuffleReadMetrics private[spark] () extends

[GitHub] spark pull request #17276: [SPARK-19937] Collect metrics of block sizes when...

2017-03-25 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17276#discussion_r108052758 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java --- @@ -228,6 +230,36 @@ void closeAndWriteOutput() throws IOException

[GitHub] spark pull request #17276: [SPARK-19937] Collect metrics of block sizes when...

2017-03-25 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17276#discussion_r108052804 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala --- @@ -72,6 +72,27 @@ private[spark] class SortShuffleWriter[K, V, C

[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...

2017-03-26 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17276 @jinxing64 If the intent behind these metrics is to help with SPARK-19659, it would be good to either add it as part of SPARK-19659 or subsequently (once the feature is merged). This

[GitHub] spark pull request #17276: [SPARK-19937] Collect metrics of block sizes when...

2017-03-27 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17276#discussion_r108103956 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java --- @@ -169,6 +173,36 @@ public void write(Iterator> reco

[GitHub] spark issue #17522: [SPARK-18278] [Scheduler] Documentation to point to Kube...

2017-04-03 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17522 I dont think we should be pointing to third party projects in spark documentation - for example, it might be possible that some other effort gets merged in instead of the above. If/when it

[GitHub] spark issue #17533: [WIP][SPARK-20219] Schedule tasks based on size of input...

2017-04-05 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17533 Tasks are scheduled by locality (which includes shuffle tasks too to some extent). This is making a lot of state mutable within TSM - is there any tests done which show improvements due to

[GitHub] spark pull request #17531: [SPARK-20217][core] Executor should not fail stag...

2017-04-05 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17531#discussion_r109865317 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -432,7 +432,7 @@ private[spark] class Executor

[GitHub] spark pull request #17531: [SPARK-20217][core] Executor should not fail stag...

2017-04-05 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17531#discussion_r110028203 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -432,7 +432,7 @@ private[spark] class Executor

[GitHub] spark issue #17531: [SPARK-20217][core] Executor should not fail stage if ki...

2017-04-05 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17531 I will leave it around a bit in case @JoshRosen has any further comments. Feel free to merge btw if you dont ! --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark issue #17088: [SPARK-19753][CORE] All shuffle files on a host should b...

2017-02-27 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17088 fetch failure does not imply lost executor - it could be a transient issue. Similarly, executor loss does not imply host loss. This is quite drastic for a fetch failure : spark already

[GitHub] spark issue #17012: [SPARK-19677][SS] Renaming a file atop an existing one s...

2017-02-27 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17012 What is the proposed semantics from this PR now ? - If file exists, ignore. - If file does not exist, try to rename - if fails, throw exception. Is this right ? If yes, the PR title

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-02-28 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r103516283 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -911,14 +916,14 @@ private[spark] class TaskSetManager

[GitHub] spark issue #17088: [SPARK-19753][CORE] Un-register all shuffle output on a ...

2017-03-02 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17088 +CC @tgravescs You might be interested in this given your comments on on the blacklisting PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark issue #16639: [SPARK-19276][CORE] Fetch Failure handling robust to use...

2017-03-02 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16639 @squito It looks good to me, thanks for the changes ! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #17140: [SPARK-19796][CORE] Fix serialization of long property v...

2017-03-02 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17140 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the

[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...

2017-03-02 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15505 @kayousterhout I am surprised it is not more, but I agree that the added complexity for such low returns is not worth it. --- If your project is set up for it, you can reply to this email and have

[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17166 What is the rationale for this change ? Is it to propagate the task kill reason to UI ? The one line in https://github.com/apache/spark/pull/17166/files#diff

[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-05 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17166 If I did not miss it, there is no way for user to provide this information currently, right ? Or is that coming in a subsequent PR ? --- If your project is set up for it, you can reply to this

[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-05 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17166 In that case, let us make it a more complete PR - with the proposed api changes included - so that we can evaluate the merits of the change in total. --- If your project is set up for it, you can

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-05 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r104342097 --- Diff: core/src/test/scala/org/apache/spark/util/collection/MedianHeapSuite.scala --- @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-05 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r104341937 --- Diff: core/src/test/scala/org/apache/spark/util/collection/MedianHeapSuite.scala --- @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-05 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r104341778 --- Diff: core/src/main/scala/org/apache/spark/util/collection/MedianHeap.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-05 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r104341854 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -754,7 +743,6 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-05 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r104352237 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -754,7 +743,6 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-05 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r104351990 --- Diff: core/src/test/scala/org/apache/spark/util/collection/MedianHeapSuite.scala --- @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-05 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r104352100 --- Diff: core/src/main/scala/org/apache/spark/util/collection/MedianHeap.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-05 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16867 The cost for median heap could be higher than TreeMap imo - for example, the additional dequeue + enqueue when rebalance is required ? If the cost is high enough, we might want to relook at the

[GitHub] spark issue #17113: [SPARK-13669][Core] Improve the blacklist mechanism to h...

2017-03-06 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17113 @markhamstra given the impact interruption has on lower layer libraries which dont handle it well (iirc hdfs ?), we probably will not set it to true even if spark code is robust, --- If your

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-06 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r104513453 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -740,6 +743,7 @@ private[spark] class TaskSetManager

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-06 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16867 You are right, the strict definition requires us to average - it just makes it difficult to reason based on logs at times when the duration mentioned does not exist when there are

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-06 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r104515039 --- Diff: core/src/main/scala/org/apache/spark/util/collection/MedianHeap.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-06 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r104515191 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -740,6 +743,7 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-06 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r104516899 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -740,6 +743,7 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-06 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r104517384 --- Diff: core/src/main/scala/org/apache/spark/util/collection/MedianHeap.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-06 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r104520626 --- Diff: core/src/main/scala/org/apache/spark/util/collection/MedianHeap.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-06 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r104520859 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -740,6 +743,7 @@ private[spark] class TaskSetManager

[GitHub] spark issue #17113: [SPARK-13669][Core] Improve the blacklist mechanism to h...

2017-03-06 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17113 @markhamstra Completely agree, I would love to see this enabled by default. For example, I really hate to see speculative tasks continuing to run when the taskset has completed (for example) - used

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-06 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r104524384 --- Diff: core/src/main/scala/org/apache/spark/util/collection/MedianHeap.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r104772015 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r104769358 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -2250,6 +2250,25 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r104773405 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -732,6 +732,13 @@ class DAGScheduler

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r104770757 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -168,7 +168,8 @@ private[spark] class Executor( case Some

[GitHub] spark issue #17024: [SPARK-19525][CORE] Compressing checkpoints.

2017-03-07 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17024 It makes it possible to identify what the data within the file is (compressed or not) - for user's perusal (it does not change anything for the application, that is true). But before you c

[GitHub] spark issue #17198: [SPARK-19857][yarn] Correctly calculate next credential ...

2017-03-07 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17198 This is one of the scarey bugs I see in scala code ... the benefits of an explicit delimiter ! Wondering if it is possible to generalize this as a scala style check ... LGTM --- If

[GitHub] spark issue #17220: remove tungsten-sort.Because it is not represent 'org.ap...

2017-03-08 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17220 This was an exposed parameter, we cannot remove it - irrespective of the duplication. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-10 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17238 This is something to be configured appropriately at the lower level (namely topology config), and should be handled there Special casing this in spark is not right. Two scenarios

[GitHub] spark pull request #17244: [SPARK-19889][SQL] Make TaskContext callbacks thr...

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17244#discussion_r105489927 --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala --- @@ -57,57 +68,75 @@ private[spark] class TaskContextImpl( // Whether the

[GitHub] spark pull request #17244: [SPARK-19889][SQL] Make TaskContext callbacks thr...

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17244#discussion_r105489232 --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala --- @@ -57,57 +68,75 @@ private[spark] class TaskContextImpl( // Whether the

[GitHub] spark pull request #17244: [SPARK-19889][SQL] Make TaskContext callbacks thr...

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17244#discussion_r105489812 --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala --- @@ -57,57 +68,75 @@ private[spark] class TaskContextImpl( // Whether the

[GitHub] spark pull request #17244: [SPARK-19889][SQL] Make TaskContext callbacks thr...

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17244#discussion_r105490080 --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala --- @@ -57,57 +68,75 @@ private[spark] class TaskContextImpl( // Whether the

[GitHub] spark pull request #17244: [SPARK-19889][SQL] Make TaskContext callbacks thr...

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17244#discussion_r105510888 --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala --- @@ -57,57 +68,75 @@ private[spark] class TaskContextImpl( // Whether the

[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105319198 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -151,9 +152,13 @@ object SparkSubmit extends CommandLineUtils { val

[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105526225 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala --- @@ -60,9 +60,10 @@ private[spark] class ResultTask[T, U

[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105319759 --- Diff: core/src/main/scala/org/apache/spark/scheduler/KerberosUtil.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105318927 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala --- @@ -257,10 +257,6 @@ private[deploy] class SparkSubmitArguments(args

[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105319503 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1028,7 +1028,7 @@ class DAGScheduler( val locs

[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105526180 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1006,7 +1006,7 @@ class DAGScheduler( runningStages

[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105526086 --- Diff: core/src/main/scala/org/apache/spark/scheduler/KerberosUser.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #16788: [SPARK-16742] Kerberos impersonation support

2017-03-10 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16788 @tgravescs , @vanzin - this PR for mesos changes how spark handles kerberos tokens fundamentally; would be good to have your views. +CC @jerryshao to also look at the PR, since you have

[GitHub] spark issue #17244: [SPARK-19889][SQL] Make TaskContext callbacks thread saf...

2017-03-12 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17244 LGTM. Would be great if other reviewers can also take a look. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-13 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17238 @squito I looked at yarn bits about 3-4 years back, so I am sure you and @tgravescs know better :-) But yes, you did summarize what I was attempting to get to. I am not sure what the

[GitHub] spark pull request #17109: [SPARK-19740][MESOS]Add support in Spark to pass ...

2017-03-14 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17109#discussion_r106052638 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -99,6 +99,26 @@ private

[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-03-14 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17295 I have not looked at the implementation in detail, but can you comment on why the change w.r.t plain text block data to remote executor ? Isn't it not simpler to transmit block conten

[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-03-14 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17295 > First, keep in mind that there's no metadata that tells the receiver whether a block is encrypted or not. This means that methods like BlockManager.get, which can read block data fro

[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-03-14 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17295 Just to be clear, I would prefer if we consistently did things - either encrypt all blocks while transferring (irrespective of sasl being enabled or not); or depend only on sasl for channel

[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-03-14 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17295 > Not really sure what you mean here. But transferring encrypted data without RPC encryption is not really secure, since the encryption key is transferred to executors using an RPC. There'

[GitHub] spark issue #14789: [SPARK-17209][YARN] Add the ability to manually update c...

2016-11-08 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/14789 Hi @tgravescs, This is specifically for the case where new input source/output destinations are getting added after the driver has already started. Notebooks, long running streaming

[GitHub] spark pull request #14789: [SPARK-17209][YARN] Add the ability to manually u...

2016-11-08 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/14789#discussion_r87078341 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala --- @@ -189,6 +190,39 @@ private[spark] abstract class

[GitHub] spark pull request #14789: [SPARK-17209][YARN] Add the ability to manually u...

2016-11-08 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/14789#discussion_r87075067 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -138,6 +138,14 @@ class SparkHadoopUtil extends Logging

[GitHub] spark pull request #14789: [SPARK-17209][YARN] Add the ability to manually u...

2016-11-08 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/14789#discussion_r87072908 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -64,7 +64,7 @@ class

[GitHub] spark issue #14789: [SPARK-17209][YARN] Add the ability to manually update c...

2016-11-08 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/14789 @tgravescs Interesting, so it means that there wont be a --principal/--keytab, and what is being done in AMCredentialRenewer will be done in the launcher itself ? So the driver behaves just like

[GitHub] spark pull request #14789: [SPARK-17209][YARN] Add the ability to manually u...

2016-11-08 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/14789#discussion_r87116199 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala --- @@ -189,6 +190,39 @@ private[spark] abstract class

[GitHub] spark issue #14789: [SPARK-17209][YARN] Add the ability to manually update c...

2016-11-08 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/14789 @vanzin You are right, this does look orthogonal to what you and Tom described. 'Where' the renewal is happening is the difference - from within spark or from outside of spark (

[GitHub] spark issue #15823: [SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `OutputF...

2016-11-08 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15823 LGTM. Not at my laptop, would be great if you can merge @rxin, thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark issue #15563: [SPARK-16759][CORE] Add a configuration property to pass...

2016-11-09 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15563 Jenkins, test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #14789: [SPARK-17209][YARN] Add the ability to manually update c...

2016-11-10 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/14789 Interesting, thanks for clarifying. The way I was looking at the change was, given we have ability for custom credential providers now, we would need to support for out-of-band (to the

[GitHub] spark issue #15563: [SPARK-16759][CORE] Add a configuration property to pass...

2016-11-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15563 Looks good to me --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or

[GitHub] spark issue #15563: [SPARK-16759][CORE] Add a configuration property to pass...

2016-11-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15563 Merging into master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #15861: [SPARK-18294][CORE] Implement commit protocol to support...

2016-11-21 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15861 @rxin I did see this PR, unfortunately it is a bit big and I am tied up with other things - cant get to it for next few days. --- If your project is set up for it, you can reply to this

[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-29 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16038 Without understanding the specifics of the ML part here - wont the actual impact of a large dense vector on Task 'bytes' be minimal at best ? We do compress the task binary; and 1B zer

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90129155 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -1089,66 +1064,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90116879 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -1016,11 +1013,6 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r88075283 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapRedCommitProtocol.scala --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90122251 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90121527 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90120144 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90127987 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90129259 --- Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala --- @@ -561,7 +561,7 @@ class PairRDDFunctionsSuite extends SparkFunSuite

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r88077635 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90121670 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r87708046 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90119536 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90124359 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #15861: [SPARK-18294][CORE] Implement commit protocol to support...

2016-11-29 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15861 @jiangxb1987 I did a single pass review - particularly given the similarities in both the codepaths and the classnames, I will need to go over it again to ensure we dont miss anything. --- If

[GitHub] spark pull request #15553: [SPARK-18008] [build] Add support for -Dmaven.tes...

2016-10-20 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15553#discussion_r84374076 --- Diff: sql/hive-thriftserver/pom.xml --- @@ -41,11 +41,8 @@ ${project.version} - org.apache.spark - spark

[GitHub] spark issue #15579: Added support for extra command in front of spark.

2016-10-21 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15579 Btw, curious if you have actually tested this in yarn - I have a feeling it wont work. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as

[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-21 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 Changes done, please do let me know of any other comments, thanks ! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request #15553: [SPARK-18008] [build] Add support for -Dmaven.tes...

2016-10-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15553#discussion_r84533967 --- Diff: sql/hive-thriftserver/pom.xml --- @@ -41,11 +41,8 @@ ${project.version} - org.apache.spark - spark

[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-22 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 @srowen -Dmaven.test.skip=true is the documented way in maven to skip test compilation (http://maven.apache.org/surefire/maven-surefire-plugin/examples/skipping-test.html) which we are not

<    1   2   3   4   5   6   7   8   9   10   >