[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-07 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15249 As I mentioned before, this is definitely a huge step in the right direction ! Having said that, I want to ensure we dont aggressively blacklist executors and nodes - at scale, I have seen

[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-08 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15218 @zhzhan I am curious why this is the case for the jobs being mentioned. This pr should have an impact if the locality preference of the taskset being run is fairly suboptimal to begin with, no

[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...

2016-10-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r82664592 --- Diff: core/src/main/java/org/apache/spark/io/NioBasedBufferedFileInputStream.java --- @@ -0,0 +1,120 @@ +/* + * Licensed under the Apache

[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...

2016-10-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r82665139 --- Diff: core/src/main/java/org/apache/spark/io/NioBasedBufferedFileInputStream.java --- @@ -0,0 +1,127 @@ +/* + * Licensed under the Apache

[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...

2016-10-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r82665543 --- Diff: core/src/main/java/org/apache/spark/io/NioBasedBufferedFileInputStream.java --- @@ -0,0 +1,127 @@ +/* + * Licensed under the Apache

[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...

2016-10-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r82665886 --- Diff: core/src/main/java/org/apache/spark/io/NioBasedBufferedFileInputStream.java --- @@ -0,0 +1,127 @@ +/* + * Licensed under the Apache

[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...

2016-10-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r8279 --- Diff: core/src/main/java/org/apache/spark/io/NioBasedBufferedFileInputStream.java --- @@ -0,0 +1,127 @@ +/* + * Licensed under the Apache

[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...

2016-10-10 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15408 There is a behavioral change with this PR, which I am not sure is relevant. BufferedInputStream supports mark/reset, while we are not doing so here - does deserialization and other codepaths

[GitHub] spark pull request #15371: [SPARK-17816] [Core] Fix ConcurrentModificationEx...

2016-10-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15371#discussion_r82670700 --- Diff: core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala --- @@ -444,7 +444,9 @@ class CollectionAccumulator[T] extends AccumulatorV2[T

[GitHub] spark pull request #15371: [SPARK-17816] [Core] Fix ConcurrentModificationEx...

2016-10-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15371#discussion_r82683704 --- Diff: core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala --- @@ -444,7 +444,9 @@ class CollectionAccumulator[T] extends AccumulatorV2[T

[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...

2016-10-10 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15408 Barring query to @rxin (regarding buffer pooling), I am fine with the change - pretty neat, thanks @sitalkedia ! Would be good if more eyeballs look at it though given how fundamental it is

[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...

2016-10-11 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r82816160 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,129 @@ +/* + * Licensed under the Apache License

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15422 Why would corrupt record cause EOFException to be thrown ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15422 @srowen The tuples already returned would have been valid, it is the subsequent block decompression which has failed. For example, in a 1gb file, the last few bytes missing (or corrupt) will cause

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15422 @zsxwing You are right, NewHadoopRDD is not handling this case. Probably would be good to add exception handling there when nextKeyValue throws exception ? Context is, for large jobs

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15422 @marmbrus +1 on logging, that is definitely something which was probably missed here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15422 @zsxwing The map task is run by https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15422 @srowen Since this is happening 'below' the user code (in the hadoop rdd), is there a way around how to handle this ? I agree that for a lot of usecases where it is critical to wo

[GitHub] spark issue #12524: [SPARK-12524][Core]DagScheduler may submit a task set fo...

2016-10-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/12524 @seayi any progress on this ? Would be good to add this in if consistently reproducible. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as

[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...

2016-10-11 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15422#discussion_r82932992 --- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala --- @@ -179,7 +183,16 @@ class NewHadoopRDD[K, V]( override def

[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...

2016-10-11 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15422#discussion_r82932645 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -253,8 +256,12 @@ class HadoopRDD[K, V]( try { finished

[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...

2016-10-11 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15422#discussion_r82932947 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -588,6 +588,12 @@ object SQLConf { .doubleConf

[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...

2016-10-11 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15422#discussion_r82933077 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -170,4 +170,9 @@ package object config { .doc("Port to us

[GitHub] spark issue #15422: [SPARK-17850][Core]Add a flag to ignore corrupt files

2016-10-12 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15422 Merged - had issue with pip (new laptop, sigh), and so jira and pr did not get closed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as

[GitHub] spark pull request #15454: [SPARK-17850][Core]Add a flag to ignore corrupt f...

2016-10-13 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15454#discussion_r83155108 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -245,8 +248,7 @@ class HadoopRDD[K, V]( try { finished

[GitHub] spark pull request #15454: [SPARK-17850][Core]Add a flag to ignore corrupt f...

2016-10-13 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15454#discussion_r83155316 --- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala --- @@ -171,7 +175,11 @@ class NewHadoopRDD[K, V]( override def

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-14 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 Would be cleaner to simply copy executorDataMap.keys and works off that to ensure there is no coupling between actor thread and invoker. --- If your project is set up for it, you can reply to this

[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-15 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15218 I am assuming @kayousterhout does not have comments on this. Can you please fix the conflict @zhzhan ? I will merge it in after that to master. --- If your project is set up for it, you can

[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-15 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15218 Merged to master, thanks @zhzhan ! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #15512: [SPARK-17930][CORE]The SerializerInstance instanc...

2016-10-17 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15512#discussion_r83752469 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala --- @@ -84,6 +90,7 @@ private[spark] class TaskResultGetter(sparkEnv

[GitHub] spark issue #15531: [SQL][STREAMING][TEST] Follow up to remove Option.contai...

2016-10-18 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15531 @srowen Unfortunately the 'Some' in the == is usually missed out, resulting in bugs (and we have had a fair share of them in the past - some more severe than others). Given that t

[GitHub] spark pull request #15382: [SPARK-17810] [SQL] Default spark.sql.warehouse.d...

2016-10-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15382#discussion_r83928417 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -741,7 +741,7 @@ private[sql] class SQLConf extends Serializable with

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-18 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 @scwf I think the initial fix with a small change might be sufficient. What I meant was something like this : ``` protected def reset(): Unit = { val executors

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-18 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 @zsxwing Ah, then simply making it send() instead of askWithRetry() should do, no ? That was actually in the initial PR - I was not sure if we want to change the behavior from askWithRetry to

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-18 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 LGTM, @zsxwing any comments ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-18 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 BTW, it was interesting that the earlier change did not trigger a test failure (the issue @viirya pointed out - about needing to move RemoveExecutor to receive) --- If your project is set up for

[GitHub] spark pull request #15550: [SPARK-18003][Spark Core] Fix bug of RDD zipWithI...

2016-10-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15550#discussion_r84004989 --- Diff: core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala --- @@ -64,8 +64,14 @@ class ZippedWithIndexRDD[T: ClassTag](prev: RDD[T

[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 +CC @srowen --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if

[GitHub] spark pull request #15553: [SPARK-18008] [build] Add support for -Dmaven.tes...

2016-10-19 Thread mridulm
GitHub user mridulm opened a pull request: https://github.com/apache/spark/pull/15553 [SPARK-18008] [build] Add support for -Dmaven.test.skip=true and -Dmaven.javadoc.skip=true ## What changes were proposed in this pull request? Add support for -Dmaven.test.skip=true and

[GitHub] spark pull request #15553: [SPARK-18008] [build] Add support for -Dmaven.tes...

2016-10-19 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15553#discussion_r84047441 --- Diff: common/network-common/pom.xml --- @@ -77,27 +77,40 @@ compile - - - log4j - log4j

[GitHub] spark pull request #15553: [SPARK-18008] [build] Add support for -Dmaven.tes...

2016-10-19 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15553#discussion_r84047904 --- Diff: pom.xml --- @@ -2415,6 +2389,67 @@ + + docBuild + + + maven.javadoc.skip

[GitHub] spark pull request #15553: [SPARK-18008] [build] Add support for -Dmaven.tes...

2016-10-19 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15553#discussion_r84049757 --- Diff: sql/hive-thriftserver/pom.xml --- @@ -41,11 +41,8 @@ ${project.version} - org.apache.spark - spark

[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 @srowen You are right; for normal build, this should be a no-op. Currently, there is no way to suppress compilation of tests (we can suppress running test via -DskipTests but it will still

[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 Thanks for the link to 'skip' - will test it out ! About test compilation : For example, in common/network-shuffle/pom.xm

[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 Please note that this applies only the spark artifacts - not any of the others (which, as you mentioned, will be a simple disk lookup). Since I was moving it for spark generated artifacts anyway

[GitHub] spark pull request #15553: [SPARK-18008] [build] Add support for -Dmaven.tes...

2016-10-19 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15553#discussion_r84057228 --- Diff: sql/hive-thriftserver/pom.xml --- @@ -41,11 +41,8 @@ ${project.version} - org.apache.spark - spark

[GitHub] spark pull request #15553: [SPARK-18008] [build] Add support for -Dmaven.tes...

2016-10-19 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15553#discussion_r84081078 --- Diff: sql/hive-thriftserver/pom.xml --- @@ -41,11 +41,8 @@ ${project.version} - org.apache.spark - spark

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-19 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84154979 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-19 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84160226 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 @zsxwing To minimize scope of synchronized block. The way @scwf has now, the synchronized block is limited to duplicating key and setting some state. Remaining can happen outside of the lock

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 @zsxwing I think the issue is that case RemoveExecutor() is not identical to what exists in receiveAndReply Any reason 'executorDataMap.get(executorId).foreach(_.executorEndpoint

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 Ah ! Apologies, I got confused. Yes, I agree, that is a better approach. It also means we can get rid of the RemoveExecutor pattern match from receive right ? As it stands now, that looks

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-20 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84225583 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106779213 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -17,48 +17,61 @@ package org.apache.spark.storage -import

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106779546 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -73,55 +86,219 @@ private[spark] class DiskStore(conf: SparkConf, diskManager

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106779004 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -34,6 +34,8 @@ import org.apache.spark.util.{ShutdownHookManager

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106269093 --- Diff: core/src/main/scala/org/apache/spark/security/CryptoStreamUtils.scala --- @@ -102,4 +150,34 @@ private[spark] object CryptoStreamUtils extends

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r10677 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -94,7 +101,11 @@ private[spark] class DiskBlockManager(conf: SparkConf

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106778932 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -73,55 +86,219 @@ private[spark] class DiskStore(conf: SparkConf, diskManager

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106779457 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -73,55 +86,219 @@ private[spark] class DiskStore(conf: SparkConf, diskManager

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106264689 --- Diff: core/src/main/scala/org/apache/spark/security/CryptoStreamUtils.scala --- @@ -63,12 +83,40 @@ private[spark] object CryptoStreamUtils extends

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106778914 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -17,48 +17,61 @@ package org.apache.spark.storage -import

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106778813 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -73,55 +86,219 @@ private[spark] class DiskStore(conf: SparkConf, diskManager

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106268712 --- Diff: core/src/main/scala/org/apache/spark/security/CryptoStreamUtils.scala --- @@ -63,12 +83,40 @@ private[spark] object CryptoStreamUtils extends

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106063310 --- Diff: core/src/main/scala/org/apache/spark/security/CryptoStreamUtils.scala --- @@ -48,12 +50,30 @@ private[spark] object CryptoStreamUtils extends

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106779317 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -73,55 +86,219 @@ private[spark] class DiskStore(conf: SparkConf, diskManager

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106778005 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -56,6 +57,43 @@ private[spark] class BlockResult( val bytes: Long

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106778650 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -79,6 +81,11 @@ private[spark] class DiskBlockManager(conf: SparkConf

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106778688 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -17,48 +17,61 @@ package org.apache.spark.storage -import

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106778760 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -73,55 +86,219 @@ private[spark] class DiskStore(conf: SparkConf, diskManager

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-18 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16867 LGTM @kayousterhout , @squito. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark pull request #17290: [SPARK-16599][CORE] java.util.NoSuchElementExcept...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17290#discussion_r106788142 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala --- @@ -340,7 +340,7 @@ private[storage] class BlockInfoManager extends

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r106788877 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -212,8 +212,8 @@ case object TaskResultLost extends TaskFailedReason

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r106789380 --- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala --- @@ -540,6 +540,39 @@ class SparkContextSuite extends SparkFunSuite with

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r106788727 --- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala --- @@ -354,7 +354,7 @@ private[spark] object UIUtils extends Logging

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r106789297 --- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala --- @@ -540,6 +540,39 @@ class SparkContextSuite extends SparkFunSuite with

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r106789004 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark issue #17343: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-03-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17343 If we make flush() noop, then buffered (uncommitted) data wont be written to the stream; am I missing something here, or is this change broken ? --- If your project is set up for it, you can reply

[GitHub] spark issue #17343: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-03-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17343 Background - you need to do a flush() to ensure the indices generated are valid. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If

[GitHub] spark issue #17343: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-03-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17343 Ah, looks like I missed that CountingOutputStream was introduced after BOS and not before. Looks good to me. --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107233146 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -302,12 +298,12 @@ private[spark] class Executor( // If

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107239694 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107235395 --- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala --- @@ -569,8 +575,10 @@ class SparkContextSuite extends SparkFunSuite with

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107234055 --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala --- @@ -59,8 +59,8 @@ private[spark] class TaskContextImpl( /** List of callback

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107234445 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala --- @@ -160,15 +160,20 @@ private[spark] abstract class Task[T]( // A flag

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107237325 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -215,7 +215,8 @@ private[spark] class PythonRunner

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107235703 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -212,8 +212,8 @@ case object TaskResultLost extends TaskFailedReason

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107232894 --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala --- @@ -140,16 +140,22 @@ private[spark] class TaskContextImpl

[GitHub] spark issue #17290: [SPARK-16599][CORE] java.util.NoSuchElementException: No...

2017-03-21 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17290 I agree with @srowen I dont see how this change affects the test. `blocksWithReleasedLocks` should be unchanged w.r.t this test. --- If your project is set up for it, you can reply to this email

[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-21 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17166 Hi @kayousterhout, Can you take over reviewing this PR ? I might be tied up with other things for next couple of weeks, and I dont want @ericl's work to be blocked on me.

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-22 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107345386 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark pull request #17300: [SPARK-19956][Core]Optimize a location order of b...

2017-03-22 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17300#discussion_r107352668 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -555,12 +555,15 @@ private[spark] class BlockManager

[GitHub] spark pull request #17300: [SPARK-19956][Core]Optimize a location order of b...

2017-03-22 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17300#discussion_r107352378 --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala --- @@ -500,6 +500,30 @@ class BlockManagerSuite extends SparkFunSuite with

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-23 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107631487 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-23 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107773629 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-23 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107797382 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -215,7 +215,7 @@ private[spark] class PythonRunner

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-23 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107797887 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -467,7 +479,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark issue #17343: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-03-24 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17343 LGTM will wait a bit to allow for others to comment. @zsxwing can you also take a look ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark pull request #17276: [SPARK-19937] Collect metrics of block sizes when...

2017-03-25 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17276#discussion_r108052747 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java --- @@ -169,6 +173,36 @@ public void write(Iterator> reco

[GitHub] spark pull request #17276: [SPARK-19937] Collect metrics of block sizes when...

2017-03-25 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17276#discussion_r108052753 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java --- @@ -169,6 +173,36 @@ public void write(Iterator> reco

<    1   2   3   4   5   6   7   8   9   10   >