[GitHub] spark pull request #9875: [SPARK-11662] [YARN]. In Client mode, make sure we...

2016-07-12 Thread harishreedharan
Github user harishreedharan closed the pull request at: https://github.com/apache/spark/pull/9875 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the

[GitHub] spark pull request: [SPARK-13843][Streaming]Remove streaming-flume...

2016-03-14 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/11672#issuecomment-196555837 Thanks @rxin. Opened [SPARK-11806](https://issues.apache.org/jira/browse/SPARK-11806) to discuss this

[GitHub] spark pull request: [SPARK-13843][Streaming]Remove streaming-flume...

2016-03-14 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/11672#issuecomment-196553283 Hi @zsxwing, I think the discussion on supporting Kafka 0.9 should happen **if** we decide to keep Kafka in Spark itself. At this point, I think the piece

[GitHub] spark pull request: [SPARK-13843][Streaming]Remove streaming-flume...

2016-03-13 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/11672#issuecomment-196115717 I agree with @ksakellis on this one. It would be great if we can pull Kafka out as well. I understand that there are a lot of users who might find it difficult

[GitHub] spark pull request: [SPARK-13478] [yarn] Use real user when fetchi...

2016-02-26 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/11358#issuecomment-189425272 LGTM

[GitHub] spark pull request: [SPARK-13478] [yarn] Use real user when fetchi...

2016-02-24 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/11358#issuecomment-188621725 This looks like it might affect HDFS tokens as well, and an error like this might come up during the initial token renewal: ``` WARN

[GitHub] spark pull request: [SPARK-13478] [yarn] Use real user when fetchi...

2016-02-24 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/11358#issuecomment-188615270 So I have not tested using the keytab-based login with proxy user stuff at all. We get delegation tokens even there - does this issue affect that as well

[GitHub] spark pull request: [SPARK-12177] [STREAMING] Update KafkaDStreams...

2016-02-03 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/10953#issuecomment-179511363 I have a more fundamental question - given that this patch does not add a whole lot of new functionality but just ports the current functionality to use the

[GitHub] spark pull request: [SPARK-12177] [STREAMING] Update KafkaDStreams...

2016-02-03 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/10953#discussion_r51799340 --- Diff: external/kafka-newapi/src/main/scala/org/apache/spark/streaming/kafka/newapi/DirectKafkaInputDStream.scala --- @@ -0,0 +1,208

[GitHub] spark pull request: [SPARK-12177] [STREAMING] Update KafkaDStreams...

2016-02-03 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/10953#discussion_r51798611 --- Diff: external/kafka-newapi/pom.xml --- @@ -0,0 +1,122 @@ + + + +http://maven.apache.org/POM/4.0.0"; xmlns:xsi="htt

[GitHub] spark pull request: [SPARK-11662] [YARN]. In Client mode, make sur...

2015-11-23 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9875#issuecomment-159085964 Also, to be clear, it would work fine in Cluster mode. In Client mode, #7394 should have taken care of the long-running app issue (though there was one where

[GitHub] spark pull request: [SPARK-11662] [YARN]. In Client mode, make sur...

2015-11-23 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9875#issuecomment-158988545 Yes, that is correct, but this happens even without the tokens expiring. What do you think about doing the relogin in the YarnClientSchedulerBackend

[GitHub] spark pull request: [SPARK-11662] [YARN]. In Client mode, make sur...

2015-11-20 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9875#issuecomment-158573009 retest this please

[GitHub] spark pull request: [SPARK-11662] [YARN]. In Client mode, make sur...

2015-11-20 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9875#issuecomment-158563127 Jenkins, test this please

[GitHub] spark pull request: [SPARK-11662] Call startExecutorDelegationToke...

2015-11-20 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9635#issuecomment-158561381 I don't think this PR actually fixes the issue. I think the real issue is that once tokens are added to the credentials, Hadoop does not allow the user t

[GitHub] spark pull request: [SPARK-11662] [YARN]. In Client mode, make sur...

2015-11-20 Thread harishreedharan
GitHub user harishreedharan opened a pull request: https://github.com/apache/spark/pull/9875 [SPARK-11662] [YARN]. In Client mode, make sure we re-login before at… …tempting to create new delegation tokens if a new SparkContext is created within the same application

[GitHub] spark pull request: [SPARK-11821] Propagate Kerberos keytab for al...

2015-11-20 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9859#issuecomment-158498642 LGTM. Are there any other configs required? I remember Hadoop security had a bunch of configs. /cc @tgravescs

[GitHub] spark pull request: [SPARK-11821] Propagate Kerberos keytab for al...

2015-11-20 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9837#discussion_r45494051 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala --- @@ -166,7 +168,11 @@ private[hive] class ClientWrapper

[GitHub] spark pull request: [SPARK-11662] Call startExecutorDelegationToke...

2015-11-19 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9635#issuecomment-158255349 It is not really the token that is causing the issue. It looks like UGI of the current user has expired kerberos tickets. Anyway, this patch does not make

[GitHub] spark pull request: [SPARK-11662] Call startExecutorDelegationToke...

2015-11-19 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9635#issuecomment-158250401 Hmm, why does stopping the context actually remove the kerberos login, unless the token renewal interval has passed? The patch looks fine, but have you

[GitHub] spark pull request: [SPARK-11821] Propagate Kerberos keytab for al...

2015-11-19 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9837#discussion_r45425265 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala --- @@ -166,7 +168,11 @@ private[hive] class ClientWrapper

[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...

2015-11-18 Thread harishreedharan
Github user harishreedharan closed the pull request at: https://github.com/apache/spark/pull/2994

[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...

2015-11-18 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/2994#issuecomment-157908818 I am closing this PR. I have put this up here: https://github.com/cloudera/spark-kafka-writer

[GitHub] spark pull request: [SPARK-11740][Streaming]Fix the race condition...

2015-11-17 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9707#issuecomment-157469418 LGTM. Thanks @zsxwing !

[GitHub] spark pull request: [SPARK-11740][Streaming]Fix the race condition...

2015-11-17 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9707#discussion_r45102167 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala --- @@ -187,16 +187,27 @@ class CheckpointWriter( private var

[GitHub] spark pull request: [SPARK-11740][Streaming]Fix the race condition...

2015-11-17 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9707#discussion_r45099938 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala --- @@ -187,16 +187,27 @@ class CheckpointWriter( private var

[GitHub] spark pull request: [SPARK-11740][Streaming]Fix the race condition...

2015-11-17 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9707#discussion_r45098059 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala --- @@ -187,16 +187,27 @@ class CheckpointWriter( private var

[GitHub] spark pull request: [SPARK-11731][STREAMING] Enable batching on Dr...

2015-11-13 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9695#issuecomment-156583976 +1

[GitHub] spark pull request: [SPARK-11419][STREAMING] Parallel recovery for...

2015-11-09 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9373#issuecomment-155257362 @brkyvz Sounds good, sir. I think the issue you saw seems to be a protobuf incompatibility issue - did you compile and run against the same hadoop-2 version

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-06 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44187373 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala --- @@ -488,7 +491,12 @@ class ReceiverTracker(ssc

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-06 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44185733 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala --- @@ -488,7 +491,12 @@ class ReceiverTracker(ssc

[GitHub] spark pull request: [SPARK-11457][Streaming][YARN] Fix incorrect A...

2015-11-05 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9412#issuecomment-154256737 Where is this being set into the SparkConf for a normal app? I am not sure of the parameters, but setting the new values looks good (not sure if

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-05 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44085335 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-05 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44085133 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-05 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44084903 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-05 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44070214 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-05 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44069967 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-05 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44069852 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceivedBlockTracker.scala --- @@ -157,9 +166,12 @@ private[streaming] class

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-05 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44057628 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-04 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r43964698 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-04 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r43964535 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-04 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r43964017 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-04 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r43963794 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-04 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r43963503 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-04 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r43962050 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala --- @@ -439,6 +439,9 @@ class ReceiverTracker(ssc

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-04 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r43961907 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceivedBlockTracker.scala --- @@ -157,9 +166,12 @@ private[streaming] class

[GitHub] spark pull request: [SPARK-11419][STREAMING] Parallel recovery for...

2015-11-01 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9373#issuecomment-152893856 Did you try HDFS? I am assuming we'd get similar speed ups there too but in that case there are far fewer files in which case the cost to setu

[GitHub] spark pull request: [SPARK-11419][STREAMING] Parallel recovery for...

2015-10-31 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9373#discussion_r43578480 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/FileBasedWriteAheadLog.scala --- @@ -126,11 +127,11 @@ private[streaming] class

[GitHub] spark pull request: [SPARK-11424] Guard against double-close() of ...

2015-10-30 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9382#issuecomment-152637344 +1.

[GitHub] spark pull request: [SPARK-11324][STREAMING] Flag for closing Writ...

2015-10-26 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9285#discussion_r43071619 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/WriteAheadLogUtils.scala --- @@ -39,6 +39,7 @@ private[streaming] object

[GitHub] spark pull request: [SPARK-11324][STREAMING] Flag for closing Writ...

2015-10-26 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9285#issuecomment-151282234 If there is no other option, this LGTM

[GitHub] spark pull request: [SPARK-11324] Flag for closing Write Ahead Log...

2015-10-26 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9285#issuecomment-151281257 Wouldn't this be really expensive?

[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...

2015-10-20 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8942#issuecomment-149787894 If the current user's ugi is what is used by the FileSystem cache, this should not really be an issue, no? Because we actually do update the current u

[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...

2015-10-20 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8942#issuecomment-149666851 So this is my theory (I don't have anything to back this up really). My assumption is based on the fact that if we don't set `hadoop.fs.hdfs.impl.dis

[GitHub] spark pull request: [SPARK-11201] [YARN] Make sure SPARK_YARN_MODE...

2015-10-20 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9174#issuecomment-149651474 Looks like SPARK-10812 would actually take care of this, since the check happens at runtime. +1 on backporting that to the 1.5 branch. I will close this

[GitHub] spark pull request: [SPARK-11201] [YARN] Make sure SPARK_YARN_MODE...

2015-10-20 Thread harishreedharan
Github user harishreedharan closed the pull request at: https://github.com/apache/spark/pull/9174

[GitHub] spark pull request: [SPARK-11201] [YARN] Make sure SPARK_YARN_MODE...

2015-10-20 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9174#discussion_r42516999 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -1025,9 +1025,6 @@ object Client extends Logging { "f

[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...

2015-10-19 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8942#issuecomment-149442481 Agreed - synchronization is painful and we could end up missing events.

[GitHub] spark pull request: [SPARK-11201] [YARN] Make sure SPARK_YARN_MODE...

2015-10-19 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9174#issuecomment-149442366 /cc @tdas @tgravescs @vanzin

[GitHub] spark pull request: [SPARK-11201] [YARN] Make sure SPARK_YARN_MODE...

2015-10-19 Thread harishreedharan
GitHub user harishreedharan opened a pull request: https://github.com/apache/spark/pull/9174 [SPARK-11201] [YARN] Make sure SPARK_YARN_MODE is set before SparkHad… …oopUtil.get is called. If `StreamingContext.getOrCreate` is used in `yarn-client` mode, a

[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...

2015-10-19 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8942#issuecomment-149440116 Here it is: OK, I think I know the issue - the reason is probably that the credentials are cached in the FileSystem instance using which the write

[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...

2015-10-19 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8942#issuecomment-149439989 Actually, that is not right. I posted an explanation on your other PR.

[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...

2015-10-19 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8867#issuecomment-149372740 OK, I think I know the issue - the reason is probably that the credentials are cached in the `FileSystem` instance using which the write happens. Since we are

[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...

2015-10-19 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8942#issuecomment-149344858 Hmm, I think the real issue is that the event logging does not doAs. I think in `yarn-cluster`, since the SparkContext is created in the AM, the updated

[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...

2015-10-19 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9168#discussion_r42421617 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/AMDelegationTokenRenewer.scala --- @@ -177,6 +177,7 @@ private[yarn] class

[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...

2015-10-19 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9168#issuecomment-149338159 /cc @tgravescs

[GitHub] spark pull request: [SPARK-11182] HDFS Delegation Token will be ex...

2015-10-19 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9168#discussion_r42420896 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -130,6 +132,20 @@ class SparkHadoopUtil extends Logging

[GitHub] spark pull request: [SPARK-11109] [CORE] Move FsHistoryProvider of...

2015-10-16 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9144#issuecomment-148717590 LGTM

[GitHub] spark pull request: [SPARK-11109] [CORE] Move FsHistoryProvider of...

2015-10-15 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9144#issuecomment-148563344 Does this class exist in older versions of Hadoop, like 1.1 etc?

[GitHub] spark pull request: [SPARK-11020] [core] Wait for HDFS to leave sa...

2015-10-12 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9043#issuecomment-147552208 +1

[GitHub] spark pull request: [SPARK-11020] [core] Wait for HDFS to leave sa...

2015-10-12 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9043#discussion_r41785632 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -170,7 +223,21 @@ private[history] class

[GitHub] spark pull request: [SPARK-11020] [core] Wait for HDFS to leave sa...

2015-10-09 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9043#issuecomment-146970575 LGTM

[GitHub] spark pull request: [SPARK-11020] [core] Wait for HDFS to leave sa...

2015-10-09 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9043#discussion_r41670523 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -170,7 +223,21 @@ private[history] class

[GitHub] spark pull request: [SPARK-11020] [core] Wait for HDFS to leave sa...

2015-10-09 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9043#discussion_r41670365 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -585,6 +652,37 @@ private[history] class

[GitHub] spark pull request: [SPARK-11020] [core] Wait for HDFS to leave sa...

2015-10-09 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/9043#discussion_r41669645 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -52,6 +53,10 @@ private[history] class FsHistoryProvider

[GitHub] spark pull request: [SPARK-11019][streaming][flume] Gracefully shu...

2015-10-08 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9041#issuecomment-146730398 In most cases, less than 5-10 seconds - just for the last in-flight batch to be done. This is not going to slow our unit test runs down. I mostly hit it in our

[GitHub] spark pull request: [SPARK-11019][streaming][flume] Gracefully shu...

2015-10-08 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9041#issuecomment-146730211 Jenkins, test this please

[GitHub] spark pull request: [SPARK-11019][streaming][flume] Gracefully shu...

2015-10-08 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9041#issuecomment-146730016 This basically makes testing a bit more predictable. Sometimes, we end up hitting a situation where the last transaction is still not completed and the

[GitHub] spark pull request: [SPARK-11019][streaming][flume] Gracefully shu...

2015-10-08 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9041#issuecomment-146724990 /cc @tdas

[GitHub] spark pull request: [SPARK-11019][streaming][flume] Gracefully shu...

2015-10-08 Thread harishreedharan
GitHub user harishreedharan opened a pull request: https://github.com/apache/spark/pull/9041 [SPARK-11019][streaming][flume] Gracefully shutdown Flume receiver th… …reads. Wait for a minute for the receiver threads to shutdown before interrupting them. You can merge

[GitHub] spark pull request: [SPARK-10987] [yarn] Workaround for missing ne...

2015-10-07 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/9021#issuecomment-146420692 LGTM. This does worry me, though: there may be other similar bugs lurking in the background. I would expect the new RPC to behave exactly like the old one

[GitHub] spark pull request: [SPARK-10955][streaming] Disable dynamic alloc...

2015-10-06 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8998#issuecomment-146033142 @tdas What do you think about the above? If you still think we should just make it a warn, I will make the change.

[GitHub] spark pull request: [streaming] SPARK-10955. Disable dynamic alloc...

2015-10-06 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8998#issuecomment-146017855 As I said, I don't mind either. But in this case, we need to be conservative. We can add additional checks to see if WAL is enabled, but it is not possib

[GitHub] spark pull request: [streaming] SPARK-10955. Disable dynamic alloc...

2015-10-06 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8998#issuecomment-146012980 I don't have a strong preference for the config name. @vanzin, @andrewor14 - like the current name or the one which @markgrover suggested? Vote p

[GitHub] spark pull request: [streaming] SPARK-10955. Disable dynamic alloc...

2015-10-06 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8998#issuecomment-145996934 Added config parameter to enable it if the user really wants to enable dynamic allocation

[GitHub] spark pull request: [streaming] SPARK-10955. Disable dynamic alloc...

2015-10-06 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8998#issuecomment-145968060 Actually looking at this again, I think our only option is to log a message. It is possible that the `SparkContext` was already created and passed to us, in

[GitHub] spark pull request: [streaming] SPARK-10955. Disable dynamic alloc...

2015-10-06 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8998#issuecomment-145966395 /cc @vanzin @tdas

[GitHub] spark pull request: [streaming] SPARK-10955. Disable dynamic alloc...

2015-10-06 Thread harishreedharan
GitHub user harishreedharan opened a pull request: https://github.com/apache/spark/pull/8998 [streaming] SPARK-10955. Disable dynamic allocation for Streaming app… …lications. Dynamic allocation can be painful for streaming apps and can lose data. The one drawback

[GitHub] spark pull request: [SPARK-10812] [yarn] Fix shutdown of token ren...

2015-10-06 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8996#issuecomment-145951831 LGTM

[GitHub] spark pull request: [SPARK-10916] Set perm gen size when launching...

2015-10-05 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8970#issuecomment-145689866 LGTM too.

[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...

2015-09-29 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8942#issuecomment-144273051 Hmm, this might be due to the cached token being missed? So it looks like the token got replaced alright, but it seems like the file could not be written with

[GitHub] spark pull request: [SPARK-10473][YARN]Login again in the driver t...

2015-09-29 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8942#issuecomment-144268884 Why would #8867 not be sufficient? It looks like that should be enough.

[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...

2015-09-29 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/8867#discussion_r40708505 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -544,6 +545,7 @@ private[spark] class Client( logInfo(s

[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...

2015-09-29 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/8867#discussion_r40705866 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorDelegationTokenUpdater.scala --- @@ -76,7 +76,10 @@ private[spark] class

[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...

2015-09-28 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/2994#issuecomment-143944627 I will get back to this one in a week or so

[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...

2015-09-25 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8867#issuecomment-143322902 @SaintBacchus Right. That is what I was talking about above. I think your fix takes care of this issue.

[GitHub] spark pull request: [SPARK-10755][YARN]Set driver also update the ...

2015-09-24 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8867#issuecomment-143012421 @tgravescs So seems like this would fix that issue.

[GitHub] spark pull request: [Spark-10692][Streaming] Expose failureReasons...

2015-09-23 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8892#issuecomment-142781672 Is this for a different branch? Why a new PR for the same thing?

[GitHub] spark pull request: [SPARK-10772][Streaming][Scala]: NullPointerEx...

2015-09-23 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/8881#issuecomment-142656310 Looking at it again, @srowen is right. Returning `None` makes `getOrCompute` think that no RDDs have been generated for a given time (artifact of the fact that

[GitHub] spark pull request: [SPARK-10772][Streaming][Scala]: NullPointerEx...

2015-09-23 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/8881#discussion_r40177949 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala --- @@ -210,6 +210,18 @@ class BasicOperationsSuite extends
