[GitHub] spark issue #15579: Added support for extra command in front of spark.

2016-10-24 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15579 @srowen, another use case would be tracing tools like `strace`, which traces the system calls of a process. One way of using `strace` is to add `strace` before executing command
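
To illustrate the idea of putting an extra command such as `strace` in front of a launch command, here is a minimal Python sketch. It is not the change proposed in the pull request; the helper name `with_prefix`, the example launch command, and the trace file path are all hypothetical.

```python
import shlex


def with_prefix(prefix: str, command: list[str]) -> list[str]:
    """Return `command` with the tokens of `prefix` placed in front of it.

    Hypothetical helper for illustration only; not part of Spark.
    """
    # shlex.split keeps quoted arguments together, e.g. -o "my trace.txt".
    return shlex.split(prefix) + list(command)


# Hypothetical launch command; a real launcher would build this itself.
launch = ["java", "-cp", "app.jar", "com.example.Main"]

# `strace -f` follows child processes, `-o` writes the trace to a file.
traced = with_prefix("strace -f -o /tmp/trace.txt", launch)
print(shlex.join(traced))
```

An empty prefix leaves the command unchanged, so the same code path can serve both the traced and the untraced launch.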

[GitHub] spark pull request #15210: [SPARK-17604][SQL][Streaming] Supprt purging aged...

2016-10-24 Thread jerryshao
Github user jerryshao closed the pull request at: https://github.com/apache/spark/pull/15210 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark issue #15210: [SPARK-17604][SQL][Streaming] Supprt purging aged file e...

2016-10-24 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15210 Sure, thanks.

[GitHub] spark pull request #15598: [SPARK-18027][YARN] .sparkStaging not clean on RM...

2016-10-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15598#discussion_r84643826 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -1059,9 +1059,11 @@ private[spark] class Client( } catch

[GitHub] spark issue #15588: [SPARK-18039][Scheduler] fix bug maxRegisteredWaitingTim...

2016-10-24 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15588 I think this fix cannot really handle the imbalanced receiver allocation problem, and it also blindly wastes CPU time. What @lw-lin mentioned is a feasible solution to wait for executors

[GitHub] spark pull request #15563: [SPARK-16759][CORE] Add a configuration property ...

2016-10-20 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15563#discussion_r84229497 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala --- @@ -92,8 +92,16 @@ private[spark] abstract class Task[T]( kill

[GitHub] spark pull request #15563: [SPARK-16759][CORE] Add a configuration property ...

2016-10-20 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15563#discussion_r84228545 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -465,6 +465,8 @@ object SparkSubmit { OptionAssigner

[GitHub] spark pull request #15377: [SPARK-17802] Improved caller context logging.

2016-10-19 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15377#discussion_r84212380 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2432,6 +2432,26 @@ private[spark] object Utils extends Logging

[GitHub] spark issue #15545: [SPARK-17999][Kafka][SQL] Add getPreferredLocations for ...

2016-10-19 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15545 @zsxwing , would you mind taking a look at this PR? Thanks a lot.

[GitHub] spark issue #15377: [SPARK-17802] Improved caller context logging.

2016-10-19 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15377 LGTM, just some minor things.

[GitHub] spark pull request #15545: [SPARK-17999][Kafka][SQL] Add getPreferredLocatio...

2016-10-18 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/15545 [SPARK-17999][Kafka][SQL] Add getPreferredLocations for KafkaSourceRDD ## What changes were proposed in this pull request? The newly implemented Structured Streaming `KafkaSource` did

[GitHub] spark pull request #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrai...

2016-10-18 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15481#discussion_r83991614 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -393,7 +393,7 @@ class

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-18 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15481 Seems it could be changed to `send` instead.

[GitHub] spark issue #15377: [SPARK-17802] Improved caller context logging.

2016-10-17 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15377 Yup, that's what I mean.

[GitHub] spark issue #15377: [SPARK-17802] Improved caller context logging.

2016-10-17 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15377 @srowen I'm not against this change, personally because the usage of the flag is weird to me and, frankly speaking, I haven't seen such a pattern in the Spark code. Since we want to avoid re

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-17 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15481 LGTM, sorry to bring in the deadlock issue.

[GitHub] spark pull request #15377: [SPARK-17802] Improved caller context logging.

2016-10-13 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15377#discussion_r83340292 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2474,25 +2478,42 @@ private[spark] class CallerContext( val context

[GitHub] spark pull request #15253: [SPARK-17678][REPL][Branch-1.6] Honor spark.replC...

2016-10-13 Thread jerryshao
Github user jerryshao closed the pull request at: https://github.com/apache/spark/pull/15253

[GitHub] spark issue #15253: [SPARK-17678][REPL][Branch-1.6] Honor spark.replClassSer...

2016-10-13 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15253 Sure, thanks.

[GitHub] spark pull request #15377: [SPARK-17802] Improved caller context logging.

2016-10-13 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15377#discussion_r83336101 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2474,25 +2478,42 @@ private[spark] class CallerContext( val context

[GitHub] spark issue #15377: [SPARK-17802] Improved caller context logging.

2016-10-13 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15377 Another thing: did you verify it locally, since there's no unit test to cover it?

[GitHub] spark pull request #15377: [SPARK-17802] Improved caller context logging.

2016-10-13 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15377#discussion_r83181601 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2479,20 +2483,35 @@ private[spark] class CallerContext

[GitHub] spark issue #15456: [SPARK-17686][Core] Support printing out scala and java ...

2016-10-13 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15456 Looks like an unrelated test failure; it passes in my local test.

[GitHub] spark issue #15456: [SPARK-17686][Core] Support printing out scala and java ...

2016-10-13 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15456 Thanks @rxin and @andrewor14 .

[GitHub] spark pull request #15456: [SPARK-17686][Core] Support printing out scala an...

2016-10-12 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15456#discussion_r83146972 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -104,6 +104,8 @@ object SparkSubmit

[GitHub] spark issue #15253: [SPARK-17678][REPL][Branch-1.6] Honor spark.replClassSer...

2016-10-12 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15253 @zsxwing , would you mind taking a look at this fix for 1.6 branch, thanks a lot.

[GitHub] spark pull request #15456: [SPARK-17686][Core] Support printing out scala an...

2016-10-12 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/15456 [SPARK-17686][Core] Support printing out scala and java version with spark-submit --version command ## What changes were proposed in this pull request? In our universal gateway service

[GitHub] spark pull request #15377: [SPARK-17802] Improved caller context logging.

2016-10-12 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15377#discussion_r82943152 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2474,25 +2478,42 @@ private[spark] class CallerContext( val context

[GitHub] spark pull request #15377: [SPARK-17802] Improved caller context logging.

2016-10-12 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15377#discussion_r82942829 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2474,25 +2478,42 @@ private[spark] class CallerContext( val context

[GitHub] spark pull request #15377: [SPARK-17802] Improved caller context logging.

2016-10-12 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15377#discussion_r82942438 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2432,6 +2432,10 @@ private[spark] object Utils extends Logging

[GitHub] spark pull request #15253: [SPARK-17678][REPL][Branch-1.6] Honor spark.replC...

2016-09-26 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/15253 [SPARK-17678][REPL][Branch-1.6] Honor spark.replClassServer.port in scala-2.11 repl ## What changes were proposed in this pull request? Spark 1.6 Scala-2.11 repl doesn't honor

[GitHub] spark pull request #15195: [SPARK-17632][SQL]make console sink and other sin...

2016-09-25 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15195#discussion_r80396970 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -290,8 +284,8 @@ final class DataStreamWriter[T] private

[GitHub] spark pull request #15195: [SPARK-17632][SQL]make console sink and other sin...

2016-09-22 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15195#discussion_r80178129 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -290,8 +284,8 @@ final class DataStreamWriter[T] private

[GitHub] spark pull request #15210: [SPARK-17604][SQL][Streaming] Supprt purging aged...

2016-09-22 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/15210 [SPARK-17604][SQL][Streaming] Supprt purging aged file entries in FileStreamSourceLog ## What changes were proposed in this pull request? Currently with [SPARK-15698](https

[GitHub] spark issue #15206: [SPARK-17640][SQL]Avoid using -1 as the default batchId ...

2016-09-22 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15206 LGTM, thanks for the fix.

[GitHub] spark issue #15137: [SPARK-17512][Core] Avoid formatting to python path for ...

2016-09-21 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15137 Thinking about it another way, from `PythonRunner`'s point of view, this comment looks correct. For yarn and mesos cluster mode, it is because we leverage the distributed cache or other mechanisms to download python files

[GitHub] spark pull request #15173: [SPARK-15698][SQL][Streaming][Follw-up]Fix FileSt...

2016-09-20 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/15173 [SPARK-15698][SQL][Streaming][Follw-up]Fix FileStream source and sink log get configuration issue ## What changes were proposed in this pull request? This issue was introduced

[GitHub] spark pull request #13513: [SPARK-15698][SQL][Streaming] Add the ability to ...

2016-09-20 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/13513#discussion_r79747197 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLog.scala --- @@ -79,213 +76,46 @@ object SinkFileStatus

[GitHub] spark pull request #13513: [SPARK-15698][SQL][Streaming] Add the ability to ...

2016-09-19 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/13513#discussion_r79530152 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSourceLog.scala --- @@ -0,0 +1,132 @@ +/* + * Licensed

[GitHub] spark issue #15134: [SPARK-17580][CORE]Add random UUID as app name while app...

2016-09-19 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15134 My concern is that previously Spark would throw an exception if the app name was not set, while in 2.0 we brought in SparkSession, which breaks the convention, so do we need to let SparkSession

[GitHub] spark issue #15134: [SPARK-17580][CORE]Add random UUID as app name while app...

2016-09-19 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15134 @phalodi we don't restrict user to have to set an app name either for SparkContext or SparkSession. You could refer to this code in SparkSubmit: ``` // Set name from main class

[GitHub] spark issue #15134: [SPARK-17580][CORE]Add random UUID as app name while app...

2016-09-18 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15134 Well, I understand your point. I'm guessing most users are using SparkSubmit or SparkLaunch to start applications, and in that case the app name should be well figured out even if not set

[GitHub] spark issue #15134: [SPARK-17580][CORE]Add random UUID as app name while app...

2016-09-18 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15134 From my understanding of the current code, it looks like there's no chance the app name will be null if we're using spark-submit to submit applications.

[GitHub] spark pull request #15137: [SPARK-17512][Core] Avoid formatting to python pa...

2016-09-18 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/15137 [SPARK-17512][Core] Avoid formatting to python path for yarn and mesos cluster mode ## What changes were proposed in this pull request? Yarn and mesos cluster mode support remote python

[GitHub] spark pull request #13513: [SPARK-15698][SQL][Streaming] Add the ability to ...

2016-09-15 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/13513#discussion_r79099764 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSourceLog.scala --- @@ -0,0 +1,133 @@ +/* + * Licensed

[GitHub] spark pull request #13513: [SPARK-15698][SQL][Streaming] Add the ability to ...

2016-09-15 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/13513#discussion_r79093102 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSourceLog.scala --- @@ -0,0 +1,133 @@ +/* + * Licensed

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-14 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/13513 Thanks a lot @zsxwing and @frreiss for your comments. Regarding the slow scan problem of the compact batch: originally I planned to not merge the latest batch, as I did before, also suggested

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-12 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/13513 @zsxwing @frreiss thanks a lot for your comments. I think the semantics of `FileStreamSource.getBatch(start: Option[Offset], end: Offset)` stay the same, since I overrode

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-12 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/13513 @zsxwing , thanks a lot for your comments, I did several refactorings: 1. Abstract and consolidate `FileStreamSinkLog` and `FileStreamSourceLog`; they now share the same code path to do

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-07 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/13513 Sure, I will change the code.

[GitHub] spark issue #14887: [SPARK-17321][YARN] YARN shuffle service should use good...

2016-09-06 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14887 @zhaoyunjiong , the fix you made may introduce a situation where recovery data exists in multiple directories. I'm not sure if this will introduce recovery issues or other problems, since now

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-05 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14961 Also, many other downstream and upstream applications may use different versions of the Netty jar; it would be better to keep these fundamental dependencies stable.

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-05 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14961 Upgrading the Netty version in branch 1.6 may cause API version incompatibility issues for the yarn shuffle service, please see [SPARK-16018](https://issues.apache.org/jira/browse/SPARK-16018) and [SPARK

[GitHub] spark pull request #14887: [SPARK-17321][YARN] YARN shuffle service should u...

2016-09-01 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14887#discussion_r77282332 --- Diff: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java --- @@ -25,6 +25,8 @@ import

[GitHub] spark issue #14916: [SPARK-17340][YARN] cleanup .sparkStaging when app is ki...

2016-09-01 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14916 Agree with @tgravescs . Actually this issue only exists when the local `yarn#client` process is gone and the application is killed by a yarn command. In this case the staging dir

[GitHub] spark issue #14873: [SPARK-17308]Improved the spark core code by replacing a...

2016-09-01 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14873 From my understanding it is more of a personal preference than a code style issue. We may change the code for now, but how can we guarantee that other people will not use pattern matching in future

[GitHub] spark pull request #14887: [SPARK-17321][YARN] YARN shuffle service should u...

2016-08-30 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14887#discussion_r76911079 --- Diff: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java --- @@ -270,9 +272,17 @@ protected Path getRecoveryPath

[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...

2016-08-25 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14617 Jenkins, retest this please.

[GitHub] spark issue #14804: [MINOR][Web UI] Correctly convert bytes in web UI

2016-08-25 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14804 I think there is a precise definition [here](http://ux.stackexchange.com/questions/13815/files-size-units-kib-vs-kb-vs-kb). AFAIK in Spark the conversion is 1024-based whether written as KB, K, or kb, KiB

[GitHub] spark issue #14804: [MINOR][Web UI] Correctly convert bytes in web UI

2016-08-25 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14804 Because the log shows memory in 1024-based MB, while the web UI is 1000-based, so the two are slightly different. You could check `Utils#bytesToString`. I think we unify
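
The difference between a 1024-based and a 1000-based conversion can be seen with a small Python sketch. This is not Spark's `Utils#bytesToString`; the function name `format_bytes`, the unit labels, and the one-decimal formatting are assumptions made for illustration.

```python
def format_bytes(size: int, base: int = 1024) -> str:
    """Format a byte count with one decimal, dividing by `base` per unit step.

    Hypothetical formatter for illustration; not Spark's implementation.
    """
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(size)
    for unit in units[:-1]:
        if value < base:
            return f"{value:.1f} {unit}"
        value /= base
    # Anything that is still large after the last division is reported in TB.
    return f"{value:.1f} {units[-1]}"


one_gib = 1024 ** 3
print(format_bytes(one_gib, base=1024))  # 1024-based, as in the log
print(format_bytes(one_gib, base=1000))  # 1000-based, as in the web UI
```

For the same byte count the two bases give 1.0 GB and 1.1 GB respectively, which is the kind of mismatch between log and web UI that the comment describes.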

[GitHub] spark pull request #14804: [MINOR][Web UI] Correctly convert bytes in web UI

2016-08-25 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/14804 [MINOR][Web UI] Correctly convert bytes in web UI ## What changes were proposed in this pull request? should be 1024 based, not 1000. ## How was this patch tested

[GitHub] spark issue #14802: [SPARK-17235][SQL] Support purging of old logs in Metada...

2016-08-25 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14802 Looks like this is a little similar to #13513.

[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...

2016-08-25 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14617 @mallman I changed the UI based on your comment, here is the new one (separate the on heap and off heap memory usage in two columns): ![screen shot 2016-08-25 at 3 28 31 pm](https

[GitHub] spark issue #14789: [SPARK-17209][YARN] Add the ability to manually update c...

2016-08-24 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14789 @tgravescs , with [SPARK-14743](https://issues.apache.org/jira/browse/SPARK-14743), credentials/tokens can be managed out of Spark with their own credential provider. In that case user could

[GitHub] spark issue #14789: [SPARK-17209][YARN] Add the ability to manually update c...

2016-08-24 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14789 The user themselves will call this API. Another option in [SPARK-14743](https://issues.apache.org/jira/browse/SPARK-14743) is to add this API in `SparkContext`. But from my understanding

[GitHub] spark pull request #14789: [SPARK-17209][YARN] Add the ability to manually u...

2016-08-24 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/14789 [SPARK-17209][YARN] Add the ability to manually update credentials for Spark running on YARN ## What changes were proposed in this pull request? This PR proposes to add a new API

[GitHub] spark issue #14728: [SPARK-17165][SQL] FileStreamSource should not track the...

2016-08-24 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14728 Regarding the definition of `maxAge`: in the current code it is the max age relative to the latest file, but people may misread it as the max age relative to the current time, so it would be better to document the meaning
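
The two readings of `maxAge` can give different results, as the following Python sketch shows. It is not the `FileStreamSource` code; the function `retained`, the file names, the millisecond timestamps, and the inclusive boundary are all assumptions chosen for illustration.

```python
def retained(files: dict[str, int], max_age_ms: int, reference_ms: int) -> list[str]:
    """Return the names of files no older than max_age_ms relative to reference_ms.

    Hypothetical helper; the inclusive comparison is an assumption.
    """
    return sorted(
        name for name, ts in files.items() if reference_ms - ts <= max_age_ms
    )


# Hypothetical file modification times in milliseconds.
files = {"a.json": 1_000, "b.json": 5_000, "c.json": 9_000}

# Reading 1: age measured against the newest file seen so far.
latest = max(files.values())
print(retained(files, 5_000, latest))

# Reading 2: age measured against a (hypothetical) current time.
print(retained(files, 5_000, 20_000))
```

With the newest file as the reference, `b.json` and `c.json` are kept; with the later wall-clock reference, nothing is kept. That gap is why documenting the intended meaning matters.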

[GitHub] spark pull request #14728: [SPARK-17165][SQL] FileStreamSource should not tr...

2016-08-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14728#discussion_r76015528 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -41,36 +40,59 @@ class FileStreamSource

[GitHub] spark pull request #14728: [SPARK-17165][SQL] FileStreamSource should not tr...

2016-08-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14728#discussion_r76011569 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -41,36 +40,59 @@ class FileStreamSource

[GitHub] spark pull request #14728: [SPARK-17165][SQL] FileStreamSource should not tr...

2016-08-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14728#discussion_r76011328 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -41,36 +40,59 @@ class FileStreamSource

[GitHub] spark pull request #14728: [SPARK-17165][SQL] FileStreamSource should not tr...

2016-08-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14728#discussion_r76011102 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamOptions.scala --- @@ -0,0 +1,59 @@ +/* + * Licensed

[GitHub] spark issue #14728: [SPARK-17165][SQL] FileStreamSource should not track the...

2016-08-24 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14728 Sure, let me take a look at this.

[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...

2016-08-23 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14617 @mallman thanks a lot for your comments, I will change the UI to split into separate columns. Yes, as you mentioned current executor memory usage tracked in Standalone Master only shows

[GitHub] spark pull request #14768: [MINOR][BUILD] Fix Java CheckStyle Error

2016-08-22 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14768#discussion_r75805277 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java --- @@ -522,7 +522,8 @@ public long spill() throws

[GitHub] spark pull request #14744: [SPARKR][SPARKSUBMIT] Allow to set sparkr shell c...

2016-08-22 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14744#discussion_r75633858 --- Diff: docs/configuration.md --- @@ -1752,6 +1752,15 @@ showDF(properties, numRows = 200, truncate = FALSE) Executable for executing R scripts

[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2016-08-12 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/14617 [SPARK-17019][Core] Expose on-heap and off-heap memory usage various places ## What changes were proposed in this pull request? With [SPARK-13992](https://issues.apache.org/jira/browse

[GitHub] spark issue #14581: Correct example value for spark.ssl.YYY.XXX settings

2016-08-11 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14581 LGTM.

[GitHub] spark issue #14561: [SPARK-16972][CORE] Move DriverEndpoint out of CoarseGra...

2016-08-09 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14561 Do you have any specific reason or use case that requires refactoring this part? IMHO, unless we have a concrete reason to change it, it is better not to do refactoring

[GitHub] spark pull request #14557: [SPARK-16709][CORE] Kill the running task if stag...

2016-08-09 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14557#discussion_r74017373 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1564,6 +1564,14 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark issue #14556: [SPARK-16966][Core] Make App Name to the valid name inst...

2016-08-09 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14556 Would you please add a unit test to verify the changes?

[GitHub] spark pull request #14542: [SPARK-16930][yarn] Fix a couple of races in clus...

2016-08-08 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14542#discussion_r73987834 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -404,7 +410,8 @@ private[spark] class ApplicationMaster

[GitHub] spark issue #14540: [SPARK-16950] [PySpark] fromOffsets parameter support in...

2016-08-08 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14540 Looks good to me, we should also change the unit tests accordingly. Currently several related tests are excluded from python3 test.

[GitHub] spark issue #14065: [SPARK-14743][YARN] Add a configurable credential manage...

2016-08-07 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14065 @vanzin, looks like I missed that comment. I will address it today.

[GitHub] spark issue #14065: [SPARK-14743][YARN] Add a configurable credential manage...

2016-08-02 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14065 @vanzin, I refactored some interfaces, especially the `obtainCredentials` method and the implementations of `HDFSCredentialProvider` and `HiveCredential`; would you please help

[GitHub] spark pull request #14065: [SPARK-14743][YARN] Add a configurable credential...

2016-07-29 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14065#discussion_r72752972 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/security/HDFSCredentialProvider.scala --- @@ -0,0 +1,118 @@ +/* + * Licensed

[GitHub] spark pull request #14065: [SPARK-14743][YARN] Add a configurable credential...

2016-07-29 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14065#discussion_r72750534 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/security/HDFSCredentialProviderSuite.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed

[GitHub] spark pull request #14065: [SPARK-14743][YARN] Add a configurable credential...

2016-07-29 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14065#discussion_r72750018 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/security/CredentialUpdater.scala --- @@ -107,8 +110,16 @@ private[spark] class

[GitHub] spark pull request #14065: [SPARK-14743][YARN] Add a configurable credential...

2016-07-29 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14065#discussion_r72749483 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/security/CredentialUpdater.scala --- @@ -41,16 +43,18 @@ private[spark] class

[GitHub] spark pull request #14065: [SPARK-14743][YARN] Add a configurable credential...

2016-07-29 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14065#discussion_r72747890 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/security/ConfigurableCredentialManager.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed

[GitHub] spark issue #14340: [SPARK-16534][Streaming][Kafka] Add Python API support f...

2016-07-29 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14340 Thanks a lot @rxin for your comments, let me close it.

[GitHub] spark pull request #14340: [SPARK-16534][Streaming][Kafka] Add Python API su...

2016-07-29 Thread jerryshao
Github user jerryshao closed the pull request at: https://github.com/apache/spark/pull/14340

[GitHub] spark issue #14312: [SPARK-15857]Add caller context in Spark: invoke YARN/HD...

2016-07-26 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14312 > Spark caller context written into Yarn log will be "{spark.app.name} running on Spark". This may not be so useful; I think we could get the app name from YARN through many d

[GitHub] spark pull request #14312: [SPARK-15857]Add caller context in Spark: invoke ...

2016-07-26 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14312#discussion_r72198340 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala --- @@ -78,6 +79,12 @@ private[spark] abstract class Task[T]( metrics

[GitHub] spark pull request #14312: [SPARK-15857]Add caller context in Spark: invoke ...

2016-07-26 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14312#discussion_r72198063 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -66,6 +66,9 @@ private[spark] class Client( import Client

[GitHub] spark pull request #14312: [SPARK-15857]Add caller context in Spark: invoke ...

2016-07-26 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14312#discussion_r72197100 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -197,6 +197,9 @@ private[spark] class ApplicationMaster

[GitHub] spark pull request #14312: [SPARK-15857]Add caller context in Spark: invoke ...

2016-07-26 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14312#discussion_r72196752 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2418,6 +2419,25 @@ private[spark] object Utils extends Logging

[GitHub] spark pull request #14312: [SPARK-15857]Add caller context in Spark: invoke ...

2016-07-26 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14312#discussion_r72196631 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2418,6 +2419,25 @@ private[spark] object Utils extends Logging

[GitHub] spark issue #14340: [SPARK-16534][Streaming][Kafka] Add Python API support f...

2016-07-26 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14340 Thanks a lot @koeninger for your review. I think the Python API is not flexible enough to achieve the same functionality as the Java/Scala APIs, especially for things like extended class like

[GitHub] spark pull request #14340: [SPARK-16534][Streaming][Kafka] Add Python API su...

2016-07-26 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14340#discussion_r72192793 --- Diff: python/pyspark/streaming/kafka010.py --- @@ -0,0 +1,370 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #14340: [SPARK-16534][Streaming][Kafka] Add Python API su...

2016-07-25 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14340#discussion_r72177558 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaUtils.scala --- @@ -177,3 +182,172 @@ object KafkaUtils extends
