[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2015-01-07 Thread XuTingjun
Github user XuTingjun closed the pull request at: https://github.com/apache/spark/pull/3806

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2015-01-07 Thread XuTingjun
Github user XuTingjun closed the pull request at: https://github.com/apache/spark/pull/3686

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2015-01-07 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-69111548 Hi @XuTingjun, mind closing this issue then? It's confusing to have both open.

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2015-01-07 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3806#issuecomment-69095103 Hi @XuTingjun, there seems to be a certain degree of overlap with the work in #3607. Also, my concern with this PR is that it conflates "driver" and "AM" in a few places. …

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2015-01-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3806#discussion_r22616963 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala --- @@ -43,8 +44,13 @@ private[spark] class ClientArguments(args: Array[String] …

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2015-01-05 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3806#issuecomment-68822983 Sorry, but I already filed a PR that splits spark.driver.memory in https://github.com/apache/spark/pull/3607. Could you please check if anyone already did …

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2015-01-04 Thread XuTingjun
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/3806#issuecomment-68676616 @sryza, I have split spark.driver.memory into spark.driver.memory and spark.yarn.am.memory. Please have a look.
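
A minimal sketch of the split being described, assuming the property names discussed in this thread (spark.driver.memory for the driver JVM, spark.yarn.am.memory for the separate yarn-client AM); the object name and values are illustrative only, not code from the PR:

    import org.apache.spark.SparkConf

    object AmMemorySketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          // Sizes the driver JVM; in yarn-cluster mode the driver runs inside the AM,
          // so this is also what the AM container request would be based on.
          .set("spark.driver.memory", "4g")
          // In yarn-client mode the AM is a separate, lightweight process,
          // sized by this property instead.
          .set("spark.yarn.am.memory", "1g")
        println(conf.toDebugString)
      }
    }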

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2015-01-04 Thread XuTingjun
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/3806#issuecomment-68673436 Yeah, I agree with you. I will fix this later. Thanks @sryza

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2015-01-04 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3806#issuecomment-68673313 I think the best thing would be to split spark.driver.memory into spark.driver.memory and spark.yarn.am.memory, and to have the latter only work for the yarn-client AM.

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2015-01-04 Thread XuTingjun
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/3806#issuecomment-68672836 @sryza, do you mean "spark.driver.memory" works in yarn-client and yarn-cluster mode, so we should use one configuration, maybe named "spark.driver.cores", to set the AM core…

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2015-01-04 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3806#issuecomment-68669501 @XuTingjun ah, that's correct. Looking more closely, my confusion was stemming from some existing weirdness, which is that setting the "spark.driver.memory" property will …

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2015-01-04 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3806#discussion_r22447830 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala --- @@ -70,6 +70,8 @@ private[spark] class YarnClientSchedulerBackend …

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2015-01-04 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3806#discussion_r22447819 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala --- @@ -120,6 +121,13 @@ private[spark] class ClientArguments(args: Array[String] …

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2015-01-04 Thread XuTingjun
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/3806#issuecomment-6857 @andrewor14

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-25 Thread XuTingjun
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/3806#issuecomment-68125091 @sryza, I don't agree with you. I only added the code below for cluster mode, so "--driver-cores" will not work in client mode. OptionAssigner(args.driverCores…
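
Since the OptionAssigner line above is cut off, here is a self-contained, simplified model of the routing being described — not Spark's actual SparkSubmit code; the case class, field names, and string constants are illustrative assumptions. The point is that an entry scoped to YARN plus cluster mode only forwards the value there, so client mode is unaffected:

    // Simplified stand-in for the option-routing table described above (hypothetical names).
    object DriverCoresRoutingSketch {
      final case class OptionAssigner(value: String, clusterManager: String, deployMode: String, clOption: String)

      def forwarded(table: Seq[OptionAssigner], manager: String, mode: String): Seq[OptionAssigner] =
        table.filter(a => a.value != null && a.clusterManager == manager && a.deployMode == mode)

      def main(args: Array[String]): Unit = {
        val driverCores = "2"
        // Entry scoped to yarn + cluster only, mirroring the intent in the comment above.
        val table = Seq(OptionAssigner(driverCores, "yarn", "cluster", "--driver-cores"))
        println(forwarded(table, "yarn", "cluster"))  // forwarded: the flag is passed through
        println(forwarded(table, "yarn", "client"))   // empty: client mode never sees --driver-cores
      }
    }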

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-25 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3806#issuecomment-68122279 SPARK_MASTER_CORES uses "master" incorrectly. The only reason we have SPARK_MASTER_MEMORY was to preserve backwards compatibility. This patch also still appears to …

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3806#issuecomment-68119304 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-25 Thread XuTingjun
Github user XuTingjun closed the pull request at: https://github.com/apache/spark/pull/3799

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3799#issuecomment-68093631 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-25 Thread XuTingjun
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-68093593 Hi all, I accidentally deleted my repository, so I created a new patch, #3799, for it.

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-25 Thread XuTingjun
GitHub user XuTingjun opened a pull request: https://github.com/apache/spark/pull/3799 [SPARK-1507][YARN]specify num of cores for AM I added the configurations below: spark.yarn.am.cores/SPARK_MASTER_CORES/SPARK_DRIVER_CORES for yarn-client mode; spark.driver.cores for yarn-cluster mode. …
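
A minimal sketch of how the proposed settings would be used, assuming the property names listed in the PR description (spark.yarn.am.cores for the yarn-client AM, spark.driver.cores for the yarn-cluster driver/AM container); the SPARK_MASTER_CORES/SPARK_DRIVER_CORES environment variables mentioned above are omitted, and the object name and values are illustrative:

    import org.apache.spark.SparkConf

    object AmCoresSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          // yarn-client mode: cores for the standalone AM process.
          .set("spark.yarn.am.cores", "2")
          // yarn-cluster mode: cores for the container running the driver inside the AM.
          .set("spark.driver.cores", "2")
        println(conf.toDebugString)
      }
    }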

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-22 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-67881296 Yes, I agree. Because the driver runs in the same JVM as the AM in cluster mode, we don't want to overload `spark.yarn.am.*` and the corresponding `spark.driver.*`. …

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-22 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-67876245 So, after actually reading the code :-), the current implementation uses `spark.yarn.am.cores` for both client and cluster mode. I think that's bad, because if som…

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-22 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-67869210 It says it's standalone mode only because it's never been implemented anywhere else. You're now implementing it for Yarn; I don't see a reason why you wouldn't just reuse …

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-21 Thread XuTingjun
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-67810373 Sorry, I think this patch works in yarn-client and yarn-cluster mode. The param "--driver-cores" is standalone cluster only. Am I missing something?

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-19 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-67722108 So this is only for client mode, right? Can you document it? Also, what happens if people set this in cluster mode? Is there not an existing mechanism to set the driver…

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-19 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-67689276 Adding a parameter for the yarn-client AM cores sounds reasonable to me. As I think I've voiced on other JIRAs, using the `spark.yarn.am.*` namespace to refer to the clien…

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-19 Thread XuTingjun
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-67616013 @sryza, thanks for your comments. Are there still any thoughts or objections to this?

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-15 Thread tgravescs
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-67003632 @sryza I understand what you are saying, but I don't see anywhere in this pull request that the yarn-client AM is referred to as the driver; the conf in the current code is spar…

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-14 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-66937888 @tgravescs In all other places we've managed to avoid referring to the yarn-client AM as the "driver" and I think blurring this distinction would be pretty confusing.

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-12 Thread XuTingjun
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-66866834 I have tested it; it works in yarn-client and yarn-cluster mode.

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-12 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-66817042 spark-submit lists `--driver-cores` as an option for standalone cluster mode. I think this should expand the use of that option to apply to yarn-cluster mode too. Less cl…
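
To make the effect concrete, here is a hedged sketch (not the patch's actual code) of what "cores for the AM" ultimately means on the YARN side: the AM container request carries a vCore ask alongside memory, which is what a --driver-cores / spark.driver.cores setting in yarn-cluster mode would feed into. The object name and values are illustrative:

    import org.apache.hadoop.yarn.api.records.Resource

    object AmContainerRequestSketch {
      def main(args: Array[String]): Unit = {
        val amMemoryMb = 1024  // assumed AM memory for illustration
        val amCores = 2        // the quantity this thread is about making configurable
        // Build the resource capability a YARN AM container request would carry.
        val capability: Resource = Resource.newInstance(amMemoryMb, amCores)
        println(s"AM container ask: ${capability.getMemory} MB, ${capability.getVirtualCores} vCores")
      }
    }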

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-12 Thread tgravescs
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-66813046 I don't see any harm in allowing it to work in both modes; do you have a concern with client mode?

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-12 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-66809815 This should only apply for yarn-cluster mode, right?

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-12 Thread tgravescs
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-66776483 @scwf There are many different reasons one might want to specify the cores for the AM. It mostly applies when running in yarn-cluster mode and your driver is on the …

[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

2014-12-12 Thread XuTingjun
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-66774800 @tgravescs