[GitHub] spark pull request: [SPARK-3536][SQL] SELECT on empty parquet tabl...

2014-09-19 Thread ravipesala
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2456 [SPARK-3536][SQL] SELECT on empty parquet table throws exception Parquet returns null metadata when an empty Parquet file is queried while calculating splits. So added null check and returns

[GitHub] spark pull request: [SPARK-2062][GraphX] VertexRDD.apply does not ...

2014-09-19 Thread ankurdave
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/1903#issuecomment-56140430 Thanks! Merged into master and branch-1.1. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-2062][GraphX] VertexRDD.apply does not ...

2014-09-19 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1903

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-19 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-56142506 @sryza Thanks Sandy. Will do.

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-09-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2455#issuecomment-56144570 add to whitelist

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-09-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2455#issuecomment-56144582 this is ok to test

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2378#issuecomment-56147622 @davies Does `PickleSerializer` compress data? If not, maybe we should cache the deserialized RDD instead of the one from `_.reserialize`. They have the same storage. I

[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

2014-09-19 Thread ankurdave
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/2446#issuecomment-56151121 Jenkins, this is ok to test.

[GitHub] spark pull request: [SPARK-3268][SQL] DoubleType, FloatType and De...

2014-09-19 Thread gvramana
GitHub user gvramana opened a pull request: https://github.com/apache/spark/pull/2457 [SPARK-3268][SQL] DoubleType, FloatType and DecimalType modulus support Supported modulus operation using % operator on fractional datatypes FloatType, DoubleType and DecimalType Example:

Re: [GitHub] spark pull request: [SPARK-3529] [SQL] Delete the temp files after...

2014-09-19 Thread Sean Owen
Hm deleteOnExit should at least not hurt and I thought it will delete dirs if they are empty, which may be so if temp files inside never existed or were cleaned up themselves. But yeah always delete explicitly in the normal execution path even in the event of normal exceptions. On Sep 19, 2014
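The advice in this message (clean up explicitly on the normal execution path, including when ordinary exceptions occur, and treat an exit hook only as a fallback) can be sketched in Python. This is an illustration only, not the code from the pull request: `with_temp_dir` and `work` are hypothetical names, and `atexit` merely stands in for the role Java's `File.deleteOnExit` plays.

```python
import atexit
import os
import shutil
import tempfile


def with_temp_dir(work):
    """Run work(path) inside a fresh temp directory and always remove it.

    Hypothetical helper for illustration; not part of Spark.
    """
    path = tempfile.mkdtemp(prefix="sketch-")
    # Fallback only: runs at interpreter exit if the explicit cleanup was skipped.
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    try:
        return work(path)
    finally:
        # Explicit cleanup on the normal path, including when work() raises.
        shutil.rmtree(path, ignore_errors=True)
```

The `finally` block is what covers the exception case; the exit hook never has to be relied on unless the process dies before `finally` runs.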

[GitHub] spark pull request: [SPARK-3597][Mesos] Implement `killTask`.

2014-09-19 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2453#issuecomment-56152819 Sorry for asking - but have you tested this on a real cluster?

[GitHub] spark pull request: [SPARK-3597][Mesos] Implement `killTask`.

2014-09-19 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2453#issuecomment-56152843 Oh and thanks for doing this!

[GitHub] spark pull request: [SPARK-3578] Fix upper bound in GraphGenerator...

2014-09-19 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2439#issuecomment-56153581 @jegonzal you should take a look :)

[GitHub] spark pull request: [SPARK-3536][SQL] SELECT on empty parquet tabl...

2014-09-19 Thread ravipesala
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2456#issuecomment-56157072 Please review

[GitHub] spark pull request: [SPARK-3598][SQL]cast to timestamp should be t...

2014-09-19 Thread adrian-wang
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/2458 [SPARK-3598][SQL]cast to timestamp should be the same as hive this patch fixes timestamp smaller than 0 and cast int as timestamp select cast(1000 as timestamp) from src limit 1;

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-09-19 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-56164031 Hi @liyezhang556520, thanks for pointing this out! I have updated my PR, please review @andrewor14

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-09-19 Thread liyezhang556520
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/791#discussion_r17781184 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -239,18 +250,18 @@ private[spark] class MemoryStore(blockManager:

[GitHub] spark pull request: [SPARK-3584] sbin/slaves doesn't work when we ...

2014-09-19 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2444#issuecomment-56172067 +1 lgtm

[GitHub] spark pull request: [SPARK-3584] sbin/slaves doesn't work when we ...

2014-09-19 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2444#discussion_r17782052 --- Diff: sbin/slaves.sh --- @@ -67,20 +69,26 @@ fi if [ $HOSTLIST = ]; then if [ $SPARK_SLAVES = ]; then -export

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-19 Thread tgravescs
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-56174217 @mridulm any comments? I'm ok with it if it's a consistent problem for users. One thing we definitely need to do is document it and possibly look at including

[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/2350#discussion_r17785127 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -37,154 +36,106 @@ import

[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/2350#discussion_r17786319 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -415,41 +381,153 @@ trait ClientBase extends Logging {

[GitHub] spark pull request: [MLLib] Fix example code variable name misspel...

2014-09-19 Thread rnowling
GitHub user rnowling opened a pull request: https://github.com/apache/spark/pull/2459 [MLLib] Fix example code variable name misspelling in MLLib Feature Extraction guide You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/2350#discussion_r17786722 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientDistributedCacheManager.scala --- @@ -19,29 +19,24 @@ package

[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-19 Thread tgravescs
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2350#issuecomment-56183509 This mostly looks good. A couple minor comments is all. I do also still want to run through some tests on alpha.

[GitHub] spark pull request: [SPARK-3578] Fix upper bound in GraphGenerator...

2014-09-19 Thread rnowling
Github user rnowling commented on the pull request: https://github.com/apache/spark/pull/2439#issuecomment-56183914 @ankurdave I'd be a bit concerned about how that affects the correctness of the algorithm. Especially since this will round every value down when maybe you only want to

[GitHub] spark pull request: [SPARK-3597][Mesos] Implement `killTask`.

2014-09-19 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2453#issuecomment-56184849 I did indeed test it.

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-19 Thread gss2002
Github user gss2002 commented on the pull request: https://github.com/apache/spark/pull/2241#issuecomment-56185334 We have been using this fix for a few weeks now against Hive 13. The only outstanding issue I see and this could be something larger is the fact that Spark Thrift

[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/2350#discussion_r17788554 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -415,41 +381,153 @@ trait ClientBase extends Logging {

[GitHub] spark pull request: [SPARK-2365] Add IndexedRDD, an efficient upda...

2014-09-19 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/1297#discussion_r17790219 --- Diff: core/src/main/scala/org/apache/spark/rdd/IndexedRDDLike.scala --- @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-2365] Add IndexedRDD, an efficient upda...

2014-09-19 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/1297#discussion_r17791303 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ImmutableLongOpenHashSet.scala --- @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-19 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-56197446 for some additional input, @pwendell - do you think requiring numpy for core would be acceptable?

[GitHub] spark pull request: SPARK-3580: New public method for RDD's to hav...

2014-09-19 Thread patmcdonough
Github user patmcdonough commented on a diff in the pull request: https://github.com/apache/spark/pull/2447#discussion_r17794069 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -208,6 +208,23 @@ abstract class RDD[T: ClassTag]( } /** +

[GitHub] spark pull request: [SPARK-2365] Add IndexedRDD, an efficient upda...

2014-09-19 Thread squito
Github user squito commented on the pull request: https://github.com/apache/spark/pull/1297#issuecomment-56199798 This looks great! My comments are minor. I know it's early to be discussing example docs, but I just wanted to mention that I can see caching being an area of

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-19 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/2451#issuecomment-56202513 @anantasty: If you could look through the code and mark places where you're like "What the heck is going on here?", it would be easier for me to write up proper comments.

[GitHub] spark pull request: SPARK-3574. Shuffle finish time always reporte...

2014-09-19 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2440#issuecomment-56206573 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-3597][Mesos] Implement `killTask`.

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2453#issuecomment-56206744 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-3536][SQL] SELECT on empty parquet tabl...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2456#issuecomment-56206732 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-3268][SQL] DoubleType, FloatType and De...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2457#issuecomment-56206727 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-3446] Expose underlying job ids in Futu...

2014-09-19 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2337#discussion_r17796741 --- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala --- @@ -83,6 +83,15 @@ trait FutureAction[T] extends Future[T] { */

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2455#issuecomment-56206738 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-3446] Expose underlying job ids in Futu...

2014-09-19 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2337#discussion_r17796804 --- Diff: core/src/test/scala/org/apache/spark/FutureActionSuite.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-2098] All Spark processes should suppor...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2379#issuecomment-56207066 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20573/consoleFull) for PR 2379 at commit

[GitHub] spark pull request: [SPARK-3377] [Metrics] Metrics can be accident...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-56207059 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20575/consoleFull) for PR 2432 at commit

[GitHub] spark pull request: [SPARK-3599]Avoid loading properties file frequ...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2454#issuecomment-56207072 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20570/consoleFull) for PR 2454 at commit

[GitHub] spark pull request: SPARK-3574. Shuffle finish time always reporte...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2440#issuecomment-56207056 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20572/consoleFull) for PR 2440 at commit

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56207061 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20574/consoleFull) for PR 2401 at commit

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2378#issuecomment-56207099 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20576/consoleFull) for PR 2378 at commit

[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2446#issuecomment-56207128 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20571/consoleFull) for PR 2446 at commit

[GitHub] spark pull request: [SPARK-2098] All Spark processes should suppor...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2379#issuecomment-56207390 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20573/consoleFull) for PR 2379 at commit

[GitHub] spark pull request: [SPARK-3598][SQL]cast to timestamp should be t...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2458#issuecomment-56207035 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20569/consoleFull) for PR 2458 at commit

[GitHub] spark pull request: [SPARK-3446] Expose underlying job ids in Futu...

2014-09-19 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2337#issuecomment-56207872 It would be good to test the complex case with multiple job ids, but overall looks good. @rxin you added this interface - can you take a look (this is a very small

[GitHub] spark pull request: [SPARK-3446] Expose underlying job ids in Futu...

2014-09-19 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2337#discussion_r17797122 --- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala --- @@ -171,6 +179,8 @@ class ComplexFutureAction[T] extends FutureAction[T] { // is

[GitHub] spark pull request: SPARK-3580: New public method for RDD's to hav...

2014-09-19 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2447#issuecomment-56208207 Jenkins, retest this please.

[GitHub] spark pull request: SPARK-3580: New public method for RDD's to hav...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2447#issuecomment-56208954 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20577/consoleFull) for PR 2447 at commit

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-19 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2378#issuecomment-56210084 @mengxr PickleSerializer does not compress data; there is CompressSerializer, which can do it using gzip (level 1). Compression can help for small range of double or repeated
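The point about pickled data not being compressed by default can be checked with a small standalone Python sketch. It uses only the standard `pickle` and `gzip` modules rather than PySpark's serializer classes, and `pickled_sizes` is a hypothetical helper introduced here for illustration; it shows the byte counts before and after level-1 gzip compression for a list of repeated doubles.

```python
import gzip
import pickle


def pickled_sizes(values, level=1):
    """Return (raw, packed): the pickled bytes and their gzip-compressed form."""
    raw = pickle.dumps(values, protocol=2)
    packed = gzip.compress(raw, compresslevel=level)
    return raw, packed


# Repeated doubles, the case the comment says compression helps with.
repeated = [1.0] * 10000
raw, packed = pickled_sizes(repeated)
print(len(raw), len(packed))

# Compression is lossless: the original list comes back unchanged.
restored = pickle.loads(gzip.decompress(packed))
```

Whether the saving is worth the CPU cost depends on the data; values with little repetition will compress far less than this example.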

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-19 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2378#issuecomment-56211052 @mengxr In this PR, I just tried to avoid other changes except serialization; we could change the cache behavior or compression later. It will be good to have

[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-19 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2350#discussion_r17798663 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientDistributedCacheManager.scala --- @@ -19,29 +19,24 @@ package

[GitHub] spark pull request: SPARK-3574. Shuffle finish time always reporte...

2014-09-19 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2440#issuecomment-56211203 LGTM pending tests.

[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-19 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2350#discussion_r17798689 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -415,41 +381,153 @@ trait ClientBase extends Logging {

[GitHub] spark pull request: [SPARK-3584] sbin/slaves doesn't work when we ...

2014-09-19 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2444#discussion_r17798769 --- Diff: sbin/slaves.sh --- @@ -67,20 +69,26 @@ fi if [ $HOSTLIST = ]; then if [ $SPARK_SLAVES = ]; then -export

[GitHub] spark pull request: fix compile error for hadoop CDH 4.4+

2014-09-19 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/151

[GitHub] spark pull request: SPARK-1793 - Heavily duplicated test setup cod...

2014-09-19 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/726

[GitHub] spark pull request: [SPARK-3584] sbin/slaves doesn't work when we ...

2014-09-19 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2444#discussion_r17799143 --- Diff: sbin/slaves.sh --- @@ -67,20 +69,26 @@ fi if [ $HOSTLIST = ]; then if [ $SPARK_SLAVES = ]; then -export

[GitHub] spark pull request: [SPARK-3584] sbin/slaves doesn't work when we ...

2014-09-19 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2444#discussion_r17799162 --- Diff: .gitignore --- @@ -19,6 +19,7 @@ conf/*.sh conf/*.properties conf/*.conf conf/*.xml +conf/slaves --- End diff --

[GitHub] spark pull request: [SPARK-3584] sbin/slaves doesn't work when we ...

2014-09-19 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2444#issuecomment-56212658 Made some comments. We need to guard this with a config parameter because otherwise it will regress behavior on large clusters where serial vs parallel ssh makes a big
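The request in this comment, keeping the existing behavior as the default and making the alternative opt-in through a configuration switch, can be sketched as follows. This is a Python illustration under stated assumptions, not the shell change under review: `run_on_hosts`, `action`, `parallel`, and `max_workers` are hypothetical names, and the name of the real setting is not given in this message.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_hosts(hosts, action, parallel=True, max_workers=8):
    """Apply action(host) to every host; results keep the order of hosts.

    parallel=True is the default so that large clusters keep the fast path;
    callers must opt in to serial execution explicitly.
    """
    if not parallel:
        # Serial mode: one host at a time, e.g. when each call needs a prompt.
        return [action(h) for h in hosts]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(action, hosts))
```

Both modes return the same results in the same order, so the switch only changes how long the run takes and whether calls can overlap.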

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-19 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/2241#issuecomment-56212828 This patch does not include the thrift patch, which will be fixed by other jiras, because I don't want the scope to be too big.

[GitHub] spark pull request: [SPARK-3595] Respect configured OutputCommitte...

2014-09-19 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2450#discussion_r17799318 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -872,7 +872,12 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])

[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-19 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2350#discussion_r17799322 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -37,154 +36,106 @@ import

[GitHub] spark pull request: SPARK-3574. Shuffle finish time always reporte...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2440#issuecomment-56213954 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20572/consoleFull) for PR 2440 at commit

[GitHub] spark pull request: [SPARK-3598][SQL]cast to timestamp should be t...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2458#issuecomment-56214025 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20569/consoleFull) for PR 2458 at commit

[GitHub] spark pull request: [SPARK-3595] Respect configured OutputCommitte...

2014-09-19 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2450#discussion_r17799792 --- Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala --- @@ -478,6 +482,15 @@ class PairRDDFunctionsSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-3595] Respect configured OutputCommitte...

2014-09-19 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2450#discussion_r17799890 --- Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala --- @@ -478,6 +482,15 @@ class PairRDDFunctionsSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-3597][Mesos] Implement `killTask`.

2014-09-19 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2453#issuecomment-56214417 There is a related PR #1940

[GitHub] spark pull request: [SPARK-3597][Mesos] Implement `killTask`.

2014-09-19 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2453#issuecomment-56214347 ok to test

[GitHub] spark pull request: [SPARK-3595] Respect configured OutputCommitte...

2014-09-19 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2450#discussion_r17800209 --- Diff: examples/src/main/scala/org/apache/spark/examples/AwsTest.scala --- @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-3595] Respect configured OutputCommitte...

2014-09-19 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2450#issuecomment-56214986 Thanks for sending this. The approach seems solid. I made some small comments in a few places.

[GitHub] spark pull request: SPARK-3605. Fix typo in SchemaRDD.

2014-09-19 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/2460 SPARK-3605. Fix typo in SchemaRDD. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-3605 Alternatively you can review and

[GitHub] spark pull request: [SPARK-3446] Expose underlying job ids in Futu...

2014-09-19 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/2337#discussion_r17800470 --- Diff: core/src/test/scala/org/apache/spark/FutureActionSuite.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: SPARK-3605. Fix typo in SchemaRDD.

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2460#issuecomment-56215386 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20580/consoleFull) for PR 2460 at commit

[GitHub] spark pull request: SPARK-3574. Shuffle finish time always reporte...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2440#issuecomment-56215397 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20581/consoleFull) for PR 2440 at commit

[GitHub] spark pull request: [SPARK-3446] Expose underlying job ids in Futu...

2014-09-19 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/2337#discussion_r17800549 --- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala --- @@ -171,6 +179,8 @@ class ComplexFutureAction[T] extends FutureAction[T] { // is

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17800692 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17800664 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param

[GitHub] spark pull request: [Docs] Fix outdated docs for standalone cluste...

2014-09-19 Thread andrewor14
GitHub user andrewor14 opened a pull request: https://github.com/apache/spark/pull/2461 [Docs] Fix outdated docs for standalone cluster This is now supported! You can merge this pull request into a Git repository by running: $ git pull https://github.com/andrewor14/spark

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17800699 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17800687 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -37,11 +44,197 @@ trait Matrix extends Serializable {

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17800735 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param

[GitHub] spark pull request: [MLLib] Fix example code variable name misspel...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2459#issuecomment-56216041 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20568/consoleFull) for PR 2459 at commit

[GitHub] spark pull request: [SPARK-3446] Expose underlying job ids in Futu...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2337#issuecomment-56216044 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20583/consoleFull) for PR 2337 at commit

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-09-19 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56216271 So, I'm a little disappointed that this doesn't at least follow the Yarn model of one setting that defines the overhead. Instead, it has two settings, one for a fraction

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17801072 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -37,11 +44,197 @@ trait Matrix extends Serializable {

[GitHub] spark pull request: [SPARK-3599]Avoid loaing properties file frequ...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2454#issuecomment-56216397 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20570/consoleFull) for PR 2454 at commit

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-19 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2451#issuecomment-56216573 Could the methods be ordered in the file (grouped by public, private[mllib], private, etc.)?

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17801264 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -57,13 +250,709 @@ trait Matrix extends Serializable { * @param

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-19 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2451#issuecomment-56216806 Also, is it odd that the user can't access the matrix data, except via toArray (or maybe side effects of the function given to map)?

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56216747 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20574/consoleFull) for PR 2401 at commit

[GitHub] spark pull request: [SPARK-3377] [Metrics] Metrics can be accident...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-56216738 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20575/consoleFull) for PR 2432 at commit

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2378#issuecomment-56216817 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20576/consoleFull) for PR 2378 at commit

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-09-19 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56216904 I thought there was some desire to have the same thing also #1391? Furthermore, from my experience writing frameworks, I think a much better model is the
