[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r117205121 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -175,33 +197,54 @@ final class ShuffleBlockFetcherIterator(

[GitHub] spark pull request #18027: [SPARK-20796] the location of start-master.sh in ...

2017-05-18 Thread liu-zhaokun
GitHub user liu-zhaokun opened a pull request: https://github.com/apache/spark/pull/18027 [SPARK-20796] the location of start-master.sh in spark-standalone.md is wrong [https://issues.apache.org/jira/browse/SPARK-20796](https://issues.apache.org/jira/browse/SPARK-20796) the loc

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16989 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature e

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16989 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77039/ Test FAILed. ---

[GitHub] spark pull request #18028: [DStream][DOC]Add documentation for kinesis retry...

2017-05-18 Thread yssharma
GitHub user yssharma opened a pull request: https://github.com/apache/spark/pull/18028 [DStream][DOC]Add documentation for kinesis retry configurations ## What changes were proposed in this pull request? The changes were merged as part of - https://github.com/apache/spark/p

[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...

2017-05-18 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r117238414 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -609,6 +610,19 @@ object CollapseWindow extends Rule[Lo

[GitHub] spark pull request #18029: [SPARK-20168][WIP][DStream] Add changes to use ki...

2017-05-18 Thread yssharma
GitHub user yssharma opened a pull request: https://github.com/apache/spark/pull/18029 [SPARK-20168][WIP][DStream] Add changes to use kinesis fetches from specified timestamp ## What changes were proposed in this pull request? Kinesis client can resume from a specified time

[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...

2017-05-18 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r117240492 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala --- @@ -423,4 +423,25 @@ class DataFrameWindowFunctionsSuite exte

[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...

2017-05-18 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r117240910 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TransposeWindowSuite.scala --- @@ -0,0 +1,101 @@ +/* + * Licensed to t

[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...

2017-05-18 Thread ptkool
Github user ptkool commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r117241321 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TransposeWindowSuite.scala --- @@ -0,0 +1,101 @@ +/* + * Licensed to t

[GitHub] spark issue #18012: [SPARK-20779][Examples]The ASF header placed in an incor...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18012 (let's see if jenkins picks this up)

[GitHub] spark issue #18012: [SPARK-20779][Examples]The ASF header placed in an incor...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18012 ok to test

[GitHub] spark issue #18030: [SPARK-20798] GenerateUnsafeProjection should check if a...

2017-05-18 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/18030 @hvanhovell

[GitHub] spark pull request #18030: [SPARK-20798] GenerateUnsafeProjection should che...

2017-05-18 Thread ala
GitHub user ala opened a pull request: https://github.com/apache/spark/pull/18030 [SPARK-20798] GenerateUnsafeProjection should check if a value is null before calling the getter ## What changes were proposed in this pull request? GenerateUnsafeProjection.writeStructToBuffe

[GitHub] spark issue #18030: [SPARK-20798] GenerateUnsafeProjection should check if a...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18030 LGTM - pending jenkins

[GitHub] spark issue #18030: [SPARK-20798] GenerateUnsafeProjection should check if a...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18030 ok to test

[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18019#discussion_r117250008 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -1268,6 +1268,59 @@ case class Ascii(chil

[GitHub] spark issue #17400: [SPARK-19981][SQL] Update output partitioning info. when...

2017-05-18 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/17400 Jenkins, retest this please.

[GitHub] spark issue #13959: [SPARK-14351] [MLlib] [ML] Optimize findBestSplits metho...

2017-05-18 Thread MechCoder
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13959 I don't understand. If you don't have time to review, that is fine (I've been there too), but there is no need to close a PR due to unavailability of committers. One of the reasons, that I

[GitHub] spark issue #13959: [SPARK-14351] [MLlib] [ML] Optimize findBestSplits metho...

2017-05-18 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/13959 I think the problem is that this PR was incomplete, and left open. We generally only leave open PRs that are active. There was evidently no interest in proceeding with it; I don't know if it was lack

[GitHub] spark issue #17996: [SPARK-20506][DOCS] 2.2 migration guide

2017-05-18 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17996 @MLnick We have a separate SparkR guide at http://spark.apache.org/docs/latest/sparkr.html . Thanks.

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

2017-05-18 Thread davideis
Github user davideis commented on a diff in the pull request: https://github.com/apache/spark/pull/18022#discussion_r117260197 --- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala --- @@ -78,7 +79,7 @@ class ALSSuite val k = 2 val ne0 =

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

2017-05-18 Thread davideis
Github user davideis commented on a diff in the pull request: https://github.com/apache/spark/pull/18022#discussion_r117261995 --- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala --- @@ -348,6 +349,37 @@ class ALSSuite } /** + *

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

2017-05-18 Thread davideis
Github user davideis commented on a diff in the pull request: https://github.com/apache/spark/pull/18022#discussion_r117265969 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -1624,15 +1628,15 @@ object ALS extends DefaultParamsReadable[ALS] with

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

2017-05-18 Thread davideis
Github user davideis commented on a diff in the pull request: https://github.com/apache/spark/pull/18022#discussion_r117264333 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -795,8 +799,8 @@ object ALS extends DefaultParamsReadable[ALS] with Log

[GitHub] spark pull request #18022: [SPARK-20790] [MLlib] Correctly handle negative v...

2017-05-18 Thread davideis
Github user davideis commented on a diff in the pull request: https://github.com/apache/spark/pull/18022#discussion_r117262532 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -763,11 +763,15 @@ object ALS extends DefaultParamsReadable[ALS] with L

[GitHub] spark issue #17738: [SPARK-20422][Spark Core] Worker registration retries sh...

2017-05-18 Thread navneetrastogi
Github user navneetrastogi commented on the issue: https://github.com/apache/spark/pull/17738 This is a minor change which will help in configuring registration retries. It can be merged.

[GitHub] spark issue #18027: [SPARK-20796] the location of start-master.sh in spark-s...

2017-05-18 Thread liu-zhaokun
Github user liu-zhaokun commented on the issue: https://github.com/apache/spark/pull/18027 @srowen I will try my best to solve this type of problem, but I only found one this time. Please help me to merge it. Thanks.

[GitHub] spark issue #18027: [SPARK-20796] the location of start-master.sh in spark-s...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18027 Can one of the admins verify this patch?

[GitHub] spark issue #18029: [SPARK-20168][WIP][DStream] Add changes to use kinesis f...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18029 **[Test build #77043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77043/testReport)** for PR 18029 at commit [`7cf935b`](https://github.com/apache/spark/commit/7c

[GitHub] spark issue #18030: [SPARK-20798] GenerateUnsafeProjection should check if a...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18030 **[Test build #77042 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77042/testReport)** for PR 18030 at commit [`0e41a26`](https://github.com/apache/spark/commit/0e

[GitHub] spark issue #18028: [DStream][DOC]Add documentation for kinesis retry config...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18028 **[Test build #77044 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77044/testReport)** for PR 18028 at commit [`9aa9f16`](https://github.com/apache/spark/commit/9a

[GitHub] spark issue #18026: [SPARK-16202][SQL][DOC] Follow-up to Correct The Descrip...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18026 **[Test build #77045 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77045/testReport)** for PR 18026 at commit [`7bf5d19`](https://github.com/apache/spark/commit/7b

[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18025 **[Test build #77046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77046/testReport)** for PR 18025 at commit [`5c8cd1e`](https://github.com/apache/spark/commit/5c

[GitHub] spark issue #18024: [SPARK-20792][SS] Support same timeout operations in map...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18024 **[Test build #77047 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77047/testReport)** for PR 18024 at commit [`ca36419`](https://github.com/apache/spark/commit/ca

[GitHub] spark issue #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A]R.

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18019 **[Test build #77048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77048/testReport)** for PR 18019 at commit [`0fcd9d3`](https://github.com/apache/spark/commit/0f

[GitHub] spark issue #18002: [SPARK-20770][SQL] Improve ColumnStats

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18002 **[Test build #77050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77050/testReport)** for PR 18002 at commit [`66fefb6`](https://github.com/apache/spark/commit/66

[GitHub] spark issue #17400: [SPARK-19981][SQL] Update output partitioning info. when...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17400 **[Test build #77053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77053/testReport)** for PR 17400 at commit [`49a1732`](https://github.com/apache/spark/commit/49

[GitHub] spark issue #18012: [SPARK-20779][Examples]The ASF header placed in an incor...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18012 **[Test build #77049 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77049/testReport)** for PR 18012 at commit [`adacf2c`](https://github.com/apache/spark/commit/ad

[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18000 **[Test build #77051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77051/testReport)** for PR 18000 at commit [`1eae64a`](https://github.com/apache/spark/commit/1e

[GitHub] spark issue #17996: [SPARK-20506][DOCS] 2.2 migration guide

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17996 **[Test build #77052 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77052/testReport)** for PR 17996 at commit [`e27d9e4`](https://github.com/apache/spark/commit/e2

[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18025 **[Test build #77046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77046/testReport)** for PR 18025 at commit [`5c8cd1e`](https://github.com/apache/spark/commit/5

[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12646 **[Test build #77055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77055/testReport)** for PR 12646 at commit [`11d5c10`](https://github.com/apache/spark/commit/11

[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18025 Merged build finished. Test FAILed.

[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18025 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77046/ Test FAILed. ---

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #77054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77054/testReport)** for PR 16677 at commit [`55ee6b0`](https://github.com/apache/spark/commit/55

[GitHub] spark issue #18029: [SPARK-20168][WIP][DStream] Add changes to use kinesis f...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18029 Merged build finished. Test FAILed.

[GitHub] spark issue #18029: [SPARK-20168][WIP][DStream] Add changes to use kinesis f...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18029 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77043/ Test FAILed. ---

[GitHub] spark issue #18029: [SPARK-20168][WIP][DStream] Add changes to use kinesis f...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18029 **[Test build #77043 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77043/testReport)** for PR 18029 at commit [`7cf935b`](https://github.com/apache/spark/commit/7

[GitHub] spark pull request #17999: [SPARK-20751][SQL] Add built-in SQL Function - CO...

2017-05-18 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17999#discussion_r117278957 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1991,6 +1991,22 @@ object functions { def tan(columnName: String): Colum

[GitHub] spark issue #17996: [SPARK-20506][DOCS] 2.2 migration guide

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17996 Merged build finished. Test PASSed.

[GitHub] spark issue #17996: [SPARK-20506][DOCS] 2.2 migration guide

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17996 **[Test build #77052 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77052/testReport)** for PR 17996 at commit [`e27d9e4`](https://github.com/apache/spark/commit/e

[GitHub] spark issue #17996: [SPARK-20506][DOCS] 2.2 migration guide

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17996 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77052/ Test PASSed. ---

[GitHub] spark issue #18028: [DStream][DOC]Add documentation for kinesis retry config...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18028 **[Test build #77044 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77044/testReport)** for PR 18028 at commit [`9aa9f16`](https://github.com/apache/spark/commit/9

[GitHub] spark issue #18012: [SPARK-20779][Examples]The ASF header placed in an incor...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18012 **[Test build #77049 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77049/testReport)** for PR 18012 at commit [`adacf2c`](https://github.com/apache/spark/commit/a

[GitHub] spark issue #18028: [DStream][DOC]Add documentation for kinesis retry config...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18028 Merged build finished. Test PASSed.

[GitHub] spark issue #18028: [DStream][DOC]Add documentation for kinesis retry config...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18028 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77044/ Test PASSed. ---

[GitHub] spark issue #18012: [SPARK-20779][Examples]The ASF header placed in an incor...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18012 Merged build finished. Test PASSed.

[GitHub] spark issue #18012: [SPARK-20779][Examples]The ASF header placed in an incor...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18012 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77049/ Test PASSed. ---

[GitHub] spark issue #13959: [SPARK-14351] [MLlib] [ML] Optimize findBestSplits metho...

2017-05-18 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/13959 The lack of bandwidth in MLlib means that sometimes good code that would make an impact just gets ignored. This is kind of the reality of things. However, if we are going to close the PR simply becau

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @JoshRosen Thanks a lot for taking the time to look into this PR. I'm reading your comments carefully. Yes, I think it's good to integrate with the memory manager later. I will break this PR

[GitHub] spark pull request #18031: Record accurate size of blocks in MapStatus when ...

2017-05-18 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18031 Record accurate size of blocks in MapStatus when it's above threshold. ## What changes were proposed in this pull request? Currently, when number of reduces is above 2000, HighlyCompresse

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77056/testReport)** for PR 18031 at commit [`d5b8a21`](https://github.com/apache/spark/commit/d5

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 To resolve the comments in https://github.com/apache/spark/pull/16989 : >minimum size before we consider something a large block : if average is 10kb, and some blocks are > 20kb, spilling them

[GitHub] spark issue #17999: [SPARK-20751][SQL] Add built-in SQL Function - COT

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17999 **[Test build #77057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77057/testReport)** for PR 17999 at commit [`3bd1e1e`](https://github.com/apache/spark/commit/3b

[GitHub] spark issue #13959: [SPARK-14351] [MLlib] [ML] Optimize findBestSplits metho...

2017-05-18 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/13959 True, and I'd probably close the JIRA too. Maybe we can draw @jkbradley's attention for a comment? A closed PR still exists and can be examined or reopened, so it doesn't go away. I'd prefe

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 I am trying to give users a way to control memory strictly, so that no blocks are underestimated (by setting spark.shuffle.accurateBlockThreshold=0 and spark.shuffle.accurateBlockThresholdByTimesAverage=1). I
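
The effect of the two settings named in this comment can be modelled in a few lines. What follows is a minimal Python sketch, not Spark's implementation: the names `BlockSizeSummary`, `summarize_block_sizes` and `estimated_size`, the cutoff rule `max(threshold, times_average * average)`, and averaging over non-empty blocks only are all assumptions made for illustration. The snippet above only states that a threshold of 0 combined with a multiplier of 1 leaves no block underestimated.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class BlockSizeSummary:
    """Hypothetical compressed summary of one map task's block sizes."""
    average: int                # average size of the non-empty blocks
    accurate: Dict[int, int]    # reduce index -> exact size of a large block
    empty: FrozenSet[int]       # reduce indexes whose block is empty


def summarize_block_sizes(sizes: List[int],
                          accurate_block_threshold: int,
                          times_average: int) -> BlockSizeSummary:
    """Keep exact sizes only for blocks above the (assumed) cutoff."""
    non_empty = [s for s in sizes if s > 0]
    average = sum(non_empty) // len(non_empty) if non_empty else 0
    # Assumed rule: a block is "large" when it exceeds both limits.
    cutoff = max(accurate_block_threshold, times_average * average)
    accurate = {i: s for i, s in enumerate(sizes) if s > cutoff}
    empty = frozenset(i for i, s in enumerate(sizes) if s == 0)
    return BlockSizeSummary(average, accurate, empty)


def estimated_size(summary: BlockSizeSummary, index: int) -> int:
    """Size a reducer would assume for the block at `index`."""
    if index in summary.empty:
        return 0
    # Blocks without an exact entry are reported at the average size.
    return summary.accurate.get(index, summary.average)


if __name__ == "__main__":
    sizes = [10, 10, 10, 100, 0]
    strict = summarize_block_sizes(sizes, accurate_block_threshold=0, times_average=1)
    for i, actual in enumerate(sizes):
        print(i, actual, estimated_size(strict, i))
```

In this model, a threshold of 0 with a multiplier of 1 stores every block larger than the average exactly and reports every other non-empty block at the average, so an estimate never falls below the true size. Raising either value shrinks the exact-size map at the cost of underestimating some large blocks.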

[GitHub] spark pull request #18031: Record accurate size of blocks in MapStatus when ...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18031#discussion_r117293425 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus {

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77056/testReport)** for PR 18031 at commit [`d5b8a21`](https://github.com/apache/spark/commit/d

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77056/ Test FAILed. ---

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Merged build finished. Test FAILed.

[GitHub] spark issue #18027: [SPARK-20796] the location of start-master.sh in spark-s...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18027 **[Test build #3722 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3722/testReport)** for PR 18027 at commit [`14f0ba9`](https://github.com/apache/spark/commit/1

[GitHub] spark issue #17996: [SPARK-20506][DOCS] 2.2 migration guide

2017-05-18 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17996 Looks good. We do have a separate section, but it's mostly for breaking/behavior changes at the R layer; if there are changes to ML that also affect R, it would be great to include them here.

[GitHub] spark pull request #18012: [SPARK-20779][Examples]The ASF header placed in a...

2017-05-18 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18012

[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

2017-05-18 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18025 So I think better examples are great, and we might be too verbose with individual pages for each function, so it might be a good idea to consolidate them. But one question: does this affect discoverabi

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 Jenkins, retest this please

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77058/testReport)** for PR 18031 at commit [`d5b8a21`](https://github.com/apache/spark/commit/d5

[GitHub] spark issue #18027: [SPARK-20796] the location of start-master.sh in spark-s...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18027 **[Test build #3722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3722/testReport)** for PR 18027 at commit [`14f0ba9`](https://github.com/apache/spark/commit/

[GitHub] spark issue #18027: [SPARK-20796] the location of start-master.sh in spark-s...

2017-05-18 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18027 merged to master/2.2/2.1

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 Gentle ping to @JoshRosen @cloud-fan @mridulm

[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...

2017-05-18 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18000 LGTM pending Jenkins

[GitHub] spark pull request #17999: [SPARK-20751][SQL] Add built-in SQL Function - CO...

2017-05-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17999#discussion_r117298560 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1991,6 +1991,14 @@ object functions { def tan(columnName: String): C

[GitHub] spark pull request #18027: [SPARK-20796] the location of start-master.sh in ...

2017-05-18 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18027

[GitHub] spark pull request #17999: [SPARK-20751][SQL] Add built-in SQL Function - CO...

2017-05-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17999#discussion_r117298713 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala --- @@ -274,6 +274,13 @@ class MathFunctionsSuite extends QueryTest with

[GitHub] spark pull request #17999: [SPARK-20751][SQL] Add built-in SQL Function - CO...

2017-05-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17999#discussion_r117298871 --- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql --- @@ -32,6 +32,7 @@ select 1 - 2; select 2 * 5; select 5 % 3; select

[GitHub] spark pull request #18002: [SPARK-20770][SQL] Improve ColumnStats

2017-05-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18002#discussion_r117299631 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala --- @@ -53,219 +53,299 @@ private[columnar] sealed trait Colu

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77058 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77058/testReport)** for PR 18031 at commit [`d5b8a21`](https://github.com/apache/spark/commit/d

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Merged build finished. Test FAILed.

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77058/ Test FAILed.

[GitHub] spark pull request #17999: [SPARK-20751][SQL] Add built-in SQL Function - CO...

2017-05-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17999#discussion_r117300361 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala --- @@ -544,6 +544,24 @@ case class Sqrt(child: Ex

[GitHub] spark issue #17999: [SPARK-20751][SQL] Add built-in SQL Function - COT

2017-05-18 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17999 Thanks for working on it! LGTM except a few minor comments.

[GitHub] spark pull request #18030: [SPARK-20798] GenerateUnsafeProjection should che...

2017-05-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18030#discussion_r117301990 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala --- @@ -50,10 +50,15 @@ object Ge

[GitHub] spark issue #17723: [SPARK-20434][YARN][CORE] Move kerberos delegation token...

2017-05-18 Thread mgummelt
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/17723 @jerryshao Are those providers different than the Hive and HBase providers already in the Spark codebase? Regardless, with what I'm proposing, the `yarn.ServiceCredentialProvider` would re

[GitHub] spark issue #17747: [SPARK-11373] [CORE] Add metrics to the FsHistoryProvide...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17747 **[Test build #77059 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77059/testReport)** for PR 17747 at commit [`64df0e1`](https://github.com/apache/spark/commit/64

[GitHub] spark issue #17723: [SPARK-20434][YARN][CORE] Move kerberos delegation token...

2017-05-18 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17723 I'm fine with that if you're ok with Mesos being restricted to the built-in providers.

[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18000 **[Test build #77051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77051/testReport)** for PR 18000 at commit [`1eae64a`](https://github.com/apache/spark/commit/1

[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18000 Merged build finished. Test PASSed.

[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18000 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77051/ Test PASSed.

[GitHub] spark issue #18026: [SPARK-16202][SQL][DOC] Follow-up to Correct The Descrip...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18026 **[Test build #77045 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77045/testReport)** for PR 18026 at commit [`7bf5d19`](https://github.com/apache/spark/commit/7

[GitHub] spark issue #18026: [SPARK-16202][SQL][DOC] Follow-up to Correct The Descrip...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18026 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77045/ Test PASSed.
