date:20170922

[GitHub] spark pull request #19321: [SPARK-22100] [SQL] Make percentile_approx suppor...

2017-09-22 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19321#discussion_r140627874 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala --- @@ -85,7 +85,8 @@ case class A

[GitHub] spark issue #19330: Orderable MapType

2017-09-22 Thread jinxing64

Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19330 It seems https://github.com/apache/spark/pull/15970 is no longer being worked on. I resolved the conflicts and added some tests in this PR.

[GitHub] spark issue #19330: Orderable MapType

2017-09-22 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19330 **[Test build #82106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82106/testReport)** for PR 19330 at commit [`2e2b98d`](https://github.com/apache/spark/commit/2e

[GitHub] spark pull request #19330: Orderable MapType

2017-09-22 Thread jinxing64

Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19330#discussion_r140627825 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -663,6 +663,18 @@ class CodegenContext

[GitHub] spark pull request #19330: Orderable MapType

2017-09-22 Thread jinxing64

GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/19330 Orderable MapType ## What changes were proposed in this pull request? We can make MapType orderable, and thus usable in aggregates and joins. ## How was this patch tested?

[GitHub] spark issue #19320: [SPARK-22099] The 'job ids' list style needs to be chang...

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19320 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82103/ Test PASSed.

[GitHub] spark issue #19320: [SPARK-22099] The 'job ids' list style needs to be chang...

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19320 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional comma

[GitHub] spark issue #19320: [SPARK-22099] The 'job ids' list style needs to be chang...

2017-09-22 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19320 **[Test build #82103 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82103/testReport)** for PR 19320 at commit [`5cb6ea4`](https://github.com/apache/spark/commit/5

[GitHub] spark issue #19321: [SPARK-22100] [SQL] Make percentile_approx support numer...

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19321 Merged build finished. Test FAILed.

[GitHub] spark issue #19321: [SPARK-22100] [SQL] Make percentile_approx support numer...

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19321 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82101/ Test FAILed.

[GitHub] spark issue #19321: [SPARK-22100] [SQL] Make percentile_approx support numer...

2017-09-22 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19321 **[Test build #82101 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82101/testReport)** for PR 19321 at commit [`45e655f`](https://github.com/apache/spark/commit/4

[GitHub] spark issue #19329: [SPARK-22110][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19329 **[Test build #82105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82105/testReport)** for PR 19329 at commit [`0f3307d`](https://github.com/apache/spark/commit/0f

[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9207 Build finished. Test FAILed.

[GitHub] spark issue #19329: [SPARK-22110][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread gatorsmile

Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19329 ok to test

[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9207 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82102/ Test FAILed.

[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2017-09-22 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9207 **[Test build #82102 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82102/consoleFull)** for PR 9207 at commit [`9cb8994`](https://github.com/apache/spark/commit/9c

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread viirya

Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140626955 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -51,10 +51,12 @@ case class ArrowEvalPythonExec(udf

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread ueshin

Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140626292 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -51,10 +51,12 @@ case class ArrowEvalPythonExec(udf

[GitHub] spark issue #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter pandas_u...

2017-09-22 Thread viirya

Github user viirya commented on the issue: https://github.com/apache/spark/pull/19325 Most of it looks pretty good. The only main question I have is about the empty partition issue.

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread viirya

Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140626184 --- Diff: python/pyspark/sql/tests.py --- @@ -3344,6 +3342,22 @@ def test_vectorized_udf_wrong_return_type(self): 'Invalid.*type.*str

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread viirya

Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140626154 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -51,10 +51,12 @@ case class ArrowEvalPythonExec(udf

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread viirya

Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140626051 --- Diff: python/pyspark/worker.py --- @@ -80,14 +77,12 @@ def wrap_pandas_udf(f, return_type): arrow_return_type = toArrowType(return_type)

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread viirya

Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140625992 --- Diff: python/pyspark/sql/tests.py --- @@ -3344,6 +3342,22 @@ def test_vectorized_udf_wrong_return_type(self): 'Invalid.*type.*str

[GitHub] spark issue #19329: [SPARK-22110][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19329 Can one of the admins verify this patch?

[GitHub] spark pull request #19329: [SPARK-22110][SQL][Documentation] Add usage and i...

2017-09-22 Thread kevinyu98

GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/19329 [SPARK-22110][SQL][Documentation] Add usage and improve documentation with arguments and examples for trim function ## What changes were proposed in this pull request? This PR proposes t

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2017-09-22 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #82104 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82104/testReport)** for PR 13599 at commit [`abdf7b7`](https://github.com/apache/spark/commit/ab

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread viirya

Github user viirya commented on the issue: https://github.com/apache/spark/pull/19328 Why did you close it? You can just edit the PR title.

[GitHub] spark pull request #19328: [SPARK-22088][SQL][Documentation] Add usage and i...

2017-09-22 Thread kevinyu98

Github user kevinyu98 closed the pull request at: https://github.com/apache/spark/pull/19328

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread kevinyu98

Github user kevinyu98 commented on the issue: https://github.com/apache/spark/pull/19328 I am so sorry, I made a mistake with the JIRA number. I created a new JIRA, SPARK-22110, but used the wrong number here. Let me close this PR and then open one with the correct JIRA number.

[GitHub] spark issue #19317: [SPARK-22098][CORE] Add new method aggregateByKeyLocally...

2017-09-22 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19317 Oh, I get your point. This is different from `RDD.aggregate`: it directly returns a Map and avoids shuffling. It seems useful when numKeys is small. But I checked the final `reduce` step, and it seems

[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Merged build finished. Test FAILed.

[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82100/ Test FAILed.

[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-09-22 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #82100 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82100/testReport)** for PR 19222 at commit [`66bfbfc`](https://github.com/apache/spark/commit/6

[GitHub] spark issue #13143: [SPARK-15359] [Mesos] Mesos dispatcher should handle DRI...

2017-09-22 Thread ArtRand

Github user ArtRand commented on the issue: https://github.com/apache/spark/pull/13143 Hello @devaraj-kavali. Yes. I've been playing around with this because it's inconvenient to clean up ZK whenever you uninstall/reinstall the Dispatcher. The problem is that the only signal of a re-i

[GitHub] spark issue #19320: [SPARK-22099] The 'job ids' list style needs to be chang...

2017-09-22 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19320 **[Test build #82103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82103/testReport)** for PR 19320 at commit [`5cb6ea4`](https://github.com/apache/spark/commit/5c

[GitHub] spark issue #19320: [SPARK-22099] The 'job ids' list style needs to be chang...

2017-09-22 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19320 ok to test

[GitHub] spark pull request #19272: [Spark-21842][Mesos] Support Kerberos ticket rene...

2017-09-22 Thread ArtRand

Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19272#discussion_r140625136 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCredentialRenewer.scala --- @@ -0,0 +1,150 @@ +/* +

[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2017-09-22 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9207 **[Test build #82102 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82102/consoleFull)** for PR 9207 at commit [`9cb8994`](https://github.com/apache/spark/commit/9cb

[GitHub] spark issue #19326: [SPARK-22107] Change as to alias in python quickstart

2017-09-22 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19326 How does it relate to https://github.com/apache/spark/pull/19283?

[GitHub] spark pull request #19303: [SPARK-22085][CORE]When the application has no co...

2017-09-22 Thread 10110346

Github user 10110346 closed the pull request at: https://github.com/apache/spark/pull/19303

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19328 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82098/ Test PASSed.

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19328 Merged build finished. Test PASSed.

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19328 **[Test build #82098 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82098/testReport)** for PR 19328 at commit [`0f3307d`](https://github.com/apache/spark/commit/0

[GitHub] spark issue #19321: [SPARK-22100] [SQL] Make percentile_approx support numer...

2017-09-22 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19321 **[Test build #82101 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82101/testReport)** for PR 19321 at commit [`45e655f`](https://github.com/apache/spark/commit/45

[GitHub] spark issue #19295: [SPARK-22080][SQL] Adds support for allowing user to add...

2017-09-22 Thread wzhfy

Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19295 ping @cloud-fan @gatorsmile

[GitHub] spark issue #19321: [SPARK-22100] [SQL] Make percentile_approx support numer...

2017-09-22 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19321 **[Test build #82099 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82099/testReport)** for PR 19321 at commit [`db2c110`](https://github.com/apache/spark/commit/d

[GitHub] spark issue #19321: [SPARK-22100] [SQL] Make percentile_approx support numer...

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19321 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82099/ Test FAILed.

[GitHub] spark issue #19321: [SPARK-22100] [SQL] Make percentile_approx support numer...

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19321 Merged build finished. Test FAILed.

[GitHub] spark issue #19295: [SPARK-22080][SQL] Adds support for allowing user to add...

2017-09-22 Thread wzhfy

Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19295 ok to test

[GitHub] spark issue #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter pandas_u...

2017-09-22 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19325 For my part, I am willing to merge this one soon.

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140622826 --- Diff: python/pyspark/serializers.py --- @@ -246,15 +243,9 @@ def cast_series(s, t): def loads(self, obj): """ Dese

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140623893 --- Diff: python/pyspark/worker.py --- @@ -80,14 +77,12 @@ def wrap_pandas_udf(f, return_type): arrow_return_type = toArrowType(return_type)

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140623560 --- Diff: python/pyspark/sql/functions.py --- @@ -2183,14 +2183,29 @@ def pandas_udf(f=None, returnType=StringType()): :param f: python function

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140624294 --- Diff: python/pyspark/sql/functions.py --- @@ -2183,14 +2183,29 @@ def pandas_udf(f=None, returnType=StringType()): :param f: python function

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140623030 --- Diff: python/pyspark/sql/tests.py --- @@ -3256,11 +3256,9 @@ def test_vectorized_udf_null_string(self): def test_vectorized_udf_zero_p

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140623282 --- Diff: python/pyspark/sql/tests.py --- @@ -3308,12 +3306,12 @@ def test_vectorized_udf_invalid_length(self): from pyspark.sql.functions i

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140623236 --- Diff: python/pyspark/sql/tests.py --- @@ -3308,12 +3306,12 @@ def test_vectorized_udf_invalid_length(self): from pyspark.sql.functions i

[GitHub] spark pull request #19295: [SPARK-22080][SQL] Adds support for allowing user...

2017-09-22 Thread wzhfy

Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19295#discussion_r140624191 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala --- @@ -28,12 +28,18 @@ class SparkOptimizer( experimentalMetho

[GitHub] spark issue #19295: [SPARK-22080][SQL] Adds support for allowing user to add...

2017-09-22 Thread wzhfy

Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19295 test this please

[GitHub] spark pull request #19295: [SPARK-22080][SQL] Adds support for allowing user...

2017-09-22 Thread wzhfy

Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19295#discussion_r140624028 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala --- @@ -28,12 +28,18 @@ class SparkOptimizer( experimentalMetho

[GitHub] spark pull request #19295: [SPARK-22080][SQL] Adds support for allowing user...

2017-09-22 Thread wzhfy

Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19295#discussion_r140623965 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ExperimentalMethods.scala --- @@ -44,11 +44,14 @@ class ExperimentalMethods private[sql]() { */

[GitHub] spark pull request #19295: [SPARK-22080][SQL] Adds support for allowing user...

2017-09-22 Thread wzhfy

Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19295#discussion_r140624052 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLContextSuite.scala --- @@ -78,8 +82,14 @@ class SQLContextSuite extends SparkFunSuite with SharedSp

[GitHub] spark pull request #19295: [SPARK-22080][SQL] Adds support for allowing user...

2017-09-22 Thread wzhfy

Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19295#discussion_r140623955 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala --- @@ -28,12 +28,18 @@ class SparkOptimizer( experimentalMetho

[GitHub] spark issue #19317: [SPARK-22098][CORE] Add new method aggregateByKeyLocally...

2017-09-22 Thread ConeyLiu

Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/19317 @jiangxb1987, @WeichenXu123, thanks for reviewing. This change is inspired by the `TODO List`. You can see the following code snippet: ```scala // TODO: Calling aggregateByKey and coll

[GitHub] spark issue #18936: [SPARK-21688][ML][MLLIB] make native BLAS the first choi...

2017-09-22 Thread VinceShieh

Github user VinceShieh commented on the issue: https://github.com/apache/spark/pull/18936 Hi Sean, sorry for the late reply. Yeah, actually we do have some performance data on F2J vs. OpenBLAS. It seems there is no performance gain from OpenBLAS, not even at the unit test level. We are th

[GitHub] spark issue #19317: [SPARK-22098][CORE] Add new method aggregateByKeyLocally...

2017-09-22 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19317 Yes. I guess the perf gain is because this PR uses a local hashmap, which can use unlimited memory, whereas the current Spark aggregation impl will automatically spill the local hashmap when it exceeds a threshold.
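
For readers following this thread, here is a minimal plain-Python sketch of the "aggregate into a local hashmap per partition, then merge" idea being discussed. It is not the code from the PR and does not use Spark: the function name `aggregate_by_key_locally`, its parameters, and the nested lists standing in for partitions are all assumptions made for illustration. Every dict here lives entirely in memory and nothing is spilled, so the sketch only makes sense when the number of keys is small.

```python
import copy


def aggregate_by_key_locally(partitions, zero_value, seq_op, comb_op):
    """Hypothetical helper: aggregate (key, value) pairs into one dict.

    partitions -- iterable of iterables of (key, value) pairs (stand-in for RDD partitions)
    zero_value -- initial accumulator for each key (copied per key)
    seq_op     -- folds one value into an accumulator within a partition
    comb_op    -- merges two accumulators coming from different partitions
    """
    # Step 1: each partition builds its own in-memory dict (no spilling).
    partials = []
    for partition in partitions:
        local = {}
        for key, value in partition:
            acc = local[key] if key in local else copy.deepcopy(zero_value)
            local[key] = seq_op(acc, value)
        partials.append(local)

    # Step 2: merge the per-partition dicts into a single result dict.
    merged = {}
    for local in partials:
        for key, acc in local.items():
            merged[key] = comb_op(merged[key], acc) if key in merged else acc
    return merged


# Two toy "partitions" of (key, value) pairs.
partitions = [[("a", 1), ("b", 2), ("a", 3)], [("b", 4), ("c", 5)]]
result = aggregate_by_key_locally(
    partitions, 0, lambda acc, v: acc + v, lambda x, y: x + y
)
print(result)
```

The merge in step 2 plays the role of the final `reduce`: it combines whole dicts rather than shuffling records by key, so its cost grows with the number of distinct keys per partition.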

[GitHub] spark issue #19317: [SPARK-22098][CORE] Add new method aggregateByKeyLocally...

2017-09-22 Thread VinceShieh

Github user VinceShieh commented on the issue: https://github.com/apache/spark/pull/19317 Nice catch, thanks. The perf gain is truly narrow. I believe this impl just tried to align with the impl of 'reduceByKeyLocally'. @ConeyLiu maybe we should revisit the code, along with the

[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-09-22 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #82100 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82100/testReport)** for PR 19222 at commit [`66bfbfc`](https://github.com/apache/spark/commit/66

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test PASSed.

[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-09-22 Thread kiszk

Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19222 Jenkins, retest this please

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82097/ Test PASSed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-22 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #82097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82097/testReport)** for PR 16578 at commit [`9fac482`](https://github.com/apache/spark/commit/9

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140622457 --- Diff: python/pyspark/sql/functions.py --- @@ -2183,14 +2183,29 @@ def pandas_udf(f=None, returnType=StringType()): :param f: python function

[GitHub] spark issue #19317: [SPARK-22098][CORE] Add new method aggregateByKeyLocally...

2017-09-22 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19317 And I have to point out that your impl has a high risk of causing OOM. The current impl will automatically spill when the local hashmap is too large and can take advantage of Spark's auto memory management mechan

[GitHub] spark pull request #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter p...

2017-09-22 Thread viirya

Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19325#discussion_r140622410 --- Diff: python/pyspark/sql/functions.py --- @@ -2183,14 +2183,29 @@ def pandas_udf(f=None, returnType=StringType()): :param f: python function if u

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19328 Yea .. let's open a separate JIRA.

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-22 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19229 @viirya Yeah, regarding the perf gap, I only focus on `mean`, which can take advantage of codegen.

[GitHub] spark pull request #19278: [SPARK-22060][ML] Fix CrossValidator/TrainValidat...

2017-09-22 Thread asfgit

Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19278

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-22 Thread viirya

Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 Yeah, I think that fix should work for the strategy `Imputer.mean`, because `Imputer.mean` now aggregates many columns at once and that can produce generated code that is too large for aggregation. For the

[GitHub] spark issue #19278: [SPARK-22060][ML] Fix CrossValidator/TrainValidationSpli...

2017-09-22 Thread jkbradley

Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19278 LGTM Merging with master Thanks @WeichenXu123 for the fix and for testing for backwards compatibility!

[GitHub] spark pull request #19278: [SPARK-22060][ML] Fix CrossValidator/TrainValidat...

2017-09-22 Thread jkbradley

Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19278#discussion_r140621871 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/TrainValidationSplitSuite.scala --- @@ -160,11 +160,13 @@ class TrainValidationSplitSuite

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-09-22 Thread jkbradley

Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r140621444 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,36 +462,55 @@ final class OnlineLDAOptimizer extends LDAOp

[GitHub] spark issue #19286: [SPARK-21338][SQL][FOLLOW-UP] Implement isCascadingTrunc...

2017-09-22 Thread viirya

Github user viirya commented on the issue: https://github.com/apache/spark/pull/19286 ping @gatorsmile

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread viirya

Github user viirya commented on the issue: https://github.com/apache/spark/pull/19328 I think you should create a new JIRA, instead of using SPARK-22088 which is for the wrong style issue.

[GitHub] spark issue #19321: [SPARK-22100] [SQL] Make percentile_approx support numer...

2017-09-22 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19321 **[Test build #82099 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82099/testReport)** for PR 19321 at commit [`db2c110`](https://github.com/apache/spark/commit/db

[GitHub] spark issue #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter pandas_u...

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19325 Merged build finished. Test FAILed.

[GitHub] spark issue #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter pandas_u...

2017-09-22 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19325 **[Test build #82096 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82096/testReport)** for PR 19325 at commit [`7b0da10`](https://github.com/apache/spark/commit/7

[GitHub] spark issue #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter pandas_u...

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19325 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82096/ Test FAILed. ---

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19328 **[Test build #82098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82098/testReport)** for PR 19328 at commit [`0f3307d`](https://github.com/apache/spark/commit/0f

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19328 ok to test

[GitHub] spark issue #19328: [SPARK-22088][SQL][Documentation] Add usage and improve ...

2017-09-22 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19328 Can one of the admins verify this patch?

[GitHub] spark pull request #19328: [SPARK-22088][SQL][Documentation] Add usage and i...

2017-09-22 Thread kevinyu98

GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/19328 [SPARK-22088][SQL][Documentation] Add usage and improve documentation with arguments and examples for trim function ## What changes were proposed in this pull request? This PR proposes t

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas

Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140605142 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala --- @@ -146,7 +146,13 @@ case class StreamingS

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas

Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140608418 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala --- @@ -89,61 +89,124 @@ class SymmetricH

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas

Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140615004 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala --- @@ -89,61 +89,124 @@ class SymmetricH

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas

Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140616618 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala --- @@ -216,22 +229,51 @@ case class Streaming

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas

Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140614146 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala --- @@ -89,61 +89,124 @@ class SymmetricH

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas

Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140605854 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala --- @@ -324,17 +367,34 @@ case class Streaming

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas

Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140617927 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala --- @@ -216,22 +229,51 @@ case class Streaming

[GitHub] spark pull request #19327: [WIP] Implement stream-stream outer joins.

2017-09-22 Thread tdas

Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r140614325 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala --- @@ -89,61 +89,124 @@ class SymmetricH
