date:20210205

[GitHub] [spark] SparkQA commented on pull request #31245: [SPARK-34157][SQL] Unify output of SHOW TABLES and pass output attributes properly

2021-02-05 Thread GitBox

SparkQA commented on pull request #31245: URL: https://github.com/apache/spark/pull/31245#issuecomment-773869378 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] cloud-fan commented on a change in pull request #30902: [SPARK-33888][SQL] JDBC SQL TIME type represents incorrectly as TimestampType, it should be physical Int in millis

2021-02-05 Thread GitBox

cloud-fan commented on a change in pull request #30902: URL: https://github.com/apache/spark/pull/30902#discussion_r569992961 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ## @@ -408,6 +421,23 @@ object JdbcUtils extends

[GitHub] [spark] AmplabJenkins commented on pull request #31484: [SPARK-34374][SQL][DSTREAM] Use standard methods to extract keys or values from a Map

2021-02-05 Thread GitBox

AmplabJenkins commented on pull request #31484: URL: https://github.com/apache/spark/pull/31484#issuecomment-773897293 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39499/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31464: [SPARK-34339][CORE][SQL] Expose the number of total paths in Utils.buildLocationMetadata()

2021-02-05 Thread GitBox

AmplabJenkins removed a comment on pull request #31464: URL: https://github.com/apache/spark/pull/31464#issuecomment-773081377

[GitHub] [spark] SparkQA removed a comment on pull request #31464: [SPARK-34339][CORE][SQL] Expose the number of total paths in Utils.buildLocationMetadata()

2021-02-05 Thread GitBox

SparkQA removed a comment on pull request #31464: URL: https://github.com/apache/spark/pull/31464#issuecomment-773031370

[GitHub] [spark] SparkQA commented on pull request #31384: [SPARK-31816][SQL][DOCS] Added high level description about JDBC connection providers for users/developers

2021-02-05 Thread GitBox

SparkQA commented on pull request #31384: URL: https://github.com/apache/spark/pull/31384#issuecomment-773389086

[GitHub] [spark] Ngone51 commented on a change in pull request #31451: [WIP][SPARK-34338][SQL] Report metrics from Datasource v2 scan

2021-02-05 Thread GitBox

Ngone51 commented on a change in pull request #31451: URL: https://github.com/apache/spark/pull/31451#discussion_r570234278 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala ## @@ -44,8 +44,19 @@ case class BatchScanExec(

[GitHub] [spark] maropu commented on pull request #31449: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

2021-02-05 Thread GitBox

maropu commented on pull request #31449: URL: https://github.com/apache/spark/pull/31449#issuecomment-773705201 Thanks~, @HeartSaVioR

[GitHub] [spark] SparkQA removed a comment on pull request #31471: [SPARK-34355][SQL] Add log and time cost for commit job

2021-02-05 Thread GitBox

SparkQA removed a comment on pull request #31471: URL: https://github.com/apache/spark/pull/31471#issuecomment-773187923

[GitHub] [spark] SparkQA commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-02-05 Thread GitBox

SparkQA commented on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-773900126

[GitHub] [spark] Ngone51 commented on pull request #31470: [SPARK-34354][SQL] Fix failure when apply CostBasedJoinReorder on self-join

2021-02-05 Thread GitBox

Ngone51 commented on pull request #31470: URL: https://github.com/apache/spark/pull/31470#issuecomment-773151888 cc @cloud-fan

[GitHub] [spark] SparkQA removed a comment on pull request #31466: [WIP][SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows system

2021-02-05 Thread GitBox

SparkQA removed a comment on pull request #31466: URL: https://github.com/apache/spark/pull/31466#issuecomment-773016444

[GitHub] [spark] AmplabJenkins commented on pull request #31437: [SPARK-34329][YARN] When hit ApplicationAttemptNotFoundException, we can't just stop app for all case

2021-02-05 Thread GitBox

AmplabJenkins commented on pull request #31437: URL: https://github.com/apache/spark/pull/31437#issuecomment-773395117

[GitHub] [spark] xkrogen commented on pull request #31133: [SPARK-26836][SQL] Supporting Avro schema evolution for partitioned Hive tables

2021-02-05 Thread GitBox

xkrogen commented on pull request #31133: URL: https://github.com/apache/spark/pull/31133#issuecomment-773571759 Thanks for the clarification @dongjoon-hyun ! I understand your concern now. New plan sounds good to me as well.

[GitHub] [spark] sririshindra commented on a change in pull request #31477: [SPARK-34369][SQL][WEBUI] Track number of pairs processed out of Join.

2021-02-05 Thread GitBox

sririshindra commented on a change in pull request #31477: URL: https://github.com/apache/spark/pull/31477#discussion_r570672502 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala ## @@ -89,13 +90,20 @@ case class

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31463: [PYTHON][MINOR] Fix docstring of join

2021-02-05 Thread GitBox

AmplabJenkins removed a comment on pull request #31463: URL: https://github.com/apache/spark/pull/31463#issuecomment-772912593

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31462: [SPARK-34347][SQL] CatalogImpl.uncacheTable should invalidate in cascade for temp views

2021-02-05 Thread GitBox

AmplabJenkins removed a comment on pull request #31462: URL: https://github.com/apache/spark/pull/31462#issuecomment-773111769

[GitHub] [spark] xinrong-databricks commented on pull request #31463: [PYTHON][MINOR] Fix docstring of join

2021-02-05 Thread GitBox

xinrong-databricks commented on pull request #31463: URL: https://github.com/apache/spark/pull/31463#issuecomment-773712671 Thank you for your reviews! @srowen I don't see other instances so far. I've marked the PR as non-draft.

[GitHub] [spark] SparkQA commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2021-02-05 Thread GitBox

SparkQA commented on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-773150220

[GitHub] [spark] SparkQA commented on pull request #31462: [SPARK-34347][SQL] CatalogImpl.uncacheTable should invalidate in cascade for temp views

2021-02-05 Thread GitBox

SparkQA commented on pull request #31462: URL: https://github.com/apache/spark/pull/31462#issuecomment-773065658

[GitHub] [spark] zhengruifeng closed pull request #31469: [MINOR][ML] Param Validation should throw IllegalArgumentException

2021-02-05 Thread GitBox

zhengruifeng closed pull request #31469: URL: https://github.com/apache/spark/pull/31469

[GitHub] [spark] maropu commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2021-02-05 Thread GitBox

maropu commented on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-773786201 The current approach itself looks fine.

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2021-02-05 Thread GitBox

AngersZhuuuu commented on a change in pull request #30957: URL: https://github.com/apache/spark/pull/30957#discussion_r570797337 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala ## @@ -220,6 +226,9 @@ trait

[GitHub] [spark] SparkQA removed a comment on pull request #31466: [SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows system

2021-02-05 Thread GitBox

SparkQA removed a comment on pull request #31466: URL: https://github.com/apache/spark/pull/31466#issuecomment-773773261 **[Test build #134907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134907/testReport)** for PR 31466 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29185: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2021-02-05 Thread GitBox

AmplabJenkins removed a comment on pull request #29185: URL: https://github.com/apache/spark/pull/29185#issuecomment-773222385

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31483: [SPARK-33434][PYTHON][DOCS] Added RuntimeConfig to PySpark docs

2021-02-05 Thread GitBox

AmplabJenkins removed a comment on pull request #31483: URL: https://github.com/apache/spark/pull/31483#issuecomment-773839682

[GitHub] [spark] yaooqinn commented on pull request #31460: [SPARK-34346][CORE][SQL] io.file.buffer.size set by spark.buffer.size will override by loading hive-site.xml accidentally may cause perf reg

2021-02-05 Thread GitBox

yaooqinn commented on pull request #31460: URL: https://github.com/apache/spark/pull/31460#issuecomment-773740286 OK, it's my pleasure

[GitHub] [spark] maropu commented on a change in pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1

2021-02-05 Thread GitBox

maropu commented on a change in pull request #31413: URL: https://github.com/apache/spark/pull/31413#discussion_r570645480 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ## @@ -591,20 +590,41 @@ case class FileSourceScanExec(

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31467: [SPARK-33212][FOLLOW-UP][BUILD] Uses provided properties for Hadoop client dependencies in root pom

2021-02-05 Thread GitBox

AmplabJenkins removed a comment on pull request #31467: URL: https://github.com/apache/spark/pull/31467#issuecomment-773111772

[GitHub] [spark] SparkQA removed a comment on pull request #29210: [SPARK-24497][SQL] Support recursive SQL query

2021-02-05 Thread GitBox

SparkQA removed a comment on pull request #29210: URL: https://github.com/apache/spark/pull/29210#issuecomment-773469611

[GitHub] [spark] viirya commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

2021-02-05 Thread GitBox

viirya commented on pull request #31476: URL: https://github.com/apache/spark/pull/31476#issuecomment-773564872 cc @rdblue @Ngone51 @cloud-fan @sunchao @dongjoon-hyun this is separated from #31451 and only includes interface changes.

[GitHub] [spark] cloud-fan commented on a change in pull request #31466: [SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows system

2021-02-05 Thread GitBox

cloud-fan commented on a change in pull request #31466: URL: https://github.com/apache/spark/pull/31466#discussion_r570210410 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ## @@ -260,9 +260,6 @@ class SQLQueryTestSuite extends QueryTest

[GitHub] [spark] wangyum commented on pull request #31258: [SPARK-34168] [SQL] Support DPP in AQE when the join is Broadcast hash join at the beginning

2021-02-05 Thread GitBox

wangyum commented on pull request #31258: URL: https://github.com/apache/spark/pull/31258#issuecomment-773886254 @JkSelf @cloud-fan This implementation can not reuse `BroadcastExchange` if BHJ after SMJ. For example: ```SQL SELECT count(*) FROM (SELECT c.c_customer_sk,

[GitHub] [spark] WamBamBoozle edited a comment on pull request #31162: [SPARK-34033][R] SparkR Daemon Initialization

2021-02-05 Thread GitBox

WamBamBoozle edited a comment on pull request #31162: URL: https://github.com/apache/spark/pull/31162#issuecomment-772732386 @srowen, you write > I dont' know much about R - why does this help improve performance? It is like moving the invariant expression out of the loop. It

[GitHub] [spark] SparkQA commented on pull request #28885: [SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse

2021-02-05 Thread GitBox

SparkQA commented on pull request #28885: URL: https://github.com/apache/spark/pull/28885#issuecomment-773455991

[GitHub] [spark] SparkQA removed a comment on pull request #31472: [SPARK-34356][ML] OVR transform fix potential column conflict

2021-02-05 Thread GitBox

SparkQA removed a comment on pull request #31472: URL: https://github.com/apache/spark/pull/31472#issuecomment-773239575 **[Test build #134869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134869/testReport)** for PR 31472 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2021-02-05 Thread GitBox

AmplabJenkins commented on pull request #31480: URL: https://github.com/apache/spark/pull/31480#issuecomment-773813064

[GitHub] [spark] HyukjinKwon closed pull request #31464: [SPARK-34339][CORE][SQL] Expose the number of total paths in Utils.buildLocationMetadata()

2021-02-05 Thread GitBox

HyukjinKwon closed pull request #31464: URL: https://github.com/apache/spark/pull/31464

[GitHub] [spark] zhengruifeng commented on pull request #31472: [SPARK-34356][ML] OVR transform fix potential column conflict

2021-02-05 Thread GitBox

zhengruifeng commented on pull request #31472: URL: https://github.com/apache/spark/pull/31472#issuecomment-773203630 ``` scala> val df = spark.read.format("libsvm").load("/d0/Dev/Opensource/spark/data/mllib/sample_multiclass_classification_data.txt").withColumn("probability",

[GitHub] [spark] viirya commented on pull request #31468: [SPARK-34353][SQL] CollectLimitExec avoid shuffle if input rdd has single partition

2021-02-05 Thread GitBox

viirya commented on pull request #31468: URL: https://github.com/apache/spark/pull/31468#issuecomment-773508769 `org.apache.spark.sql.CachedTableSuite.SPARK-34269: cache lookup with ORDER BY / LIMIT clause` failed more than one time. But seems it passed without this change?

[GitHub] [spark] AngersZhuuuu commented on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause

2021-02-05 Thread GitBox

AngersZhuuuu commented on pull request #29087: URL: https://github.com/apache/spark/pull/29087#issuecomment-773142558

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29210: [SPARK-24497][SQL] Support recursive SQL query

2021-02-05 Thread GitBox

AmplabJenkins removed a comment on pull request #29210: URL: https://github.com/apache/spark/pull/29210#issuecomment-773499544

[GitHub] [spark] AmplabJenkins commented on pull request #31471: [SPARK-34355][SQL] Add log and time cost for commit job

2021-02-05 Thread GitBox

AmplabJenkins commented on pull request #31471: URL: https://github.com/apache/spark/pull/31471#issuecomment-773255967

[GitHub] [spark] maropu commented on pull request #31455: [SPARK-34342][SQL] Format DateLiteral and TimestampLiteral toString

2021-02-05 Thread GitBox

maropu commented on pull request #31455: URL: https://github.com/apache/spark/pull/31455#issuecomment-773686640 LGTM except for the @MaxGekk comment.

[GitHub] [spark] SparkQA commented on pull request #31470: [SPARK-34354][SQL] Fix failure when apply CostBasedJoinReorder on self-join

2021-02-05 Thread GitBox

SparkQA commented on pull request #31470: URL: https://github.com/apache/spark/pull/31470#issuecomment-773155684

[GitHub] [spark] SparkQA commented on pull request #31448: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`.

2021-02-05 Thread GitBox

SparkQA commented on pull request #31448: URL: https://github.com/apache/spark/pull/31448#issuecomment-773097354

[GitHub] [spark] AmplabJenkins commented on pull request #31478: [SPARK-34371][SQL][TESTS] Run the datetime rebasing tests for Parquet datasource v1 and v2

2021-02-05 Thread GitBox

AmplabJenkins commented on pull request #31478: URL: https://github.com/apache/spark/pull/31478#issuecomment-773684374

[GitHub] [spark] AmplabJenkins commented on pull request #31258: [SPARK-34168] [SQL] Support DPP in AQE when the join is Broadcast hash join at the beginning

2021-02-05 Thread GitBox

AmplabJenkins commented on pull request #31258: URL: https://github.com/apache/spark/pull/31258#issuecomment-773146054

[GitHub] [spark] SparkQA removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

2021-02-05 Thread GitBox

SparkQA removed a comment on pull request #31476: URL: https://github.com/apache/spark/pull/31476#issuecomment-773579295

[GitHub] [spark] SparkQA commented on pull request #31486: [SPARK-34359][SQL][3.1] Add a legacy config to restore the output schema of SHOW DATABASES

2021-02-05 Thread GitBox

SparkQA commented on pull request #31486: URL: https://github.com/apache/spark/pull/31486#issuecomment-773877044

[GitHub] [spark] cloud-fan commented on a change in pull request #31258: [SPARK-34168] [SQL] Support DPP in AQE when the join is Broadcast hash join at the beginning

2021-02-05 Thread GitBox

cloud-fan commented on a change in pull request #31258: URL: https://github.com/apache/spark/pull/31258#discussion_r570002134 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala ## @@ -156,7 +157,7 @@ case class

[GitHub] [spark] AmplabJenkins commented on pull request #28885: [SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse

2021-02-05 Thread GitBox

AmplabJenkins commented on pull request #28885: URL: https://github.com/apache/spark/pull/28885#issuecomment-773499553

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31473: [SPARK-34357][SQL] Map JDBC SQL TIME type to TimestampType with time portion fixed regardless of timezone

2021-02-05 Thread GitBox

AmplabJenkins removed a comment on pull request #31473: URL: https://github.com/apache/spark/pull/31473#issuecomment-773256704

[GitHub] [spark] srowen commented on pull request #31461: [SPARK-7768][CORE][SQL] Open UserDefinedType as a Developer API

2021-02-05 Thread GitBox

srowen commented on pull request #31461: URL: https://github.com/apache/spark/pull/31461#issuecomment-773360047 It does sound like ideally the API would be refactored. @viirya I didn't see pushback on your redesign. We can just open this up, which at least re-enables the current

[GitHub] [spark] SparkQA commented on pull request #31473: [SPARK-34357][SQL] Map JDBC SQL TIME type to TimestampType with time portion fixed regardless of timezone

2021-02-05 Thread GitBox

SparkQA commented on pull request #31473: URL: https://github.com/apache/spark/pull/31473#issuecomment-773306832

[GitHub] [spark] SparkQA commented on pull request #31477: [SPARK-34369][SQL][WEBUI] Track number of pairs processed out of Join.

2021-02-05 Thread GitBox

SparkQA commented on pull request #31477: URL: https://github.com/apache/spark/pull/31477#issuecomment-773692087

[GitHub] [spark] saikocat commented on a change in pull request #30902: [SPARK-33888][SQL] JDBC SQL TIME type represents incorrectly as TimestampType, it should be physical Int in millis

2021-02-05 Thread GitBox

saikocat commented on a change in pull request #30902: URL: https://github.com/apache/spark/pull/30902#discussion_r569983009 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ## @@ -408,6 +421,23 @@ object JdbcUtils extends

[GitHub] [spark] rahij commented on pull request #29625: [SPARK-24528][SQL] Add support to read multiple sorted bucket files for data source v1

2021-02-05 Thread GitBox

rahij commented on pull request #29625: URL: https://github.com/apache/spark/pull/29625#issuecomment-773412026 @c21 I wanted to ask if you were planning on continuing this PR now that https://github.com/apache/spark/pull/29804/files has been merged?

[GitHub] [spark] SparkQA removed a comment on pull request #29185: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2021-02-05 Thread GitBox

SparkQA removed a comment on pull request #29185: URL: https://github.com/apache/spark/pull/29185#issuecomment-773152387 **[Test build #134866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134866/testReport)** for PR 29185 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #31473: [SPARK-34357] Revert JDBC SQL TIME type to TimestampType with time portion fixed regardless of timezone

2021-02-05 Thread GitBox

AmplabJenkins commented on pull request #31473: URL: https://github.com/apache/spark/pull/31473#issuecomment-773256704

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31394: [SPARK-34291][ML] LSH hashDistance optimization

2021-02-05 Thread GitBox

AmplabJenkins removed a comment on pull request #31394: URL: https://github.com/apache/spark/pull/31394#issuecomment-773155810

[GitHub] [spark] SparkQA removed a comment on pull request #31456: [SPARK-34343][SQL][TESTS] Add missing test for some non-array types in PostgreSQL

2021-02-05 Thread GitBox

SparkQA removed a comment on pull request #31456: URL: https://github.com/apache/spark/pull/31456#issuecomment-773580714 **[Test build #134887 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134887/testReport)** for PR 31456 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #31460: [SPARK-34346][CORE][SQL] io.file.buffer.size set by spark.buffer.size will override by loading hive-site.xml accidentally may cause per

2021-02-05 Thread GitBox

AmplabJenkins commented on pull request #31460: URL: https://github.com/apache/spark/pull/31460#issuecomment-773087469

[GitHub] [spark] AmplabJenkins commented on pull request #31463: [PYTHON][MINOR] Fix docstring of DataFrame.join

2021-02-05 Thread GitBox

AmplabJenkins commented on pull request #31463: URL: https://github.com/apache/spark/pull/31463#issuecomment-773752115

[GitHub] [spark] AngersZhuuuu commented on pull request #31378: [SPARK-34240][SQL] Unify output of `SHOW TBLPROPERTIES` clause's output attribute's schema and ExprID

2021-02-05 Thread GitBox

AngersZhuuuu commented on pull request #31378: URL: https://github.com/apache/spark/pull/31378#issuecomment-773380494 ping @cloud-fan Any more update on this?

[GitHub] [spark] JkSelf commented on pull request #31258: [SPARK-34168] [SQL] Support DPP in AQE when the join is Broadcast hash join at the beginning

2021-02-05 Thread GitBox

JkSelf commented on pull request #31258: URL: https://github.com/apache/spark/pull/31258#issuecomment-773889572 @wangyum Yes. This implementation only is the first PR to support the join is bhj before apply AQE rules. We will support the join is smj and then convert to bhj use case in the

[GitHub] [spark] AngersZhuuuu opened a new pull request #31485: [SPARK-SQL][34137] Update suquery's stats when build LogicalPlan's stats

2021-02-05 Thread GitBox

AngersZhuuuu opened a new pull request #31485: URL: https://github.com/apache/spark/pull/31485 ### What changes were proposed in this pull request? When explain SQL with cost, treeString about subquery won't show it's statistics: How to reproduce: ``` spark.sql("create

[GitHub] [spark] AmplabJenkins commented on pull request #31384: [SPARK-31816][SQL][DOCS] Added high level description about JDBC connection providers for users/developers

2021-02-05 Thread GitBox

AmplabJenkins commented on pull request #31384: URL: https://github.com/apache/spark/pull/31384#issuecomment-773460643

[GitHub] [spark] HyukjinKwon commented on pull request #31464: [SPARK-34339][CORE][SQL] Expose the number of total paths in Utils.buildLocationMetadata()

2021-02-05 Thread GitBox

HyukjinKwon commented on pull request #31464: URL: https://github.com/apache/spark/pull/31464#issuecomment-773698884 Merged to master.

[GitHub] [spark] HeartSaVioR commented on pull request #31471: [SPARK-34355][SQL] Add log and time cost for commit job

2021-02-05 Thread GitBox

HeartSaVioR commented on pull request #31471: URL: https://github.com/apache/spark/pull/31471#issuecomment-773858199 OK. I'll leave this till early next week and merge if there's no further comment.

[GitHub] [spark] cloud-fan closed pull request #31440: [SPARK-34331][SQL] Speed up DS v2 metadata col resolution

2021-02-05 Thread GitBox

cloud-fan closed pull request #31440: URL: https://github.com/apache/spark/pull/31440

[GitHub] [spark] AmplabJenkins commented on pull request #31479: [SPARK-34373][SQL] HiveThriftServer2 startWithContext may hang with a race issue

2021-02-05 Thread GitBox

AmplabJenkins commented on pull request #31479: URL: https://github.com/apache/spark/pull/31479#issuecomment-773790360

[GitHub] [spark] sririshindra commented on pull request #31477: [SPARK-34369][SQL][WEBUI] Track number of pairs processed out of Join.

2021-02-05 Thread GitBox

sririshindra commented on pull request #31477: URL: https://github.com/apache/spark/pull/31477#issuecomment-773638159 Could you please take a look at this PR. cc: @maropu @dongjoon-hyun

[GitHub] [spark] SparkQA removed a comment on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-02-05 Thread GitBox

SparkQA removed a comment on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-773900126 **[Test build #134927 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134927/testReport)** for PR 30869 at commit

[GitHub] [spark] HyukjinKwon commented on a change in pull request #31471: [SPARK-34355][SQL] Add log and time cost for commit job

2021-02-05 Thread GitBox

HyukjinKwon commented on a change in pull request #31471: URL: https://github.com/apache/spark/pull/31471#discussion_r570179196 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ## @@ -217,8 +217,11 @@ object

[GitHub] [spark] cloud-fan opened a new pull request #31486: [SPARK-34359][SQL][3.1] Add a legacy config to restore the output schema of SHOW DATABASES

2021-02-05 Thread GitBox

cloud-fan opened a new pull request #31486: URL: https://github.com/apache/spark/pull/31486 This backports https://github.com/apache/spark/pull/31474 to 3.1/3.0 ### What changes were proposed in this pull request? This is a followup of

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2021-02-05 Thread GitBox

AngersZhuuuu commented on a change in pull request #30957: URL: https://github.com/apache/spark/pull/30957#discussion_r570723751 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala ## @@ -174,6 +174,7 @@ object

[GitHub] [spark] SparkQA removed a comment on pull request #31471: [SPARK-34355][SQL] Add log and time cost for commit job

2021-02-05 Thread GitBox

SparkQA removed a comment on pull request #31471: URL: https://github.com/apache/spark/pull/31471#issuecomment-773773238 **[Test build #134906 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134906/testReport)** for PR 31471 at commit

[GitHub] [spark] SparkQA commented on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'

2021-02-05 Thread GitBox

SparkQA commented on pull request #31487: URL: https://github.com/apache/spark/pull/31487#issuecomment-773902011 **[Test build #134929 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134929/testReport)** for PR 31487 at commit

[GitHub] [spark] SparkQA commented on pull request #31482: [SPARK-34346][CORE][SQL][3.1] io.file.buffer.size set by spark.buffer.size will override by loading hive-site.xml accidentally may cause perf

2021-02-05 Thread GitBox

SparkQA commented on pull request #31482: URL: https://github.com/apache/spark/pull/31482#issuecomment-773861703 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39500/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-02-05 Thread GitBox

AmplabJenkins removed a comment on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-773901509 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134927/

[GitHub] [spark] cloud-fan commented on pull request #31486: [SPARK-34359][SQL][3.1] Add a legacy config to restore the output schema of SHOW DATABASES

2021-02-05 Thread GitBox

cloud-fan commented on pull request #31486: URL: https://github.com/apache/spark/pull/31486#issuecomment-773870656 @HyukjinKwon @dongjoon-hyun @maropu This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] maropu commented on a change in pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2021-02-05 Thread GitBox

maropu commented on a change in pull request #30957: URL: https://github.com/apache/spark/pull/30957#discussion_r570776893 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala ## @@ -471,6 +473,126 @@ abstract class

[GitHub] [spark] LuciferYang opened a new pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'

2021-02-05 Thread GitBox

LuciferYang opened a new pull request #31487: URL: https://github.com/apache/spark/pull/31487 ### What changes were proposed in this pull request? `Mockito.initMocks(Object)` is a deprecated api, should use `Mockito.openMocks(Object)` instead. ### Why are the changes needed?

[GitHub] [spark] SparkQA removed a comment on pull request #31479: [SPARK-34373][SQL] HiveThriftServer2 startWithContext may hang with a race issue

2021-02-05 Thread GitBox

SparkQA removed a comment on pull request #31479: URL: https://github.com/apache/spark/pull/31479#issuecomment-773776064 **[Test build #134904 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134904/testReport)** for PR 31479 at commit

[GitHub] [spark] cloud-fan commented on a change in pull request #31245: [SPARK-34157][SQL] Unify output of SHOW TABLES and pass output attributes properly

2021-02-05 Thread GitBox

cloud-fan commented on a change in pull request #31245: URL: https://github.com/apache/spark/pull/31245#discussion_r570789600 ## File path: docs/sql-migration-guide.md ## @@ -40,6 +40,10 @@ license: | - In Spark 3.2, script transform default FIELD DELIMIT is `\u0001` for no

[GitHub] [spark] SparkQA commented on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause

2021-02-05 Thread GitBox

SparkQA commented on pull request #29087: URL: https://github.com/apache/spark/pull/29087#issuecomment-773907871 **[Test build #134928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134928/testReport)** for PR 29087 at commit

[GitHub] [spark] SparkQA commented on pull request #30650: [SPARK-24818][CORE] Support delay scheduling for barrier execution

2021-02-05 Thread GitBox

SparkQA commented on pull request #30650: URL: https://github.com/apache/spark/pull/30650#issuecomment-773876021 **[Test build #134923 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134923/testReport)** for PR 30650 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

2021-02-05 Thread GitBox

AmplabJenkins removed a comment on pull request #31476: URL: https://github.com/apache/spark/pull/31476#issuecomment-773632690 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] cloud-fan closed pull request #31478: [SPARK-34371][SQL][TESTS] Run the datetime rebasing tests for Parquet datasource v1 and v2

2021-02-05 Thread GitBox

cloud-fan closed pull request #31478: URL: https://github.com/apache/spark/pull/31478 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] AmplabJenkins commented on pull request #31483: [SPARK-33434][PYTHON][DOCS] Added RuntimeConfig to PySpark docs

2021-02-05 Thread GitBox

AmplabJenkins commented on pull request #31483: URL: https://github.com/apache/spark/pull/31483#issuecomment-773867181 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134920/

[GitHub] [spark] HeartSaVioR edited a comment on pull request #31471: [SPARK-34355][SQL] Add log and time cost for commit job

2021-02-05 Thread GitBox

HeartSaVioR edited a comment on pull request #31471: URL: https://github.com/apache/spark/pull/31471#issuecomment-773744959 Actually I have been thinking about this - while I think this helps to track down the elapsed time on committing job, there's still another problem end users confuse

[GitHub] [spark] xkrogen commented on a change in pull request #31133: [SPARK-26836][SQL] Supporting Avro schema evolution for partitioned Hive tables

2021-02-05 Thread GitBox

xkrogen commented on a change in pull request #31133: URL: https://github.com/apache/spark/pull/31133#discussion_r570355910 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ## @@ -388,6 +394,9 @@ private[hive] object HiveTableUtil {

[GitHub] [spark] wzhfy edited a comment on pull request #30965: [SPARK-33935][SQL] Fix CBO cost function

2021-02-05 Thread GitBox

wzhfy edited a comment on pull request #30965: URL: https://github.com/apache/spark/pull/30965#issuecomment-773864841 @tanelk Hi, sorry to see this so late. IIRC the reason to use a relative value for rowCount and size, is to normalize them to a similar scale while comparing cost.
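
The normalization idea in the comment above can be illustrated with a small sketch. It is not taken from the pull request: the function names, the weighted-sum form, and the `weight` default are assumptions chosen only to show how dividing one plan's row count and size by the other's puts both terms on a comparable scale before they are combined.

```python
def relative_cost(a_rows, a_size, b_rows, b_size, weight=0.7):
    """Cost of plan A relative to plan B (hypothetical helper).

    Each ratio is dimensionless, so a row count in the hundreds and a
    size in the millions of bytes contribute on a similar scale.
    A result below 1.0 suggests plan A is the cheaper of the two.
    """
    return weight * (a_rows / b_rows) + (1 - weight) * (a_size / b_size)


def better_than(a_rows, a_size, b_rows, b_size, weight=0.7):
    """Return True if plan A looks cheaper than plan B (hypothetical helper)."""
    # Ratios are undefined when plan B has no rows or no bytes;
    # this sketch simply declines to prefer A in that case.
    if b_rows == 0 or b_size == 0:
        return False
    return relative_cost(a_rows, a_size, b_rows, b_size, weight) < 1.0


if __name__ == "__main__":
    # Plan A has half the rows but twice the bytes of plan B.
    print(relative_cost(100, 10_000_000, 200, 5_000_000, weight=0.5))
    print(better_than(50, 1_000, 100, 2_000))
```

With absolute values, the size term (millions of bytes) would swamp the row-count term (hundreds of rows) regardless of the weight; with ratios, the weight actually controls the trade-off.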

[GitHub] [spark] SparkQA commented on pull request #31483: [SPARK-33434][PYTHON][DOCS] Added RuntimeConfig to PySpark docs

2021-02-05 Thread GitBox

SparkQA commented on pull request #31483: URL: https://github.com/apache/spark/pull/31483#issuecomment-773869613 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39503/

[GitHub] [spark] wzhfy edited a comment on pull request #30965: [SPARK-33935][SQL] Fix CBO cost function

2021-02-05 Thread GitBox

wzhfy edited a comment on pull request #30965: URL: https://github.com/apache/spark/pull/30965#issuecomment-773864841 @tanelk Hi, sorry to see this so late. IIRC the reason to use a relative value for rowCount and size, is to normalize them to a similar scale while comparing cost.

[GitHub] [spark] SparkQA commented on pull request #31245: [SPARK-34157][SQL] Unify output of SHOW TABLES and pass output attributes properly

2021-02-05 Thread GitBox

SparkQA commented on pull request #31245: URL: https://github.com/apache/spark/pull/31245#issuecomment-773869378 **[Test build #134922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134922/testReport)** for PR 31245 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-02-05 Thread GitBox

SparkQA removed a comment on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-773900126 **[Test build #134927 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134927/testReport)** for PR 30869 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-02-05 Thread GitBox

AmplabJenkins commented on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-773901509 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134927/

[GitHub] [spark] SparkQA commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-02-05 Thread GitBox

SparkQA commented on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-773901491 **[Test build #134927 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134927/testReport)** for PR 30869 at commit
