[GitHub] [spark] dbtsai commented on pull request #33748: [SPARK-36516][SQL] Support File Metadata Cache for ORC

2021-08-19 Thread GitBox
dbtsai commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-902475893 > Everything depends on the data lifecycle. For the safety, we can control it by reducing `spark.sql.fileMetaCache.ttlSinceLastAccessSec` to `10 secs` or less which is still eff
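
The "TTL since last access" idea behind `spark.sql.fileMetaCache.ttlSinceLastAccessSec` can be sketched roughly as below. This is only an illustration of the expiry policy, not the PR's implementation: the class name `AccessTtlCache`, its methods, and the injectable `clock` are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class AccessTtlCache:
    """Hypothetical cache whose entries expire ttl_seconds after their last access."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        # key -> (value, time of last access)
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (value, self._clock())

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, last_access = entry
        now = self._clock()
        if now - last_access > self._ttl:
            # Idle for longer than the TTL: drop the entry and report a miss.
            del self._entries[key]
            return None
        # A hit refreshes the last-access time, so hot entries stay cached.
        self._entries[key] = (value, now)
        return value
```

With a 10-second TTL, an entry that keeps being read stays cached, while one left idle for more than 10 seconds is dropped on the next lookup. A smaller TTL narrows the window in which stale metadata could be served.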

[GitHub] [spark] Ngone51 closed pull request #33782: [SPARK-35011][CORE][3.0] Avoid Block Manager registrations when StopExecutor msg is in-flight

2021-08-19 Thread GitBox
Ngone51 closed pull request #33782: URL: https://github.com/apache/spark/pull/33782 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubs

[GitHub] [spark] Ngone51 commented on pull request #33782: [SPARK-35011][CORE][3.0] Avoid Block Manager registrations when StopExecutor msg is in-flight

2021-08-19 Thread GitBox
Ngone51 commented on pull request #33782: URL: https://github.com/apache/spark/pull/33782#issuecomment-902470428 GA passed. Merged to branch-3.0, thanks!

[GitHub] [spark] SparkQA commented on pull request #33795: [SPARK-36532][CORE][3.1] Fix deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown h

2021-08-19 Thread GitBox
SparkQA commented on pull request #33795: URL: https://github.com/apache/spark/pull/33795#issuecomment-902469200 **[Test build #142665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142665/testReport)** for PR 33795 at commit [`3ccc0f8`](https://github.com

[GitHub] [spark] Ngone51 commented on pull request #33795: [SPARK-36532][CORE][3.1] Fix deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown h

2021-08-19 Thread GitBox
Ngone51 commented on pull request #33795: URL: https://github.com/apache/spark/pull/33795#issuecomment-902468291 cc @cloud-fan

[GitHub] [spark] Ngone51 opened a new pull request #33795: [SPARK-36532][CORE][3.1] Fix deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown h

2021-08-19 Thread GitBox
Ngone51 opened a new pull request #33795: URL: https://github.com/apache/spark/pull/33795 ### What changes were proposed in this pull request? Instead of exiting the executor within the RpcEnv's thread, exit the executor in a separate thread. ### Why are the changes needed?

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-19 Thread GitBox
AmplabJenkins removed a comment on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-902450754

[GitHub] [spark] AmplabJenkins commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-19 Thread GitBox
AmplabJenkins commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-902466437 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47164/

[GitHub] [spark] itholic commented on pull request #33786: [SPARK-36541][DOCS][PYTHON]Replace the word Koalas to pandas-on-Spark

2021-08-19 Thread GitBox
itholic commented on pull request #33786: URL: https://github.com/apache/spark/pull/33786#issuecomment-902452005 LGTM.

[GitHub] [spark] SparkQA commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-19 Thread GitBox
SparkQA commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-902452047 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47164/

[GitHub] [spark] SparkQA commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-19 Thread GitBox
SparkQA commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-902450734 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47163/

[GitHub] [spark] AmplabJenkins commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-19 Thread GitBox
AmplabJenkins commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-902450754 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47163/

[GitHub] [spark] yoda-mon commented on pull request #33786: [SPARK-36541][DOCS][PYTHON]Replace the word Koalas to pandas-on-Spark

2021-08-19 Thread GitBox
yoda-mon commented on pull request #33786: URL: https://github.com/apache/spark/pull/33786#issuecomment-902450179 Updated.

[GitHub] [spark] LuciferYang commented on pull request #33748: [SPARK-36516][SQL] Support File Metadata Cache for ORC

2021-08-19 Thread GitBox
LuciferYang commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-902447683 > Yea, but it adds complexity and more memory consumption like you mentioned earlier, and you'll need to have the driver a long running process like a Presto coordinator

[GitHub] [spark] LuciferYang commented on pull request #33748: [SPARK-36516][SQL] Support File Metadata Cache for ORC

2021-08-19 Thread GitBox
LuciferYang commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-902447020 > which I'm not sure how many people are using Spark this way. There should be many. We can do some survey, haha ~

[GitHub] [spark] sunchao commented on pull request #33748: [SPARK-36516][SQL] Support File Metadata Cache for ORC

2021-08-19 Thread GitBox
sunchao commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-90200 Yea, but it adds complexity and more memory consumption like you mentioned earlier, and you'll need to have the driver a long running process like a Presto coordinator, which I'

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-19 Thread GitBox
AmplabJenkins removed a comment on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-902443736 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47165/

[GitHub] [spark] AmplabJenkins commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-19 Thread GitBox
AmplabJenkins commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-902443736 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47165/

[GitHub] [spark] AmplabJenkins commented on pull request #33790: Updates AuthEngine to pass the correct SecretKeySpec format

2021-08-19 Thread GitBox
AmplabJenkins commented on pull request #33790: URL: https://github.com/apache/spark/pull/33790#issuecomment-902443745 Can one of the admins verify this patch?

[GitHub] [spark] itholic edited a comment on pull request #33786: [SPARK-36541][DOCS][PYTHON]Replace the word Koalas to pandas-on-Spark

2021-08-19 Thread GitBox
itholic edited a comment on pull request #33786: URL: https://github.com/apache/spark/pull/33786#issuecomment-902442698 Nice!! Could you also update the screen captures to the PR description ?

[GitHub] [spark] LuciferYang edited a comment on pull request #33748: [SPARK-36516][SQL] Support File Metadata Cache for ORC

2021-08-19 Thread GitBox
LuciferYang edited a comment on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-902439693 > I understand you want to avoid the duplicate footer lookup. In Parquet at least we can just pass the footer from either ParquetFileFormat or ParquetPartitionReaderF

[GitHub] [spark] itholic commented on pull request #33786: [SPARK-36541][DOCS][PYTHON]Replace the word Koalas to pandas-on-Spark

2021-08-19 Thread GitBox
itholic commented on pull request #33786: URL: https://github.com/apache/spark/pull/33786#issuecomment-902442698 > @gengliangwang @HyukjinKwon > Thank you for your advice, I put screen shots of around the images below. > > Environment > > * Windows 10 > * Google Chrome 9

[GitHub] [spark] SparkQA commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-19 Thread GitBox
SparkQA commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-902442032 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47165/

[GitHub] [spark] SparkQA commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-19 Thread GitBox
SparkQA commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-902439939 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47164/

[GitHub] [spark] LuciferYang commented on pull request #33748: [SPARK-36516][SQL] Support File Metadata Cache for ORC

2021-08-19 Thread GitBox
LuciferYang commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-902439693 > I understand you want to avoid the duplicate footer lookup. In Parquet at least we can just pass the footer from either ParquetFileFormat or ParquetPartitionReaderFactory

[GitHub] [spark] SparkQA commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-19 Thread GitBox
SparkQA commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-902438633 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47163/

[GitHub] [spark] sunchao commented on pull request #33748: [SPARK-36516][SQL] Support File Metadata Cache for ORC

2021-08-19 Thread GitBox
sunchao commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-902435389 > Can we add ctime or mtime of the file to the PartitionedFile and use this information for check? Yea file path + modification time seem like a good way to validate the c
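
The "file path + modification time" check discussed above could look roughly like the following sketch. It is an assumption-laden illustration rather than the PR's code: `FooterCache` and the `read_footer` callback are hypothetical names, and the local filesystem's `os.stat` stands in for whatever file status Spark would carry on a `PartitionedFile`.

```python
import os
from typing import Callable, Dict, Tuple


class FooterCache:
    """Hypothetical footer cache keyed by path and validated by modification time."""

    def __init__(self, read_footer: Callable[[str], bytes]) -> None:
        self._read_footer = read_footer  # assumed callback that parses a file's footer
        # path -> (mtime in nanoseconds when cached, footer)
        self._entries: Dict[str, Tuple[int, bytes]] = {}

    def get(self, path: str) -> bytes:
        mtime_ns = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime_ns:
            # Same path and same modification time: treat the cached footer as valid.
            return cached[1]
        # First access, or the file was rewritten under the same name: re-read.
        footer = self._read_footer(path)
        self._entries[path] = (mtime_ns, footer)
        return footer
```

The check is only as reliable as the timestamp: a file rewritten under the same name within the filesystem's timestamp granularity would still be served from the cache, so this narrows the stale-read window rather than closing it.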

[GitHub] [spark] ulysses-you commented on a change in pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-19 Thread GitBox
ulysses-you commented on a change in pull request #32816: URL: https://github.com/apache/spark/pull/32816#discussion_r692656713 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala ## @@ -100,24 +100,34 @@ case class Adaptiv

[GitHub] [spark] SparkQA commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-19 Thread GitBox
SparkQA commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-902426885 **[Test build #142664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142664/testReport)** for PR 32816 at commit [`8058fe9`](https://github.com

[GitHub] [spark] xuanyuanking commented on pull request #33763: [SPARK-36533][SS] Trigger.AvailableNow for running streaming queries like Trigger.Once in multiple batches

2021-08-19 Thread GitBox
xuanyuanking commented on pull request #33763: URL: https://github.com/apache/spark/pull/33763#issuecomment-902426850 cc @viirya

[GitHub] [spark] SparkQA commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-19 Thread GitBox
SparkQA commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-902425654 **[Test build #142663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142663/testReport)** for PR 32816 at commit [`f5ad40e`](https://github.com

[GitHub] [spark] yoda-mon commented on pull request #33786: [SPARK-36541][DOCS][PYTHON]Replace the word Koalas to pandas-on-Spark

2021-08-19 Thread GitBox
yoda-mon commented on pull request #33786: URL: https://github.com/apache/spark/pull/33786#issuecomment-902425495 @gengliangwang @HyukjinKwon Thank you for your advice, I put screen shots of around the images below. Environment - Windows 10 - Google Chrome 92.0.4515.159

[GitHub] [spark] SparkQA commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-19 Thread GitBox
SparkQA commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-902424243 **[Test build #142662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142662/testReport)** for PR 32816 at commit [`b54e9c2`](https://github.com

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33673: [SPARK-36448][SQL] Exceptions in NoSuchItemException.scala have to be case classes

2021-08-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33673: URL: https://github.com/apache/spark/pull/33673#issuecomment-902423683 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142657/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33793: [SPARK-36549][SQL] Add taskStatus supports multiple value to monitoring doc

2021-08-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33793: URL: https://github.com/apache/spark/pull/33793#issuecomment-902423685 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47161/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33794: [SPARK-35083][CORE][FOLLLOWUP] Improve docs and migration guide

2021-08-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33794: URL: https://github.com/apache/spark/pull/33794#issuecomment-902423684 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47162/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33644: [SPARK-36419][CORE] Optionally move final aggregation in RDD.treeAggregate to executor

2021-08-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33644: URL: https://github.com/apache/spark/pull/33644#issuecomment-902423682 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142651/

[GitHub] [spark] AmplabJenkins commented on pull request #33644: [SPARK-36419][CORE] Optionally move final aggregation in RDD.treeAggregate to executor

2021-08-19 Thread GitBox
AmplabJenkins commented on pull request #33644: URL: https://github.com/apache/spark/pull/33644#issuecomment-902423682 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142651/

[GitHub] [spark] AmplabJenkins commented on pull request #33673: [SPARK-36448][SQL] Exceptions in NoSuchItemException.scala have to be case classes

2021-08-19 Thread GitBox
AmplabJenkins commented on pull request #33673: URL: https://github.com/apache/spark/pull/33673#issuecomment-902423683 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142657/

[GitHub] [spark] AmplabJenkins commented on pull request #33794: [SPARK-35083][CORE][FOLLLOWUP] Improve docs and migration guide

2021-08-19 Thread GitBox
AmplabJenkins commented on pull request #33794: URL: https://github.com/apache/spark/pull/33794#issuecomment-902423684 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47162/

[GitHub] [spark] AmplabJenkins commented on pull request #33793: [SPARK-36549][SQL] Add taskStatus supports multiple value to monitoring doc

2021-08-19 Thread GitBox
AmplabJenkins commented on pull request #33793: URL: https://github.com/apache/spark/pull/33793#issuecomment-902423685 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47161/

[GitHub] [spark] ulysses-you commented on a change in pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-19 Thread GitBox
ulysses-you commented on a change in pull request #32816: URL: https://github.com/apache/spark/pull/32816#discussion_r692653369 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala ## @@ -656,13 +687,54 @@ case class Adaptiv

[GitHub] [spark] SparkQA removed a comment on pull request #33673: [SPARK-36448][SQL] Exceptions in NoSuchItemException.scala have to be case classes

2021-08-19 Thread GitBox
SparkQA removed a comment on pull request #33673: URL: https://github.com/apache/spark/pull/33673#issuecomment-902320624 **[Test build #142657 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142657/testReport)** for PR 33673 at commit [`c089a25`](https://gi

[GitHub] [spark] SparkQA removed a comment on pull request #33644: [SPARK-36419][CORE] Optionally move final aggregation in RDD.treeAggregate to executor

2021-08-19 Thread GitBox
SparkQA removed a comment on pull request #33644: URL: https://github.com/apache/spark/pull/33644#issuecomment-902186886 **[Test build #142651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142651/testReport)** for PR 33644 at commit [`f728a02`](https://gi

[GitHub] [spark] SparkQA commented on pull request #33673: [SPARK-36448][SQL] Exceptions in NoSuchItemException.scala have to be case classes

2021-08-19 Thread GitBox
SparkQA commented on pull request #33673: URL: https://github.com/apache/spark/pull/33673#issuecomment-902418529 **[Test build #142657 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142657/testReport)** for PR 33673 at commit [`c089a25`](https://github.co

[GitHub] [spark] AngersZhuuuu commented on pull request #33793: [SPARK-36549][SQL] Add taskStatus supports multiple value to monitoring doc

2021-08-19 Thread GitBox
AngersZhuuuu commented on pull request #33793: URL: https://github.com/apache/spark/pull/33793#issuecomment-902415615 > @AngersZhuuuu the change seems fine, but let's make sure to have a detailed PR description, e.g. with an example request and output, how you tested, etc. How ab

[GitHub] [spark] SparkQA commented on pull request #33794: [SPARK-35083][CORE][FOLLLOWUP] Improve docs and migration guide

2021-08-19 Thread GitBox
SparkQA commented on pull request #33794: URL: https://github.com/apache/spark/pull/33794#issuecomment-902413809 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47162/

[GitHub] [spark] SparkQA commented on pull request #33793: [SPARK-36549][SQL] Add taskStatus supports multiple value to monitoring doc

2021-08-19 Thread GitBox
SparkQA commented on pull request #33793: URL: https://github.com/apache/spark/pull/33793#issuecomment-902411553 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47161/

[GitHub] [spark] ulysses-you commented on pull request #33794: [SPARK-35083][CORE][FOLLLOWUP] Improve docs and migration guide

2021-08-19 Thread GitBox
ulysses-you commented on pull request #33794: URL: https://github.com/apache/spark/pull/33794#issuecomment-902411428 Thank you all for the approvals; also FYI @dongjoon-hyun

[GitHub] [spark] SparkQA commented on pull request #33644: [SPARK-36419][CORE] Optionally move final aggregation in RDD.treeAggregate to executor

2021-08-19 Thread GitBox
SparkQA commented on pull request #33644: URL: https://github.com/apache/spark/pull/33644#issuecomment-902411198 **[Test build #142651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142651/testReport)** for PR 33644 at commit [`f728a02`](https://github.co

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33791: [SPARK-36547][BUILD] Downgrade scala-maven-plugin to 4.3.0

2021-08-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33791: URL: https://github.com/apache/spark/pull/33791#issuecomment-902405026 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142649/

[GitHub] [spark] AmplabJenkins commented on pull request #33791: [SPARK-36547][BUILD] Downgrade scala-maven-plugin to 4.3.0

2021-08-19 Thread GitBox
AmplabJenkins commented on pull request #33791: URL: https://github.com/apache/spark/pull/33791#issuecomment-902405026 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142649/

[GitHub] [spark] SparkQA commented on pull request #33794: [SPARK-35083][CORE][FOLLLOWUP] Improve docs and migration guide

2021-08-19 Thread GitBox
SparkQA commented on pull request #33794: URL: https://github.com/apache/spark/pull/33794#issuecomment-902402284 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47162/

[GitHub] [spark] HyukjinKwon commented on pull request #33793: [SPARK-36549][SQL] Add taskStatus supports multiple value to monitoring doc

2021-08-19 Thread GitBox
HyukjinKwon commented on pull request #33793: URL: https://github.com/apache/spark/pull/33793#issuecomment-902402175 @AngersZhuuuu the change seems fine, but let's make sure to have a detailed PR description, e.g. with an example request and output, how you tested, etc.

[GitHub] [spark] SparkQA removed a comment on pull request #33791: [SPARK-36547][BUILD] Downgrade scala-maven-plugin to 4.3.0

2021-08-19 Thread GitBox
SparkQA removed a comment on pull request #33791: URL: https://github.com/apache/spark/pull/33791#issuecomment-902148745 **[Test build #142649 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142649/testReport)** for PR 33791 at commit [`9ad5f55`](https://gi

[GitHub] [spark] SparkQA commented on pull request #33793: [SPARK-36549][SQL] Add taskStatus supports multiple value to monitoring doc

2021-08-19 Thread GitBox
SparkQA commented on pull request #33793: URL: https://github.com/apache/spark/pull/33793#issuecomment-902401157 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47161/

[GitHub] [spark] HyukjinKwon commented on a change in pull request #32184: [SPARK-35083][CORE] Support remote scheduler pool files

2021-08-19 Thread GitBox
HyukjinKwon commented on a change in pull request #32184: URL: https://github.com/apache/spark/pull/32184#discussion_r692624659 ## File path: docs/job-scheduling.md ## @@ -252,10 +252,11 @@ properties: The pool properties can be set by creating an XML file, similar to `conf

[GitHub] [spark] SparkQA commented on pull request #33791: [SPARK-36547][BUILD] Downgrade scala-maven-plugin to 4.3.0

2021-08-19 Thread GitBox
SparkQA commented on pull request #33791: URL: https://github.com/apache/spark/pull/33791#issuecomment-902391748 **[Test build #142649 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142649/testReport)** for PR 33791 at commit [`9ad5f55`](https://github.co

[GitHub] [spark] SparkQA removed a comment on pull request #33794: [SPARK-35083][CORE][FOLLLOWUP] Improve docs and migration guide

2021-08-19 Thread GitBox
SparkQA removed a comment on pull request #33794: URL: https://github.com/apache/spark/pull/33794#issuecomment-902386507 **[Test build #142661 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142661/testReport)** for PR 33794 at commit [`1497e27`](https://gi

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33794: [SPARK-35083][CORE][FOLLLOWUP] Improve docs and migration guide

2021-08-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33794: URL: https://github.com/apache/spark/pull/33794#issuecomment-902390150 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142661/

[GitHub] [spark] AmplabJenkins commented on pull request #33794: [SPARK-35083][CORE][FOLLLOWUP] Improve docs and migration guide

2021-08-19 Thread GitBox
AmplabJenkins commented on pull request #33794: URL: https://github.com/apache/spark/pull/33794#issuecomment-902390150 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142661/

[GitHub] [spark] SparkQA commented on pull request #33794: [SPARK-35083][CORE][FOLLLOWUP] Improve docs and migration guide

2021-08-19 Thread GitBox
SparkQA commented on pull request #33794: URL: https://github.com/apache/spark/pull/33794#issuecomment-902390038 **[Test build #142661 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142661/testReport)** for PR 33794 at commit [`1497e27`](https://github.co

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33793: [SPARK-36549][SQL] Add taskStatus supports multiple value to monitoring doc

2021-08-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33793: URL: https://github.com/apache/spark/pull/33793#issuecomment-902388917 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142660/

[GitHub] [spark] AmplabJenkins commented on pull request #33793: [SPARK-36549][SQL] Add taskStatus supports multiple value to monitoring doc

2021-08-19 Thread GitBox
AmplabJenkins commented on pull request #33793: URL: https://github.com/apache/spark/pull/33793#issuecomment-902388917 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142660/

[GitHub] [spark] SparkQA removed a comment on pull request #33793: [SPARK-36549][SQL] Add taskStatus supports multiple value to monitoring doc

2021-08-19 Thread GitBox
SparkQA removed a comment on pull request #33793: URL: https://github.com/apache/spark/pull/33793#issuecomment-902385021 **[Test build #142660 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142660/testReport)** for PR 33793 at commit [`9ce909f`](https://gi

[GitHub] [spark] SparkQA commented on pull request #33793: [SPARK-36549][SQL] Add taskStatus supports multiple value to monitoring doc

2021-08-19 Thread GitBox
SparkQA commented on pull request #33793: URL: https://github.com/apache/spark/pull/33793#issuecomment-902388816 **[Test build #142660 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142660/testReport)** for PR 33793 at commit [`9ce909f`](https://github.co

[GitHub] [spark] HyukjinKwon commented on pull request #33794: [SPARK-35083][CORE][FOLLLOWUP] Improve docs and migration guide

2021-08-19 Thread GitBox
HyukjinKwon commented on pull request #33794: URL: https://github.com/apache/spark/pull/33794#issuecomment-902388665 cc @gengliangwang and @Ngone51 FYI

[GitHub] [spark] gengliangwang closed pull request #33791: [SPARK-36547][BUILD] Downgrade scala-maven-plugin to 4.3.0

2021-08-19 Thread GitBox
gengliangwang closed pull request #33791: URL: https://github.com/apache/spark/pull/33791

[GitHub] [spark] gengliangwang commented on pull request #33791: [SPARK-36547][BUILD] Downgrade scala-maven-plugin to 4.3.0

2021-08-19 Thread GitBox
gengliangwang commented on pull request #33791: URL: https://github.com/apache/spark/pull/33791#issuecomment-902388151 Merging to master/3.2

[GitHub] [spark] SparkQA commented on pull request #33794: [SPARK-35083][CORE][FOLLLOWUP] Improve docs and migration guide

2021-08-19 Thread GitBox
SparkQA commented on pull request #33794: URL: https://github.com/apache/spark/pull/33794#issuecomment-902386507 **[Test build #142661 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142661/testReport)** for PR 33794 at commit [`1497e27`](https://github.com

[GitHub] [spark] ulysses-you opened a new pull request #33794: [SPARK-35083][CORE][FOLLLOWUP] Improve docs and migration guide

2021-08-19 Thread GitBox
ulysses-you opened a new pull request #33794: URL: https://github.com/apache/spark/pull/33794 ### What changes were proposed in this pull request? * improve docs in `docs/job-scheduling.md` * add migration guide docs in `docs/core-migration-guide.md` ### Why are the

[GitHub] [spark] SparkQA commented on pull request #33793: [SPARK-36549][SQL] Add taskStatus supports multiple value to monitoring doc

2021-08-19 Thread GitBox
SparkQA commented on pull request #33793: URL: https://github.com/apache/spark/pull/33793#issuecomment-902385021 **[Test build #142660 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142660/testReport)** for PR 33793 at commit [`9ce909f`](https://github.com

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32397: [SPARK-35084][CORE] Spark 3: supporting "--packages" in k8s cluster mode

2021-08-19 Thread GitBox
AmplabJenkins removed a comment on pull request #32397: URL: https://github.com/apache/spark/pull/32397#issuecomment-902384464 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47160/

[GitHub] [spark] AmplabJenkins commented on pull request #32397: [SPARK-35084][CORE] Spark 3: supporting "--packages" in k8s cluster mode

2021-08-19 Thread GitBox
AmplabJenkins commented on pull request #32397: URL: https://github.com/apache/spark/pull/32397#issuecomment-902384464 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47160/ -- T

[GitHub] [spark] HyukjinKwon commented on pull request #33784: Revert "[SPARK-34309][BUILD][CORE][SQL][K8S] Use Caffeine instead of Guava Cache"

2021-08-19 Thread GitBox
HyukjinKwon commented on pull request #33784: URL: https://github.com/apache/spark/pull/33784#issuecomment-902384034 I am fine with reverting if somebody feels strongly on this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] LuciferYang edited a comment on pull request #33748: [SPARK-36516][SQL] Support File Metadata Cache for ORC

2021-08-19 Thread GitBox
LuciferYang edited a comment on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-902383334 > In Hive it's common that the same file name (e.g., 00_0) gets used when doing insert overwrite. Even if we check file size and other stuff, it can't completely

[GitHub] [spark] AngersZhuuuu commented on pull request #33793: [SPARK-36549][SQL] Add taskStatus supports multiple value to monitoring doc

2021-08-19 Thread GitBox
AngersZhuuuu commented on pull request #33793: URL: https://github.com/apache/spark/pull/33793#issuecomment-902383422 ping @zsxwing @srowen @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

[GitHub] [spark] LuciferYang commented on pull request #33748: [SPARK-36516][SQL] Support File Metadata Cache for ORC

2021-08-19 Thread GitBox
LuciferYang commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-902383334 > In Hive it's common that the same file name (e.g., 00_0) gets used when doing insert overwrite. Even if we check file size and other stuff, it can't completely prevent

[GitHub] [spark] AngersZhuuuu opened a new pull request #33793: [SPARK-36549][SQL] Add taskStatus supports multiple value to monitoring doc

2021-08-19 Thread GitBox
AngersZhuuuu opened a new pull request #33793: URL: https://github.com/apache/spark/pull/33793 ### What changes were proposed in this pull request? Add taskStatus supports multiple value to monitoring doc ### Why are the changes needed? Make doc clear ### Does thi

[GitHub] [spark] HyukjinKwon commented on pull request #33786: [SPARK-36541][DOCS][PYTHON]Replace the word Koalas to pandas-on-Spark

2021-08-19 Thread GitBox
HyukjinKwon commented on pull request #33786: URL: https://github.com/apache/spark/pull/33786#issuecomment-902382928 Yeah, it would be great to have some screenshots in the PR description. -- This is an automated message from the Apache Git Service. To respond to the message, please log o

[GitHub] [spark] LuciferYang edited a comment on pull request #33748: [SPARK-36516][SQL] Support File Metadata Cache for ORC

2021-08-19 Thread GitBox
LuciferYang edited a comment on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-902378554 > Since the metadata is cached in the executor, does it mean the task reading the same ORC file has to be scheduled on the same executor? How can we guarantee this?

[GitHub] [spark] LuciferYang commented on pull request #33748: [SPARK-36516][SQL] Support File Metadata Cache for ORC

2021-08-19 Thread GitBox
LuciferYang commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-902378554 > Since the metadata is cached in the executor, does it mean the task reading the same ORC file has to be scheduled on the same executor? How can we guarantee this? A

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31165: [SPARK-34092][SQL] Support Stage level restful api filter task details by task status

2021-08-19 Thread GitBox
AngersZhuuuu commented on a change in pull request #31165: URL: https://github.com/apache/spark/pull/31165#discussion_r692609348 ## File path: docs/monitoring.md ## @@ -479,11 +479,14 @@ can be identified by their `[attempt-id]`. In the API listed below, when running /app

[GitHub] [spark] SparkQA commented on pull request #32397: [SPARK-35084][CORE] Spark 3: supporting "--packages" in k8s cluster mode

2021-08-19 Thread GitBox
SparkQA commented on pull request #32397: URL: https://github.com/apache/spark/pull/32397#issuecomment-902378474 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47160/ -- This is an automated message from the A

[GitHub] [spark] beliefer commented on pull request #33787: [SPARK-36428][TESTS][FOLLOWUP] Revert mistake change to DateExpressionsSuite

2021-08-19 Thread GitBox
beliefer commented on pull request #33787: URL: https://github.com/apache/spark/pull/33787#issuecomment-902376762 @gengliangwang Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

[GitHub] [spark] itholic edited a comment on pull request #33786: [SPARK-36541][DOCS][PYTHON]Replace the word Koalas to pandas-on-Spark

2021-08-19 Thread GitBox
itholic edited a comment on pull request #33786: URL: https://github.com/apache/spark/pull/33786#issuecomment-902372078 > Also, should we make it `pandas-APIs-on-Spark` instead of `pandas-on-Spark`? We use "pandas APIs on Spark" as an official name, but sometimes we use "pandas-on-S

[GitHub] [spark] itholic edited a comment on pull request #33786: [SPARK-36541][DOCS][PYTHON]Replace the word Koalas to pandas-on-Spark

2021-08-19 Thread GitBox
itholic edited a comment on pull request #33786: URL: https://github.com/apache/spark/pull/33786#issuecomment-902372078 We use "pandas APIs on Spark" as an official name, but sometimes we use "pandas-on-Spark" for abbreviation when the sentences look unnatural to read. For example, l

[GitHub] [spark] itholic edited a comment on pull request #33786: [SPARK-36541][DOCS][PYTHON]Replace the word Koalas to pandas-on-Spark

2021-08-19 Thread GitBox
itholic edited a comment on pull request #33786: URL: https://github.com/apache/spark/pull/33786#issuecomment-902372078 We use "pandas APIs on Spark" as an official name, but sometimes we use "pandas-on-Spark" for abbreviation when the sentences look unnatural to read. For example, l

[GitHub] [spark] itholic edited a comment on pull request #33786: [SPARK-36541][DOCS][PYTHON]Replace the word Koalas to pandas-on-Spark

2021-08-19 Thread GitBox
itholic edited a comment on pull request #33786: URL: https://github.com/apache/spark/pull/33786#issuecomment-902372078 We use "pandas APIs on Spark" as an official name, but sometimes we use "pandas-on-Spark" for abbreviation when the sentences look unnatural to read. For example, i

[GitHub] [spark] itholic commented on pull request #33786: [SPARK-36541][DOCS][PYTHON]Replace the word Koalas to pandas-on-Spark

2021-08-19 Thread GitBox
itholic commented on pull request #33786: URL: https://github.com/apache/spark/pull/33786#issuecomment-902372078 We use "pandas APIs on Spark" as an official name, but sometimes we use "pandas-on-Spark" for shorten name since sometimes sentences look unnatural. For example, in the ca

[GitHub] [spark] HeartSaVioR edited a comment on pull request #33784: Revert "[SPARK-34309][BUILD][CORE][SQL][K8S] Use Caffeine instead of Guava Cache"

2021-08-19 Thread GitBox
HeartSaVioR edited a comment on pull request #33784: URL: https://github.com/apache/spark/pull/33784#issuecomment-902369462 Looks like performance benefit is something not everyone seems to be agreed with, then it cannot be the rationalization of introducing the new dependency. In ot

[GitHub] [spark] HeartSaVioR edited a comment on pull request #33784: Revert "[SPARK-34309][BUILD][CORE][SQL][K8S] Use Caffeine instead of Guava Cache"

2021-08-19 Thread GitBox
HeartSaVioR edited a comment on pull request #33784: URL: https://github.com/apache/spark/pull/33784#issuecomment-902369462 Looks like performance benefit is something not everyone seems to be agreed with, then it cannot be the rationalization of introducing the new dependency. In ot

[GitHub] [spark] HeartSaVioR commented on pull request #33784: Revert "[SPARK-34309][BUILD][CORE][SQL][K8S] Use Caffeine instead of Guava Cache"

2021-08-19 Thread GitBox
HeartSaVioR commented on pull request #33784: URL: https://github.com/apache/spark/pull/33784#issuecomment-902369462 Looks like performance benefit is something not everyone seems to be agreed with, then it cannot be the rationalization of introducing the new dependency. In other per

[GitHub] [spark] ulysses-you commented on a change in pull request #32184: [SPARK-35083][CORE] Support remote scheduler pool files

2021-08-19 Thread GitBox
ulysses-you commented on a change in pull request #32184: URL: https://github.com/apache/spark/pull/32184#discussion_r692600086 ## File path: docs/job-scheduling.md ## @@ -252,10 +252,11 @@ properties: The pool properties can be set by creating an XML file, similar to `conf

[GitHub] [spark] dongjoon-hyun commented on pull request #33748: [SPARK-36516][SQL] Support File Metadata Cache for ORC

2021-08-19 Thread GitBox
dongjoon-hyun commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-902367102 @dbtsai and @sunchao . Everything depends on the data lifecycle. For the safety, we can control it by reducing `spark.sql.fileMetaCache.ttlSinceLastAccessSec` to `10

[GitHub] [spark] ulysses-you commented on a change in pull request #32184: [SPARK-35083][CORE] Support remote scheduler pool files

2021-08-19 Thread GitBox
ulysses-you commented on a change in pull request #32184: URL: https://github.com/apache/spark/pull/32184#discussion_r692598890 ## File path: docs/job-scheduling.md ## @@ -252,10 +252,11 @@ properties: The pool properties can be set by creating an XML file, similar to `conf

[GitHub] [spark] HeartSaVioR closed pull request #33792: [SPARK-35312][SS][FOLLOW-UP] More documents and checking logic for the new options

2021-08-19 Thread GitBox
HeartSaVioR closed pull request #33792: URL: https://github.com/apache/spark/pull/33792 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-un

[GitHub] [spark] HeartSaVioR commented on pull request #33792: [SPARK-35312][SS][FOLLOW-UP] More documents and checking logic for the new options

2021-08-19 Thread GitBox
HeartSaVioR commented on pull request #33792: URL: https://github.com/apache/spark/pull/33792#issuecomment-902365811 Thanks! Merging to master/3.2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

[GitHub] [spark] dgd-contributor commented on a change in pull request #33752: [SPARK-36401][PYTHON] Implement Series.cov

2021-08-19 Thread GitBox
dgd-contributor commented on a change in pull request #33752: URL: https://github.com/apache/spark/pull/33752#discussion_r692597349 ## File path: python/pyspark/pandas/tests/test_ops_on_diff_frames.py ## @@ -1955,6 +1955,28 @@ def test_pow_and_rpow(self): with self.ass

[GitHub] [spark] dgd-contributor commented on a change in pull request #33752: [SPARK-36401][PYTHON] Implement Series.cov

2021-08-19 Thread GitBox
dgd-contributor commented on a change in pull request #33752: URL: https://github.com/apache/spark/pull/33752#discussion_r692597267 ## File path: python/pyspark/pandas/series.py ## @@ -944,6 +944,50 @@ def between(self, left: Any, right: Any, inclusive: bool = True) -> "Series
