[jira] [Commented] (SPARK-47836) Performance problem with QuantileSummaries

2024-04-12 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836606#comment-17836606 ] Tanel Kiis commented on SPARK-47836: I would be willing to make a PR, but I do not k

[jira] [Commented] (SPARK-47836) Performance problem with QuantileSummaries

2024-04-12 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836605#comment-17836605 ] Tanel Kiis commented on SPARK-47836: {noformat} QuantileSummaries:

[jira] [Updated] (SPARK-47836) Performance problem with QuantileSummaries

2024-04-12 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-47836: --- Description: SPARK-29336 caused a severe performance regression. In practice a partial_aggregate wit

[jira] [Created] (SPARK-46070) Precompile regex patterns in SparkDateTimeUtils.getZoneId

2023-11-23 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-46070: -- Summary: Precompile regex patterns in SparkDateTimeUtils.getZoneId Key: SPARK-46070 URL: https://issues.apache.org/jira/browse/SPARK-46070 Project: Spark Issue T
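SPARK-46070 proposes precompiling the regex patterns in SparkDateTimeUtils.getZoneId. The sketch below illustrates the general technique and is not Spark's code. String.replaceFirst compiles its pattern on every call, whereas a java.util.regex.Pattern stored in an object field is compiled once and reused. Several details are assumptions for this sketch: the two offset-normalizing patterns (turning +5:30 into +05:30 and -08:3 into -08:03) and the name ZoneIdSketch.

```scala
import java.time.ZoneId
import java.util.regex.Pattern

// Hypothetical stand-in for a zone-id parser; not Spark's actual implementation.
object ZoneIdSketch {
  // Compiled once when the object is initialized, instead of on every call.
  // Assumed normalization: "+h:mm" -> "+0h:mm".
  private val singleDigitHour: Pattern = Pattern.compile("(\\+|\\-)(\\d):")
  // Assumed normalization: "+hh:m" -> "+hh:0m".
  private val singleDigitMinute: Pattern = Pattern.compile("(\\+|\\-)(\\d\\d):(\\d)$")

  def getZoneId(timeZoneId: String): ZoneId = {
    // "$10" means group 1 followed by a literal '0', because there is no group 10.
    val hourFixed = singleDigitHour.matcher(timeZoneId).replaceFirst("$10$2:")
    val normalized = singleDigitMinute.matcher(hourFixed).replaceFirst("$1$2:0$3")
    ZoneId.of(normalized, ZoneId.SHORT_IDS)
  }
}
```

The hot path now only creates a Matcher per call. Compiling the Pattern, which is the expensive step, happens exactly once.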

[jira] [Updated] (SPARK-40664) Union in query can remove cache from the plan

2022-10-05 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-40664: --- Description: Failing unit test: {code} test("SPARK-40664: Cache with join, union and renames") {

[jira] [Commented] (SPARK-40664) Union in query can remove cache from the plan

2022-10-05 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612881#comment-17612881 ] Tanel Kiis commented on SPARK-40664: I do not think that https://github.com/apache/s

[jira] [Created] (SPARK-40664) Union in query can remove cache from the plan

2022-10-05 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-40664: -- Summary: Union in query can remove cache from the plan Key: SPARK-40664 URL: https://issues.apache.org/jira/browse/SPARK-40664 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-38485) Non-deterministic UDF executed multiple times when combined with withField

2022-04-17 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523363#comment-17523363 ] Tanel Kiis commented on SPARK-38485: Is there then even any point in having non-dete

[jira] [Updated] (SPARK-38485) Non-deterministic UDF executed multiple times when combined with withField

2022-03-09 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-38485: --- Description: When adding fields to the result of a non-deterministic UDF that returns a struct, then

[jira] [Created] (SPARK-38485) Non-deterministic UDF executed multiple times when combined with withField

2022-03-09 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-38485: -- Summary: Non-deterministic UDF executed multiple times when combined with withField Key: SPARK-38485 URL: https://issues.apache.org/jira/browse/SPARK-38485 Project: Spark

[jira] [Commented] (SPARK-38282) Avoid duplicating complex partitioning expressions

2022-02-21 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495899#comment-17495899 ] Tanel Kiis commented on SPARK-38282: [~cloud_fan], any ideas how to improve this? I

[jira] [Updated] (SPARK-38282) Avoid duplicating complex partitioning expressions

2022-02-21 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-38282: --- Description: Spark will duplicate all non-trivial expressions in Window.partitionBy, that will resu

[jira] [Updated] (SPARK-38282) Avoid duplicating complex partitioning expressions

2022-02-21 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-38282: --- Description: {code} test("SPARK-X: Avoid duplicating complex partitioning expressions") {

[jira] [Created] (SPARK-38282) Avoid duplicating complex partitioning expressions

2022-02-21 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-38282: -- Summary: Avoid duplicating complex partitioning expressions Key: SPARK-38282 URL: https://issues.apache.org/jira/browse/SPARK-38282 Project: Spark Issue Type: Im

[jira] [Created] (SPARK-37538) Replace single projection Expand with Project

2021-12-03 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-37538: -- Summary: Replace single projection Expand with Project Key: SPARK-37538 URL: https://issues.apache.org/jira/browse/SPARK-37538 Project: Spark Issue Type: Improve
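An Expand node emits one output row per projection for each input row. An Expand that carries exactly one projection therefore produces the same rows as a Project. The toy model below sketches that rewrite. Its Plan, Relation, Project and Expand case classes are hypothetical simplifications, not Catalyst's API: expressions are plain strings and output attributes are ignored.

```scala
// Hypothetical, heavily simplified logical plan; not Catalyst's classes.
sealed trait Plan
final case class Relation(name: String) extends Plan
final case class Project(exprs: Seq[String], child: Plan) extends Plan
// Emits one row per entry of `projections` for every input row.
final case class Expand(projections: Seq[Seq[String]], child: Plan) extends Plan

object ReplaceSingleProjectionExpand {
  // Recursively rewrites the plan: an Expand with a single projection becomes a Project.
  def apply(plan: Plan): Plan = plan match {
    case Expand(Seq(only), child)   => Project(only, apply(child))
    case Expand(projections, child) => Expand(projections, apply(child))
    case Project(exprs, child)      => Project(exprs, apply(child))
    case r: Relation                => r
  }
}
```

An Expand with two or more projections is left in place, because it really does multiply rows.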

[jira] [Updated] (SPARK-37487) CollectMetrics is executed twice if it is followed by a sort

2021-11-29 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-37487: --- Description: It is best exemplified by this new UT in DataFrameCallbackSuite: {code} test("SPARK-3

[jira] [Updated] (SPARK-37487) CollectMetrics is executed twice if it is followed by a sort

2021-11-29 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-37487: --- Labels: correctness (was: ) > CollectMetrics is executed twice if it is followed by a sort > --

[jira] [Updated] (SPARK-37487) CollectMetrics is executed twice if it is followed by a sort

2021-11-29 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-37487: --- Summary: CollectMetrics is executed twice if it is followed by a sort (was: CollectMetrics is execu

[jira] [Commented] (SPARK-37487) CollectMetrics is executed twice if it is followed by an sort

2021-11-29 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450452#comment-17450452 ] Tanel Kiis commented on SPARK-37487: [~cloud_fan] and [~sarutak], you helped with th

[jira] [Updated] (SPARK-37487) CollectMetrics is executed twice if it is followed by an sort

2021-11-29 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-37487: --- Description: It is best exemplified by this new UT in DataFrameCallbackSuite: {code} test("SPARK-X

[jira] [Created] (SPARK-37487) CollectMetrics is executed twice if it is followed by an sort

2021-11-29 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-37487: -- Summary: CollectMetrics is executed twice if it is followed by an sort Key: SPARK-37487 URL: https://issues.apache.org/jira/browse/SPARK-37487 Project: Spark Is

[jira] [Commented] (SPARK-36844) Window function "first" (unboundedFollowing) appears significantly slower than "last" (unboundedPreceding) in identical circumstances

2021-10-29 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436116#comment-17436116 ] Tanel Kiis commented on SPARK-36844: Hello, I also hit this issue a while back and

[jira] [Created] (SPARK-37074) Push extra predicates through non-join

2021-10-20 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-37074: -- Summary: Push extra predicates through non-join Key: SPARK-37074 URL: https://issues.apache.org/jira/browse/SPARK-37074 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-28 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421334#comment-17421334 ] Tanel Kiis commented on SPARK-36861: Yes, in 3.1 it is parsed as string. In 3.3 (mas

[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420553#comment-17420553 ] Tanel Kiis commented on SPARK-36861: Sorry, indeed I ran the test on master. Nevermi

[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420533#comment-17420533 ] Tanel Kiis commented on SPARK-36861: If this is expected behaviour, then I would exp

[jira] [Comment Edited] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420532#comment-17420532 ] Tanel Kiis edited comment on SPARK-36861 at 9/27/21, 7:45 AM:

[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420532#comment-17420532 ] Tanel Kiis commented on SPARK-36861: [~Gengliang.Wang] I think that this should be

[jira] [Created] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-36861: -- Summary: Partition columns are overly eagerly parsed as dates Key: SPARK-36861 URL: https://issues.apache.org/jira/browse/SPARK-36861 Project: Spark Issue Type:

[jira] [Updated] (SPARK-36496) Remove literals from grouping expressions when using the DataFrame API

2021-08-12 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-36496: --- Description: The RemoveLiteralFromGroupExpressions rule might not work when using the DataFrame API

[jira] [Created] (SPARK-36496) Remove literals from grouping expressions when using the DataFrame API

2021-08-12 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-36496: -- Summary: Remove literals from grouping expressions when using the DataFrame API Key: SPARK-36496 URL: https://issues.apache.org/jira/browse/SPARK-36496 Project: Spark
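Removing literals from grouping expressions works because grouping by a constant never splits rows into more groups. There is one subtlety. Dropping every key would turn the query into a global aggregate, which returns one row even for empty input, while a grouped aggregate returns none. This toy sketch therefore keeps a single placeholder literal in that case. The Expr, Col and Lit types are hypothetical, and this is not the code of Spark's RemoveLiteralFromGroupExpressions rule.

```scala
// Hypothetical expression model for illustration only.
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Lit(value: Any) extends Expr

object RemoveLiteralGroupingKeys {
  // Drops literal grouping keys, but never turns a grouped aggregate into a global one.
  def apply(groupingKeys: Seq[Expr]): Seq[Expr] = {
    val nonLiteral = groupingKeys.filterNot(_.isInstanceOf[Lit])
    // Keep one literal so an all-literal GROUP BY still yields zero rows on empty input.
    if (nonLiteral.nonEmpty || groupingKeys.isEmpty) nonLiteral else Seq(Lit(0))
  }
}
```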

[jira] [Created] (SPARK-35765) Distinct aggs are not duplicate sensitive

2021-06-15 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-35765: -- Summary: Distinct aggs are not duplicate sensitive Key: SPARK-35765 URL: https://issues.apache.org/jira/browse/SPARK-35765 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-35695) QueryExecutionListener does not see any observed metrics fired before persist/cache

2021-06-09 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-35695: -- Summary: QueryExecutionListener does not see any observed metrics fired before persist/cache Key: SPARK-35695 URL: https://issues.apache.org/jira/browse/SPARK-35695 Proje

[jira] [Updated] (SPARK-35695) QueryExecutionListener does not see any observed metrics fired before persist/cache

2021-06-09 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-35695: --- Description: This example properly fires the event {code} spark.range(100) .observe( name = "o

[jira] [Commented] (SPARK-35695) QueryExecutionListener does not see any observed metrics fired before persist/cache

2021-06-09 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359849#comment-17359849 ] Tanel Kiis commented on SPARK-35695: [~cloud_fan] and [~sarutak], you are currently

[jira] [Created] (SPARK-35630) ExpandExec should not introduce unnecessary exchanges

2021-06-03 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-35630: -- Summary: ExpandExec should not introduce unnecessary exchanges Key: SPARK-35630 URL: https://issues.apache.org/jira/browse/SPARK-35630 Project: Spark Issue Type:

[jira] [Commented] (SPARK-35296) Dataset.observe fails with an assertion

2021-06-03 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356253#comment-17356253 ] Tanel Kiis commented on SPARK-35296: Perhaps someone who knows the internals better,

[jira] [Updated] (SPARK-35296) Dataset.observe fails with an assertion

2021-06-03 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-35296: --- Affects Version/s: 3.2.0 > Dataset.observe fails with an assertion > ---

[jira] [Resolved] (SPARK-34623) Deduplicate window expressions

2021-06-02 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis resolved SPARK-34623. Resolution: Won't Do > Deduplicate window expressions > -- > >

[jira] [Resolved] (SPARK-32801) Make InferFiltersFromConstraints take into account EqualNullSafe

2021-06-02 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis resolved SPARK-32801. Resolution: Won't Do > Make InferFiltersFromConstraints take into account EqualNullSafe > --

[jira] [Commented] (SPARK-35296) Dataset.observe fails with an assertion

2021-05-03 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338465#comment-17338465 ] Tanel Kiis commented on SPARK-35296: I finally managed to change the UT in such a way,

[jira] [Updated] (SPARK-35296) Dataset.observe fails with an assertion

2021-05-03 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-35296: --- Attachment: 2021-05-03_18-34.png > Dataset.observe fails with an assertion > ---

[jira] [Updated] (SPARK-35296) Dataset.observe fails with an assertion

2021-05-03 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-35296: --- Description: I hit this assertion error when using dataset.observe: {code} java.lang.AssertionError:

[jira] [Comment Edited] (SPARK-35296) Dataset.observe fails with an assertion

2021-05-03 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338438#comment-17338438 ] Tanel Kiis edited comment on SPARK-35296 at 5/3/21, 3:58 PM: -

[jira] [Commented] (SPARK-35296) Dataset.observe fails with an assertion

2021-05-03 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338441#comment-17338441 ] Tanel Kiis commented on SPARK-35296: [~hvanhovell] The assertion in AggregatingAccum

[jira] [Commented] (SPARK-35296) Dataset.observe fails with an assertion

2021-05-03 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338438#comment-17338438 ] Tanel Kiis commented on SPARK-35296: I tried to change an existing UT to reproduce

[jira] [Updated] (SPARK-35296) Dataset.observe fails with an assertion

2021-05-03 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-35296: --- Description: I hit this assertion error when using dataset.observe: {code} {code} was: I hit this

[jira] [Created] (SPARK-35296) Dataset.observe fails with an assertion

2021-05-03 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-35296: -- Summary: Dataset.observe fails with an assertion Key: SPARK-35296 URL: https://issues.apache.org/jira/browse/SPARK-35296 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-34794) Nested higher-order functions broken in DSL

2021-04-05 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-34794: --- Labels: correctness (was: Correctness) > Nested higher-order functions broken in DSL >

[jira] [Updated] (SPARK-34794) Nested higher-order functions broken in DSL

2021-04-05 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-34794: --- Affects Version/s: 3.2.0 > Nested higher-order functions broken in DSL > ---

[jira] [Updated] (SPARK-34794) Nested higher-order functions broken in DSL

2021-04-05 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-34794: --- Labels: Correctness (was: ) > Nested higher-order functions broken in DSL > ---

[jira] [Created] (SPARK-34922) Use better CBO cost function

2021-03-31 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-34922: -- Summary: Use better CBO cost function Key: SPARK-34922 URL: https://issues.apache.org/jira/browse/SPARK-34922 Project: Spark Issue Type: Improvement Co

[jira] [Updated] (SPARK-34882) RewriteDistinctAggregates can cause a bug if the aggregator does not ignore NULLs

2021-03-28 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-34882: --- Description: {code:title=group-by.sql} SELECT first(DISTINCT a), last(DISTINCT a), first(a),

[jira] [Created] (SPARK-34882) RewriteDistinctAggregates can cause a bug if the aggregator does not ignore NULLs

2021-03-28 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-34882: -- Summary: RewriteDistinctAggregates can cause a bug if the aggregator does not ignore NULLs Key: SPARK-34882 URL: https://issues.apache.org/jira/browse/SPARK-34882 Project

[jira] [Updated] (SPARK-34876) Non-nullable aggregates can return NULL in a correlated subquery

2021-03-26 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-34876: --- Affects Version/s: 2.4.7 3.0.2 3.1.1 > Non-nullable ag

[jira] [Created] (SPARK-34876) Non-nullable aggregates can return NULL in a correlated subquery

2021-03-26 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-34876: -- Summary: Non-nullable aggregates can return NULL in a correlated subquery Key: SPARK-34876 URL: https://issues.apache.org/jira/browse/SPARK-34876 Project: Spark

[jira] [Updated] (SPARK-34876) Non-nullable aggregates can return NULL in a correlated subquery

2021-03-26 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-34876: --- Description: Test case in scalar-subquery-select.sql: {code:title=query} SELECT t1a, (SELECT c

[jira] [Created] (SPARK-34822) Update plan stability golden files even if only explain differs

2021-03-22 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-34822: -- Summary: Update plan stability golden files even if only explain differs Key: SPARK-34822 URL: https://issues.apache.org/jira/browse/SPARK-34822 Project: Spark

[jira] [Created] (SPARK-34812) RowNumberLike and RankLike should not be nullable

2021-03-21 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-34812: -- Summary: RowNumberLike and RankLike should not be nullable Key: SPARK-34812 URL: https://issues.apache.org/jira/browse/SPARK-34812 Project: Spark Issue Type: Imp

[jira] [Commented] (SPARK-34623) Deduplicate window expressions

2021-03-06 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296767#comment-17296767 ] Tanel Kiis commented on SPARK-34623: I had a typo in the PR title, so it did not link

[jira] [Updated] (SPARK-34565) Collapse Window nodes with Project between them

2021-03-06 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-34565: --- Description: The CollapseWindow optimizer rule can be improved to also collapse Window nodes that h

[jira] [Updated] (SPARK-34623) Deduplicate window expressions

2021-03-06 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-34623: --- Issue Type: Improvement (was: Bug) > Deduplicate window expressions > -

[jira] [Commented] (SPARK-34644) UDF returning array followed by explode calls the UDF multiple times and could return wrong results

2021-03-06 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296556#comment-17296556 ] Tanel Kiis commented on SPARK-34644: UDF with internal state should be marked as non

[jira] [Updated] (SPARK-34623) Deduplicate window expressions

2021-03-04 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-34623: --- Description: Remove duplicate window expressions from the Window node (was: Remove duplicate window

[jira] [Created] (SPARK-34623) Deduplicate window expressions

2021-03-04 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-34623: -- Summary: Deduplicate window expressions Key: SPARK-34623 URL: https://issues.apache.org/jira/browse/SPARK-34623 Project: Spark Issue Type: Bug Componen

[jira] [Created] (SPARK-34565) Collapse Window nodes with Project between them

2021-02-27 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-34565: -- Summary: Collapse Window nodes with Project between them Key: SPARK-34565 URL: https://issues.apache.org/jira/browse/SPARK-34565 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-34141) ExtractGenerator analyzer should handle lazy projectlists

2021-01-16 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-34141: -- Summary: ExtractGenerator analyzer should handle lazy projectlists Key: SPARK-34141 URL: https://issues.apache.org/jira/browse/SPARK-34141 Project: Spark Issue T

[jira] [Resolved] (SPARK-34014) Ignore Distinct if it is the right child of a left semi or anti join

2021-01-06 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis resolved SPARK-34014. Resolution: Won't Fix Can cause performance regression > Ignore Distinct if it is the right child

[jira] [Created] (SPARK-34014) Ignore Distinct if it is the right child of a left semi or anti join

2021-01-05 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-34014: -- Summary: Ignore Distinct if it is the right child of a left semi or anti join Key: SPARK-34014 URL: https://issues.apache.org/jira/browse/SPARK-34014 Project: Spark
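The reasoning behind SPARK-34014 is that a left semi or left anti join only checks whether a matching row exists on the right side. Duplicates on the right therefore cannot change the result. The toy sketch below shows that equivalence using hypothetical leftSemi and leftAnti helpers over in-memory sequences. The issue itself was resolved as Won't Fix, because dropping the Distinct can cause a performance regression.

```scala
// Hypothetical helpers; equality on whole values stands in for a join condition.
object SemiAntiJoinSketch {
  // Keep left rows that have at least one match on the right.
  def leftSemi[A](left: Seq[A], right: Seq[A]): Seq[A] = {
    val keys = right.toSet
    left.filter(keys.contains)
  }

  // Keep left rows that have no match on the right.
  def leftAnti[A](left: Seq[A], right: Seq[A]): Seq[A] = {
    val keys = right.toSet
    left.filterNot(keys.contains)
  }
}
```

The right side is only consulted as a set of keys, so deduplicating it first cannot change which left rows are kept. Duplicates on the left side are preserved either way.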

[jira] [Created] (SPARK-33971) Eliminate distinct from more aggregates

2021-01-03 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-33971: -- Summary: Eliminate distinct from more aggregates Key: SPARK-33971 URL: https://issues.apache.org/jira/browse/SPARK-33971 Project: Spark Issue Type: Improvement
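DISTINCT can be dropped from an aggregate whose result does not depend on how often a value repeats, such as max or min. It cannot be dropped from an aggregate like sum. The toy sketch below models that check. The Agg trait and its duplicateAgnostic flag are hypothetical, and the chosen aggregates are illustrative only.

```scala
// Hypothetical aggregate model over non-empty integer inputs.
sealed trait Agg {
  def eval(xs: Seq[Int]): Int
  // True if repeating a value can never change the result.
  def duplicateAgnostic: Boolean
}
case object Max extends Agg { def eval(xs: Seq[Int]): Int = xs.max; val duplicateAgnostic = true }
case object Min extends Agg { def eval(xs: Seq[Int]): Int = xs.min; val duplicateAgnostic = true }
case object Sum extends Agg { def eval(xs: Seq[Int]): Int = xs.sum; val duplicateAgnostic = false }

object EliminateDistinctSketch {
  // Evaluates AGG(DISTINCT xs), skipping the deduplication when it cannot matter.
  def evalDistinct(agg: Agg, xs: Seq[Int]): Int =
    if (agg.duplicateAgnostic) agg.eval(xs) else agg.eval(xs.distinct)
}
```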

[jira] [Updated] (SPARK-33964) Combine distinct unions in more cases

2021-01-02 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-33964: --- Description: In several TPCDS queries the CombineUnions rule does not manage to combine unions, bec

[jira] [Created] (SPARK-33964) Combine distinct unions in more cases

2021-01-02 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-33964: -- Summary: Combine distinct unions in more cases Key: SPARK-33964 URL: https://issues.apache.org/jira/browse/SPARK-33964 Project: Spark Issue Type: Improvement
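Nested distinct unions can be merged, because deduplicating an intermediate union is redundant once the outer union is deduplicated. In other words, DISTINCT(DISTINCT(A ∪ B) ∪ C) equals DISTINCT(A ∪ B ∪ C). The toy sketch below flattens such nesting. Rel, Table, Union and Distinct are hypothetical simplifications, not Catalyst's plan nodes. The sketch does not reproduce the specific TPC-DS cases the issue refers to.

```scala
// Hypothetical, simplified plan nodes; Union keeps duplicates (UNION ALL).
sealed trait Rel
final case class Table(rows: Seq[Int]) extends Rel
final case class Union(children: Seq[Rel]) extends Rel
final case class Distinct(child: Rel) extends Rel

object CombineDistinctUnions {
  // Flattens Distinct(Union(..., Distinct(Union(...)), ...)) into one Distinct(Union(...)).
  def apply(plan: Rel): Rel = plan match {
    case Distinct(Union(children)) => Distinct(Union(children.flatMap(flattenUnderDistinct)))
    case Distinct(child)           => Distinct(apply(child))
    case Union(children)           => Union(children.map(apply))
    case t: Table                  => t
  }

  // Below an outer Distinct, inner Distinct(Union) and Union nodes can be inlined.
  private def flattenUnderDistinct(plan: Rel): Seq[Rel] = plan match {
    case Distinct(Union(children)) => children.flatMap(flattenUnderDistinct)
    case Union(children)           => children.flatMap(flattenUnderDistinct)
    case other                     => Seq(apply(other))
  }

  // Reference evaluator used to check that the rewrite preserves results.
  def execute(plan: Rel): Seq[Int] = plan match {
    case Table(rows)     => rows
    case Union(children) => children.flatMap(execute)
    case Distinct(child) => execute(child).distinct
  }
}
```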

[jira] [Updated] (SPARK-33935) Fix CBO's cost function

2020-12-29 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-33935: --- Issue Type: Bug (was: Improvement) > Fix CBO's cost function --- > >

[jira] [Created] (SPARK-33935) Fix CBO's cost function

2020-12-29 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-33935: -- Summary: Fix CBO's cost function Key: SPARK-33935 URL: https://issues.apache.org/jira/browse/SPARK-33935 Project: Spark Issue Type: Improvement Compone

[jira] [Updated] (SPARK-33070) Optimizer rules for HigherOrderFunctions

2020-12-25 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-33070: --- Summary: Optimizer rules for HigherOrderFunctions (was: Optimizer rules for collection datatypes an

[jira] [Updated] (SPARK-33070) Optimizer rules for collection datatypes and SimpleHigherOrderFunction

2020-12-25 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-33070: --- Affects Version/s: (was: 3.1.0) 3.2.0 > Optimizer rules for collection da

[jira] [Created] (SPARK-33851) Push partial aggregates below exchanges

2020-12-19 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-33851: -- Summary: Push partial aggregates below exchanges Key: SPARK-33851 URL: https://issues.apache.org/jira/browse/SPARK-33851 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-32110) -0.0 vs 0.0 is inconsistent

2020-12-10 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247357#comment-17247357 ] Tanel Kiis commented on SPARK-32110: [~cloud_fan] fixed the issue I mentioned in the fir
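On the JVM, 0.0 and -0.0 compare equal with the primitive ==, but they differ under boxed equals, Double.compare and their raw bit patterns. Code paths that rely on different notions of equality, such as comparison versus hashing, can therefore disagree. The snippet below demonstrates this in plain Scala. The normalize helper, which maps -0.0 to 0.0 before such operations, is a hypothetical illustration of one way to restore consistency. It is not necessarily the fix applied in Spark.

```scala
object SignedZeroSketch {
  // Primitive IEEE 754 comparison treats the two zeros as equal.
  val primitiveEqual: Boolean = 0.0 == -0.0
  // Boxed equality and Double.compare distinguish them by sign.
  val boxedEqual: Boolean = java.lang.Double.valueOf(0.0).equals(java.lang.Double.valueOf(-0.0))
  val comparison: Int = java.lang.Double.compare(0.0, -0.0)
  // The raw bit patterns (which Double.hashCode is derived from) differ as well.
  val sameBits: Boolean =
    java.lang.Double.doubleToLongBits(0.0) == java.lang.Double.doubleToLongBits(-0.0)

  // Hypothetical normalization: collapse -0.0 to 0.0; NaN and other values pass through.
  def normalize(d: Double): Double = if (d == 0.0) 0.0 else d
}
```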

[jira] [Created] (SPARK-33225) Extract AliasHelper trait

2020-10-22 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-33225: -- Summary: Extract AliasHelper trait Key: SPARK-33225 URL: https://issues.apache.org/jira/browse/SPARK-33225 Project: Spark Issue Type: Improvement Compo

[jira] [Created] (SPARK-33177) CollectList and CollectSet should not be nullable

2020-10-18 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-33177: -- Summary: CollectList and CollectSet should not be nullable Key: SPARK-33177 URL: https://issues.apache.org/jira/browse/SPARK-33177 Project: Spark Issue Type: Imp

[jira] [Created] (SPARK-33122) Remove redundant aggregates in the Optimizer

2020-10-12 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-33122: -- Summary: Remove redundant aggregates in the Optimizer Key: SPARK-33122 URL: https://issues.apache.org/jira/browse/SPARK-33122 Project: Spark Issue Type: Improvem

[jira] [Updated] (SPARK-33070) Optimizer rules for collection datatypes and SimpleHigherOrderFunction

2020-10-05 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-33070: --- Summary: Optimizer rules for collection datatypes and SimpleHigherOrderFunction (was: Optimizer rul

[jira] [Created] (SPARK-33070) Optimizer rules for SimpleHigherOrderFunction

2020-10-05 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-33070: -- Summary: Optimizer rules for SimpleHigherOrderFunction Key: SPARK-33070 URL: https://issues.apache.org/jira/browse/SPARK-33070 Project: Spark Issue Type: Improve

[jira] [Updated] (SPARK-33070) Optimizer rules for SimpleHigherOrderFunction

2020-10-05 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-33070: --- Priority: Minor (was: Major) > Optimizer rules for SimpleHigherOrderFunction >

[jira] [Created] (SPARK-32995) CostBasedJoinReorder optimizer rule should be idempotent

2020-09-25 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-32995: -- Summary: CostBasedJoinReorder optimizer rule should be idempotent Key: SPARK-32995 URL: https://issues.apache.org/jira/browse/SPARK-32995 Project: Spark Issue Ty

[jira] [Created] (SPARK-32970) Reduce the runtime of unit test for SPARK-32019

2020-09-22 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-32970: -- Summary: Reduce the runtime of unit test for SPARK-32019 Key: SPARK-32970 URL: https://issues.apache.org/jira/browse/SPARK-32970 Project: Spark Issue Type: Impro

[jira] [Updated] (SPARK-32928) Non-deterministic expressions should not be reordered inside AND and OR

2020-09-21 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-32928: --- Labels: correctness (was: CorrectnessBug) > Non-deterministic expressions should not be reordered i

[jira] [Updated] (SPARK-32928) Non-deterministic expressions should not be reordered inside AND and OR

2020-09-21 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanel Kiis updated SPARK-32928: --- Labels: CorrectnessBug (was: ) > Non-deterministic expressions should not be reordered inside AND a

[jira] [Commented] (SPARK-32928) Non-deterministic expressions should not be reordered inside AND and OR

2020-09-21 Thread Tanel Kiis (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199646#comment-17199646 ] Tanel Kiis commented on SPARK-32928: One more point where this can manifest is Filt
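AND and OR short-circuit. Swapping their operands therefore changes how many times a non-deterministic expression is evaluated, and so which rows it accepts. The toy sketch below uses a hypothetical call-counting predicate as a reproducible stand-in for something like rand(). The same filter keeps different rows depending on operand order.

```scala
object ReorderSketch {
  // Hypothetical stateful predicate: its answer depends on how often it was called.
  final class CallCounter {
    var calls = 0
    def next(): Boolean = { calls += 1; calls % 2 == 0 }
  }

  // Filters with `nondet AND x > 2` or `x > 2 AND nondet`; returns kept rows and call count.
  def run(rows: Seq[Int], nondetFirst: Boolean): (Seq[Int], Int) = {
    val counter = new CallCounter
    def det(x: Int): Boolean = x > 2
    val kept = rows.filter { x =>
      if (nondetFirst) counter.next() && det(x) else det(x) && counter.next()
    }
    (kept, counter.calls)
  }
}
```

With rows 1, 3, 4, putting the non-deterministic operand first calls it three times and keeps row 3. Putting it second calls it only twice and keeps row 4. This is why such operands must keep their original position.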