[jira] [Created] (SPARK-49731) Support Kubernetes subPathExpr and hostPath volume type options

2024-09-20 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-49731: - Summary: Support Kubernetes subPathExpr and hostPath volume type options Key: SPARK-49731 URL: https://issues.apache.org/jira/browse/SPARK-49731 Project: Spark

[jira] [Commented] (SPARK-49149) Support customized log url for Spark UI and History server in Kubernetes environment

2024-08-15 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-49149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873808#comment-17873808 ] Enrico Minack commented on SPARK-49149: --- Pre Spark 4.0.0, these two config setting

[jira] [Created] (SPARK-45708) Retry mvn deploy failures

2023-10-27 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-45708: - Summary: Retry mvn deploy failures Key: SPARK-45708 URL: https://issues.apache.org/jira/browse/SPARK-45708 Project: Spark Issue Type: Bug Compone

[jira] [Created] (SPARK-45651) Snapshots of some packages are not published any more

2023-10-24 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-45651: - Summary: Snapshots of some packages are not published any more Key: SPARK-45651 URL: https://issues.apache.org/jira/browse/SPARK-45651 Project: Spark Issue

[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert

2023-09-19 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766881#comment-17766881 ] Enrico Minack commented on SPARK-38200: --- Sadly, still no feedback from reviewers.

[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert

2023-06-15 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732945#comment-17732945 ] Enrico Minack commented on SPARK-38200: --- Created pull request for this: https://gi

[jira] [Commented] (SPARK-19335) Spark should support doing an efficient DataFrame Upsert via JDBC

2023-06-15 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732943#comment-17732943 ] Enrico Minack commented on SPARK-19335: --- Created pull request for this: https://gi

[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert

2023-06-12 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731503#comment-17731503 ] Enrico Minack commented on SPARK-38200: --- Related: SPARK-19335 > [SQL] Spark JDBC

[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert

2023-06-12 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731502#comment-17731502 ] Enrico Minack commented on SPARK-38200: --- Sadly, MERGE is shown to perform worse th

[jira] [Created] (SPARK-42716) DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per partition

2023-03-08 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-42716: - Summary: DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per partition Key: SPARK-42716 URL: https://issues.apache.org/jira/browse/SPARK-42716

[jira] [Updated] (SPARK-41162) Anti-join must not be pushed below aggregation with ambiguous predicates

2023-02-12 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-41162: -- Description: The following query should return a single row as all values for {{id}} except f

[jira] [Updated] (SPARK-40770) Improved error messages for applyInPandas for schema mismatch

2023-02-09 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-40770: -- Description: Error messages raised by `applyInPandas` and `mapInPandas` are very generic or u

[jira] [Updated] (SPARK-34661) Replaces `OriginalType` with `LogicalTypeAnnotation` in VectorizedColumnReader

2023-02-06 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-34661: -- Affects Version/s: 3.2.3 3.3.1 3.2.2

[jira] [Updated] (SPARK-34661) Replaces `OriginalType` with `LogicalTypeAnnotation` in VectorizedColumnReader

2023-02-06 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-34661: -- Affects Version/s: (was: 3.3.0) (was: 3.2.1)

[jira] [Commented] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time

2023-01-31 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682878#comment-17682878 ] Enrico Minack commented on SPARK-40885: --- This has been fixed in 3.4.0 and 3.5.0:

[jira] [Updated] (SPARK-42132) DeduplicateRelations rule breaks plan when co-grouping the same DataFrame

2023-01-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-42132: -- Affects Version/s: 3.5.0 > DeduplicateRelations rule breaks plan when co-grouping the same Dat

[jira] [Updated] (SPARK-42199) groupByKey creates columns that may conflict with existing columns

2023-01-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-42199: -- Description: Calling {{ds.groupByKey(func: V => K)}} creates columns to store the key value.

[jira] [Created] (SPARK-42199) groupByKey creates columns that may conflict with existing columns

2023-01-26 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-42199: - Summary: groupByKey creates columns that may conflict with existing columns Key: SPARK-42199 URL: https://issues.apache.org/jira/browse/SPARK-42199 Project: Spark

[jira] [Updated] (SPARK-42168) CoGroup with window function returns incorrect result when partition keys differ in order

2023-01-24 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-42168: -- Description: The following example returns an incorrect result: {code:java} import pandas as p

[jira] [Updated] (SPARK-42168) CoGroup with window function returns incorrect result when partition keys differ in order

2023-01-24 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-42168: -- Description: The following example returns an incorrect result: {code:java} import pandas as p

[jira] [Created] (SPARK-42168) CoGroup with window function returns incorrect result when partition keys differ in order

2023-01-24 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-42168: - Summary: CoGroup with window function returns incorrect result when partition keys differ in order Key: SPARK-42168 URL: https://issues.apache.org/jira/browse/SPARK-42168

[jira] [Created] (SPARK-42132) DeduplicateRelations rule breaks plan when co-grouping the same DataFrame

2023-01-20 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-42132: - Summary: DeduplicateRelations rule breaks plan when co-grouping the same DataFrame Key: SPARK-42132 URL: https://issues.apache.org/jira/browse/SPARK-42132 Project:

[jira] [Updated] (SPARK-42132) DeduplicateRelations rule breaks plan when co-grouping the same DataFrame

2023-01-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-42132: -- Labels: correctness (was: ) > DeduplicateRelations rule breaks plan when co-grouping the same

[jira] [Updated] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time

2023-01-17 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-40885: -- Affects Version/s: 3.4.0 > Spark will filter out data field sorting when dynamic partitions an

[jira] [Updated] (SPARK-40819) Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type instead of automatically converting to LongType

2023-01-12 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-40819: -- Labels: regression (was: correctness) > Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Il

[jira] [Updated] (SPARK-40819) Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type instead of automatically converting to LongType

2023-01-12 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-40819: -- Labels: correctness (was: ) > Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parq

[jira] [Updated] (SPARK-41914) Sorting issue with partitioned-writing and planned write optimization disabled

2023-01-12 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-41914: -- Labels: correctness (was: ) > Sorting issue with partitioned-writing and planned write optimi

[jira] [Updated] (SPARK-26345) Parquet support Column indexes

2023-01-12 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-26345: -- Labels: (was: correctness) > Parquet support Column indexes > --

[jira] [Updated] (SPARK-26345) Parquet support Column indexes

2023-01-12 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-26345: -- Labels: correctness (was: ) > Parquet support Column indexes > --

[jira] [Updated] (SPARK-41959) Improve v1 writes with empty2null

2023-01-12 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-41959: -- Labels: correctness (was: ) > Improve v1 writes with empty2null > ---

[jira] [Comment Edited] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time

2023-01-09 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656459#comment-17656459 ] Enrico Minack edited comment on SPARK-40885 at 1/10/23 7:02 AM: --

[jira] [Comment Edited] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time

2023-01-09 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656459#comment-17656459 ] Enrico Minack edited comment on SPARK-40885 at 1/10/23 7:00 AM: --

[jira] [Commented] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time

2023-01-09 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656459#comment-17656459 ] Enrico Minack commented on SPARK-40885: --- This should be fixed by SPARK-40885. > S

[jira] [Commented] (SPARK-40588) Sorting issue with partitioned-writing and AQE turned on

2023-01-05 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655135#comment-17655135 ] Enrico Minack commented on SPARK-40588: --- Unfortunately, this issue persists with S

[jira] [Created] (SPARK-41914) Sorting issue with partitioned-writing and planned write optimization disabled

2023-01-05 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-41914: - Summary: Sorting issue with partitioned-writing and planned write optimization disabled Key: SPARK-41914 URL: https://issues.apache.org/jira/browse/SPARK-41914 Proj

[jira] [Updated] (SPARK-41162) Anti-join must not be pushed below aggregation with ambiguous predicates

2023-01-04 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-41162: -- Affects Version/s: 3.0.3 > Anti-join must not be pushed below aggregation with ambiguous predi

[jira] [Updated] (SPARK-41162) Anti-join must not be pushed below aggregation with ambiguous predicates

2022-12-17 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-41162: -- Affects Version/s: 3.3.1 3.1.3 3.2.3 > Anti-join

[jira] [Updated] (SPARK-41162) Anti-join must not be pushed below aggregation with ambiguous predicates

2022-11-16 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-41162: -- Description: The following query should return a single row as all values for {{id}} except f

[jira] [Created] (SPARK-41162) Anti-join must not be pushed below aggregation with ambiguous predicates

2022-11-16 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-41162: - Summary: Anti-join must not be pushed below aggregation with ambiguous predicates Key: SPARK-41162 URL: https://issues.apache.org/jira/browse/SPARK-41162 Project: S

[jira] [Updated] (SPARK-41014) Improve documentation and typing of applyInPandas for groupby and cogroup

2022-11-04 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-41014: -- Description: Documentation of method `applyInPandas` for groupby and cogroup does not mention

[jira] [Created] (SPARK-41014) Improve documentation and typing of applyInPandas for groupby and cogroup

2022-11-04 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-41014: - Summary: Improve documentation and typing of applyInPandas for groupby and cogroup Key: SPARK-41014 URL: https://issues.apache.org/jira/browse/SPARK-41014 Project:

[jira] [Commented] (SPARK-40559) Add applyInArrow to pyspark.sql.GroupedData

2022-11-01 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627483#comment-17627483 ] Enrico Minack commented on SPARK-40559: --- That would require users to re-implement

[jira] [Comment Edited] (SPARK-40588) Sorting issue with partitioned-writing and AQE turned on

2022-10-23 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621032#comment-17621032 ] Enrico Minack edited comment on SPARK-40588 at 10/23/22 5:01 PM: -

[jira] [Comment Edited] (SPARK-40588) Sorting issue with partitioned-writing and AQE turned on

2022-10-23 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621032#comment-17621032 ] Enrico Minack edited comment on SPARK-40588 at 10/23/22 4:55 PM: -

[jira] [Comment Edited] (SPARK-40588) Sorting issue with partitioned-writing and AQE turned on

2022-10-23 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621032#comment-17621032 ] Enrico Minack edited comment on SPARK-40588 at 10/23/22 4:55 PM: -

[jira] [Updated] (SPARK-40588) Sorting issue with partitioned-writing and AQE turned on

2022-10-23 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-40588: -- Summary: Sorting issue with partitioned-writing and AQE turned on (was: Sorting issue with AQ

[jira] [Commented] (SPARK-40588) Sorting issue with AQE turned on

2022-10-22 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622620#comment-17622620 ] Enrico Minack commented on SPARK-40588: --- Even with AQE enabled (pre Spark 3.4.0),

[jira] [Comment Edited] (SPARK-40588) Sorting issue with AQE turned on

2022-10-22 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621032#comment-17621032 ] Enrico Minack edited comment on SPARK-40588 at 10/22/22 1:02 PM: -

[jira] [Comment Edited] (SPARK-40588) Sorting issue with AQE turned on

2022-10-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621032#comment-17621032 ] Enrico Minack edited comment on SPARK-40588 at 10/20/22 4:23 PM: -

[jira] [Comment Edited] (SPARK-40588) Sorting issue with AQE turned on

2022-10-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621032#comment-17621032 ] Enrico Minack edited comment on SPARK-40588 at 10/20/22 4:23 PM: -

[jira] [Commented] (SPARK-40588) Sorting issue with AQE turned on

2022-10-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621032#comment-17621032 ] Enrico Minack commented on SPARK-40588: --- Here is a more concise and complete examp

[jira] [Updated] (SPARK-40819) Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type instead of automatically converting to LongType

2022-10-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-40819: -- Affects Version/s: 3.4.0 3.3.1 3.2.3

[jira] [Updated] (SPARK-38591) Add sortWithinGroups to KeyValueGroupedDataset

2022-10-18 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-38591: -- Summary: Add sortWithinGroups to KeyValueGroupedDataset (was: Add flatMapSortedGroups and co

[jira] [Updated] (SPARK-40830) Dataset.groupBy.as should be preferred over Dataset.groupByKey

2022-10-18 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-40830: -- Description: Calling {{Dataset.groupBy(...).as[K, T]}} should be preferred over calling {{Dat

[jira] [Updated] (SPARK-40830) Dataset.groupBy.as should be preferred over Dataset.groupByKey

2022-10-18 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-40830: -- Priority: Minor (was: Trivial) > Dataset.groupBy.as should be preferred over Dataset.groupByK

[jira] [Created] (SPARK-40830) Dataset.groupBy.as should be preferred over Dataset.groupByKey

2022-10-18 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-40830: - Summary: Dataset.groupBy.as should be preferred over Dataset.groupByKey Key: SPARK-40830 URL: https://issues.apache.org/jira/browse/SPARK-40830 Project: Spark

[jira] [Updated] (SPARK-38591) Add flatMapSortedGroups and cogroupSorted to KeyValueGroupedDataset

2022-10-17 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-38591: -- Description: The existing methods {{KeyValueGroupedDataset.flatMapGroups}} and {{KeyValueGrou

[jira] [Updated] (SPARK-39783) Column backticks are misplaced in the AnalysisException [UNRESOLVED_COLUMN] error message when using field with "."

2022-10-14 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39783: -- Description: AnalysisException {{[UNRESOLVED_COLUMN.WITH_SUGGESTION]}} shows the wrong sugges

[jira] [Updated] (SPARK-39783) Column backticks are misplaced in the AnalysisException [UNRESOLVED_COLUMN] error message when using field with "."

2022-10-14 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39783: -- Description: AnalysisException {{[UNRESOLVED_COLUMN.WITH_SUGGESTION]}} shows the wrong sugges

[jira] [Updated] (SPARK-39783) Column backticks are misplaced in the AnalysisException [UNRESOLVED_COLUMN] error message when using field with "."

2022-10-14 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39783: -- Description: AnalysisException {{[UNRESOLVED_COLUMN.WITH_SUGGESTION]}} shows the wrong sugges

[jira] [Commented] (SPARK-39783) Wrong column backticks in UNRESOLVED_COLUMN error

2022-10-13 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617481#comment-17617481 ] Enrico Minack commented on SPARK-39783: --- This issue is about the error message, no

[jira] [Created] (SPARK-40770) Improved error messages for applyInPandas for schema mismatch

2022-10-12 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-40770: - Summary: Improved error messages for applyInPandas for schema mismatch Key: SPARK-40770 URL: https://issues.apache.org/jira/browse/SPARK-40770 Project: Spark

[jira] [Created] (SPARK-40601) Improve error when cogrouping groups with mismatching key sizes

2022-09-28 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-40601: - Summary: Improve error when cogrouping groups with mismatching key sizes Key: SPARK-40601 URL: https://issues.apache.org/jira/browse/SPARK-40601 Project: Spark

[jira] [Updated] (SPARK-40559) Add applyInArrow to pyspark.sql.GroupedData

2022-09-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-40559: -- Description: PySpark allows transforming a {{DataFrame}} via Pandas and Arrow API: {code:pyth

[jira] [Comment Edited] (SPARK-40559) Add applyInArrow to pyspark.sql.GroupedData

2022-09-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609343#comment-17609343 ] Enrico Minack edited comment on SPARK-40559 at 9/26/22 7:30 AM: --

[jira] [Updated] (SPARK-40559) Add applyInArrow to pyspark.sql.GroupedData

2022-09-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-40559: -- Description: PySpark allows transforming a {{DataFrame}} via Pandas and Arrow API: {code:pyth

[jira] [Commented] (SPARK-40559) Add applyInArrow to pyspark.sql.GroupedData

2022-09-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609343#comment-17609343 ] Enrico Minack commented on SPARK-40559: --- [~cloud_fan] [~hyukjin.kwon] [~XinrongM]

[jira] [Created] (SPARK-40559) Add applyInArrow to pyspark.sql.GroupedData

2022-09-26 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-40559: - Summary: Add applyInArrow to pyspark.sql.GroupedData Key: SPARK-40559 URL: https://issues.apache.org/jira/browse/SPARK-40559 Project: Spark Issue Type: New

[jira] [Updated] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39931: -- Description: Calling {{DataFrame.groupby(...).applyInPandas(...)}} for very small groups in P

[jira] [Updated] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39931: -- Description: Calling {{DataFrame.groupby(...).applyInPandas(...)}} for very small groups in P

[jira] [Updated] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39931: -- Description: Calling {{DataFrame.groupby(...).applyInPandas(...)}} for very small groups in P

[jira] [Updated] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39931: -- Description: Calling `DataFrame.groupby(...).applyInPandas(...)` for very small groups in PyS

[jira] [Updated] (SPARK-38591) Add flatMapSortedGroups and cogroupSorted to KeyValueGroupedDataset

2022-08-17 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-38591: -- Affects Version/s: 3.4.0 (was: 3.3.0) Description: The ex

[jira] [Updated] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-01 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39931: -- Description: Calling `DataFrame.groupby(...).applyInPandas(...)` for very small groups in PyS

[jira] [Updated] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-01 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39931: -- Issue Type: Improvement (was: New Feature) > Improve performance of applyInPandas for very sm

[jira] [Updated] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-01 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39931: -- Description: Calling `DataFrame.groupby(...).applyInPandas(...)` for very small groups in PyS

[jira] [Commented] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-01 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573655#comment-17573655 ] Enrico Minack commented on SPARK-39931: --- [~hyukjin.kwon] [~zero323] [~ruifengz] [~

[jira] [Updated] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-01 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39931: -- Description: Calling `DataFrame.groupby(...).applyInPandas(...)` for very small groups in PyS

[jira] [Created] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-01 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39931: - Summary: Improve performance of applyInPandas for very small groups Key: SPARK-39931 URL: https://issues.apache.org/jira/browse/SPARK-39931 Project: Spark

[jira] [Created] (SPARK-39878) Migrate melt function in Pandas API to PySpark / Scala unpivot

2022-07-26 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39878: - Summary: Migrate melt function in Pandas API to PySpark / Scala unpivot Key: SPARK-39878 URL: https://issues.apache.org/jira/browse/SPARK-39878 Project: Spark

[jira] [Created] (SPARK-39877) Unpivot / melt function for PySpark

2022-07-26 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39877: - Summary: Unpivot / melt function for PySpark Key: SPARK-39877 URL: https://issues.apache.org/jira/browse/SPARK-39877 Project: Spark Issue Type: New Feature

[jira] [Created] (SPARK-39876) Unpivot / melt function for SQL

2022-07-26 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39876: - Summary: Unpivot / melt function for SQL Key: SPARK-39876 URL: https://issues.apache.org/jira/browse/SPARK-39876 Project: Spark Issue Type: New Feature

[jira] [Updated] (SPARK-38864) Unpivot / melt function for Dataset API

2022-07-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-38864: -- Summary: Unpivot / melt function for Dataset API (was: Melt function for Dataset API) > Unpi

[jira] [Resolved] (SPARK-39292) Make Dataset.melt work with struct fields

2022-07-16 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack resolved SPARK-39292. --- Resolution: Fixed > Make Dataset.melt work with struct fields >

[jira] [Updated] (SPARK-39783) Wrong column backticks in UNRESOLVED_COLUMN error

2022-07-15 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39783: -- Description: The following code references a nested value {{{}`the`.`id`{}}}, that does not e

[jira] [Commented] (SPARK-39783) Wrong column backticks in UNRESOLVED_COLUMN error

2022-07-14 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566998#comment-17566998 ] Enrico Minack commented on SPARK-39783: --- [~srielau] [~cloud_fan]  > Wrong column

[jira] [Created] (SPARK-39783) Wrong column backticks in UNRESOLVED_COLUMN error

2022-07-14 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39783: - Summary: Wrong column backticks in UNRESOLVED_COLUMN error Key: SPARK-39783 URL: https://issues.apache.org/jira/browse/SPARK-39783 Project: Spark Issue Typ

[jira] [Commented] (SPARK-39644) Add RangePartitioning to DataSource V2

2022-06-30 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17561132#comment-17561132 ] Enrico Minack commented on SPARK-39644: --- [~csun] As discussed, I'll be working on

[jira] [Created] (SPARK-39644) Add RangePartitioning to DataSource V2

2022-06-30 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39644: - Summary: Add RangePartitioning to DataSource V2 Key: SPARK-39644 URL: https://issues.apache.org/jira/browse/SPARK-39644 Project: Spark Issue Type: New Feat

[jira] [Commented] (SPARK-39529) Refactor and merge all related job selection logic into precondition

2022-06-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556323#comment-17556323 ] Enrico Minack commented on SPARK-39529: --- A first good step could be to merge {{pre

[jira] [Commented] (SPARK-39532) Move checkout and sync steps into re-usable composite action

2022-06-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556316#comment-17556316 ] Enrico Minack commented on SPARK-39532: --- This composite action first becomes avail

[jira] [Updated] (SPARK-39532) Move checkout and sync steps into re-usable composite action

2022-06-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39532: -- Parent: SPARK-39515 Issue Type: Sub-task (was: Improvement) > Move checkout and sync

[jira] [Created] (SPARK-39532) Move checkout and sync steps into re-usable composite action

2022-06-20 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39532: - Summary: Move checkout and sync steps into re-usable composite action Key: SPARK-39532 URL: https://issues.apache.org/jira/browse/SPARK-39532 Project: Spark

[jira] [Commented] (SPARK-39515) Improve/recover scheduled jobs in GitHub Actions

2022-06-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556265#comment-17556265 ] Enrico Minack commented on SPARK-39515: --- I can give [https://github.com/apache/spa

[jira] [Commented] (SPARK-39292) Make Dataset.melt work with struct fields

2022-06-03 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17546058#comment-17546058 ] Enrico Minack commented on SPARK-39292: --- This is being fixed as part of https://is

[jira] [Updated] (SPARK-39292) Make Dataset.melt work with struct fields

2022-05-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39292: -- Description: In SPARK-38864, the melt function was added to Dataset. It would be nice if fiel

[jira] [Created] (SPARK-39292) Make Dataset.melt work with struct fields

2022-05-25 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39292: - Summary: Make Dataset.melt work with struct fields Key: SPARK-39292 URL: https://issues.apache.org/jira/browse/SPARK-39292 Project: Spark Issue Type: Impro

[jira] [Created] (SPARK-39074) Fail on uploading test files, not when downloading them

2022-04-29 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39074: - Summary: Fail on uploading test files, not when downloading them Key: SPARK-39074 URL: https://issues.apache.org/jira/browse/SPARK-39074 Project: Spark Iss

[jira] [Created] (SPARK-39038) Skip reporting test results if triggering workflow was skipped

2022-04-27 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39038: - Summary: Skip reporting test results if triggering workflow was skipped Key: SPARK-39038 URL: https://issues.apache.org/jira/browse/SPARK-39038 Project: Spark

[jira] [Updated] (SPARK-38970) Skip build-and-test workflow on forks when scheduled

2022-04-22 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-38970: -- Summary: Skip build-and-test workflow on forks when scheduled (was: Check for changes only if
