[jira] [Updated] (SPARK-22641) Pyspark UDF relying on column added with withColumn after distinct

2017-11-29 Thread Andrew Duffy (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Duffy updated SPARK-22641: Description: We seem to have found an issue with PySpark UDFs interacting with {{withColumn}} wh

[jira] [Commented] (SPARK-22641) Pyspark UDF relying on column added with withColumn after distinct

2017-11-28 Thread Andrew Duffy (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16270132#comment-16270132 ] Andrew Duffy commented on SPARK-22641: Query plan with the literal: {code} == Pars

[jira] [Commented] (SPARK-22641) Pyspark UDF relying on column added with withColumn after distinct

2017-11-28 Thread Andrew Duffy (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16270128#comment-16270128 ] Andrew Duffy commented on SPARK-22641: So it seems this is only a problem when usin

[jira] [Comment Edited] (SPARK-22641) Pyspark UDF relying on column added with withColumn after distinct

2017-11-28 Thread Andrew Duffy (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16270128#comment-16270128 ] Andrew Duffy edited comment on SPARK-22641 at 11/29/17 4:44 AM:

[jira] [Updated] (SPARK-22641) Pyspark UDF relying on column added with withColumn after distinct

2017-11-28 Thread Andrew Duffy (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Duffy updated SPARK-22641: Description: We seem to have found an issue with PySpark UDFs interacting with {{withColumn}} wh

[jira] [Created] (SPARK-22641) Pyspark UDF relying on column added with withColumn after distinct

2017-11-28 Thread Andrew Duffy (JIRA)
Andrew Duffy created SPARK-22641: Summary: Pyspark UDF relying on column added with withColumn after distinct Key: SPARK-22641 URL: https://issues.apache.org/jira/browse/SPARK-22641 Project: Spark

[jira] [Comment Edited] (SPARK-21218) Convert IN predicate to equivalent Parquet filter

2017-06-27 Thread Andrew Duffy (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065177#comment-16065177 ] Andrew Duffy edited comment on SPARK-21218 at 6/27/17 5:39 PM:

[jira] [Commented] (SPARK-21218) Convert IN predicate to equivalent Parquet filter

2017-06-27 Thread Andrew Duffy (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065177#comment-16065177 ] Andrew Duffy commented on SPARK-21218: Curious, I wonder what the previous benchmar

[jira] [Commented] (SPARK-21218) Convert IN predicate to equivalent Parquet filter

2017-06-26 Thread Andrew Duffy (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063981#comment-16063981 ] Andrew Duffy commented on SPARK-21218: Good catch, looks like a dupe. [~hyukjin.kwo

[jira] [Resolved] (SPARK-17091) ParquetFilters rewrite IN to OR of Eq

2017-06-26 Thread Andrew Duffy (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Duffy resolved SPARK-17091. Resolution: Won't Fix. Should've closed this last year, but at the time based on Hyukjin Kwon's

[jira] [Commented] (SPARK-17310) Disable Parquet's record-by-record filter in normal parquet reader and do it in Spark-side

2016-08-30 Thread Andrew Duffy (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15448672#comment-15448672 ] Andrew Duffy commented on SPARK-17310: +1 to this, see comments on https://github.c

[jira] [Updated] (SPARK-17213) Parquet String Pushdown for Non-Eq Comparisons Broken

2016-08-24 Thread Andrew Duffy (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Duffy updated SPARK-17213: Description: Spark defines ordering over strings based on comparison of UTF8 byte arrays, which

[jira] [Created] (SPARK-17213) Parquet String Pushdown for Non-Eq Comparisons Broken

2016-08-24 Thread Andrew Duffy (JIRA)
Andrew Duffy created SPARK-17213: Summary: Parquet String Pushdown for Non-Eq Comparisons Broken Key: SPARK-17213 URL: https://issues.apache.org/jira/browse/SPARK-17213 Project: Spark Issue T

[jira] [Created] (SPARK-17091) ParquetFilters rewrite IN to OR of Eq

2016-08-16 Thread Andrew Duffy (JIRA)
Andrew Duffy created SPARK-17091: Summary: ParquetFilters rewrite IN to OR of Eq Key: SPARK-17091 URL: https://issues.apache.org/jira/browse/SPARK-17091 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-17059) Allow FileFormat to specify partition pruning strategy

2016-08-15 Thread Andrew Duffy (JIRA)
Andrew Duffy created SPARK-17059: Summary: Allow FileFormat to specify partition pruning strategy Key: SPARK-17059 URL: https://issues.apache.org/jira/browse/SPARK-17059 Project: Spark Issue

[jira] [Commented] (SPARK-16265) Add option to SparkSubmit to ship driver JRE to YARN

2016-07-07 Thread Andrew Duffy (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365998#comment-15365998 ] Andrew Duffy commented on SPARK-16265: Hi Sean, yeah I can see where you're coming

[jira] [Created] (SPARK-16265) Add option to SparkSubmit to ship driver JRE to YARN

2016-06-28 Thread Andrew Duffy (JIRA)
Andrew Duffy created SPARK-16265: Summary: Add option to SparkSubmit to ship driver JRE to YARN Key: SPARK-16265 URL: https://issues.apache.org/jira/browse/SPARK-16265 Project: Spark Issue Ty