[jira] [Created] (SPARK-22605) OutputMetrics empty for DataFrame writes

2017-11-24 Thread Jason White (JIRA)
Jason White created SPARK-22605: Summary: OutputMetrics empty for DataFrame writes Key: SPARK-22605 URL: https://issues.apache.org/jira/browse/SPARK-22605 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-19950) nullable ignored when df.load() is executed for file-based data source

2017-03-21 Thread Jason White (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15934719#comment-15934719 ] Jason White commented on SPARK-19950: Without something that allows us to read using

[jira] [Created] (SPARK-19561) Pyspark Dataframes don't allow timestamps near epoch

2017-02-11 Thread Jason White (JIRA)
Jason White created SPARK-19561: Summary: Pyspark Dataframes don't allow timestamps near epoch Key: SPARK-19561 URL: https://issues.apache.org/jira/browse/SPARK-19561 Project: Spark Issue Type

[jira] [Comment Edited] (SPARK-19299) Nulls in non nullable columns causes data corruption in parquet

2017-01-20 Thread Jason White (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832195#comment-15832195 ] Jason White edited comment on SPARK-19299 at 1/20/17 6:14 PM:

[jira] [Comment Edited] (SPARK-19299) Nulls in non nullable columns causes data corruption in parquet

2017-01-20 Thread Jason White (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832195#comment-15832195 ] Jason White edited comment on SPARK-19299 at 1/20/17 6:09 PM:

[jira] [Commented] (SPARK-19299) Nulls in non nullable columns causes data corruption in parquet

2017-01-20 Thread Jason White (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832195#comment-15832195 ] Jason White commented on SPARK-19299: These seem like two or three separate issues.

[jira] [Commented] (SPARK-19299) Nulls in non nullable columns causes data corruption in parquet

2017-01-20 Thread Jason White (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832006#comment-15832006 ] Jason White commented on SPARK-19299: Also seeing this same behaviour in Spark 2.0.1

[jira] [Commented] (SPARK-10915) Add support for UDAFs in Python

2016-10-20 Thread Jason White (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592831#comment-15592831 ] Jason White commented on SPARK-10915: At the moment, we use .repartitionAndSortWithi

[jira] [Commented] (SPARK-10915) Add support for UDAFs in Python

2016-10-20 Thread Jason White (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592534#comment-15592534 ] Jason White commented on SPARK-10915: That's unfortunate. Materializing a list somew

[jira] [Commented] (SPARK-10915) Add support for UDAFs in Python

2016-10-20 Thread Jason White (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591706#comment-15591706 ] Jason White commented on SPARK-10915: We would also very much like Python UDAFs. In

[jira] [Created] (SPARK-17679) Remove unnecessary Py4J ListConverter patch

2016-09-26 Thread Jason White (JIRA)
Jason White created SPARK-17679: Summary: Remove unnecessary Py4J ListConverter patch Key: SPARK-17679 URL: https://issues.apache.org/jira/browse/SPARK-17679 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-14700) PySpark Row equality operator is not overridden

2016-04-18 Thread Jason White (JIRA)
Jason White created SPARK-14700: Summary: PySpark Row equality operator is not overridden Key: SPARK-14700 URL: https://issues.apache.org/jira/browse/SPARK-14700 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-12073) Backpressure causes individual Kafka partitions to lag

2015-12-01 Thread Jason White (JIRA)
Jason White created SPARK-12073: Summary: Backpressure causes individual Kafka partitions to lag Key: SPARK-12073 URL: https://issues.apache.org/jira/browse/SPARK-12073 Project: Spark Issue Ty

[jira] [Commented] (SPARK-11437) createDataFrame shouldn't .take() when provided schema

2015-10-31 Thread Jason White (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984009#comment-14984009 ] Jason White commented on SPARK-11437: [~marmbrus] We briefly discussed this at Spark

[jira] [Created] (SPARK-11437) createDataFrame shouldn't .take() when provided schema

2015-10-31 Thread Jason White (JIRA)
Jason White created SPARK-11437: Summary: createDataFrame shouldn't .take() when provided schema Key: SPARK-11437 URL: https://issues.apache.org/jira/browse/SPARK-11437 Project: Spark Issue Ty

[jira] [Commented] (SPARK-8453) Unioning two RDDs in PySpark doesn't spill to disk

2015-06-18 Thread Jason White (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592541#comment-14592541 ] Jason White commented on SPARK-8453: Interestingly, if you repartition the RDDs to the

[jira] [Created] (SPARK-8453) Unioning two RDDs in PySpark doesn't spill to disk

2015-06-18 Thread Jason White (JIRA)
Jason White created SPARK-8453: Summary: Unioning two RDDs in PySpark doesn't spill to disk Key: SPARK-8453 URL: https://issues.apache.org/jira/browse/SPARK-8453 Project: Spark Issue Type: Bug