[jira] [Commented] (SPARK-22675) Refactoring PropagateTypes in TypeCoercion

2017-12-03 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276405#comment-16276405 ] Apache Spark commented on SPARK-22675: User 'gatorsmile' has created a pull request

[jira] [Updated] (SPARK-22675) Refactoring PropagateTypes in TypeCoercion

2017-12-03 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-22675: Description: PropagateTypes is called twice in TypeCoercion. We do not need to call it twice. Instead, we

[jira] [Commented] (SPARK-21168) KafkaRDD should always set kafka clientId.

2017-12-03 Thread liuzhaokun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276399#comment-16276399 ] liuzhaokun commented on SPARK-21168: [~srowen] Can I create a new PR to fix this prob

[jira] [Commented] (SPARK-22674) PySpark breaks serialization of namedtuple subclasses

2017-12-03 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276390#comment-16276390 ] Hyukjin Kwon commented on SPARK-22674: Not really. Regular pickle can pickle it but

[jira] [Updated] (SPARK-22679) It's slow to stop streaming context

2017-12-03 Thread wuyonghua (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wuyonghua updated SPARK-22679: Description: Attached a simple program to reproduce the issue. class QueueDStream[T: scala.reflect.Cl

[jira] [Updated] (SPARK-22660) Compile with scala-2.12 and JDK9

2017-12-03 Thread liyunzhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang updated SPARK-22660: Description: Build with scala-2.12 using the following steps: 1. Change the pom.xml to use scala-2.12 ./dev/

[jira] [Created] (SPARK-22679) It's slow to stop streaming context

2017-12-03 Thread wuyonghua (JIRA)
wuyonghua created SPARK-22679: Summary: It's slow to stop streaming context Key: SPARK-22679 URL: https://issues.apache.org/jira/browse/SPARK-22679 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-12-03 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276384#comment-16276384 ] Apache Spark commented on SPARK-20392: User 'viirya' has created a pull request for

[jira] [Commented] (SPARK-22660) Compile with scala-2.12 and JDK9

2017-12-03 Thread liyunzhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276383#comment-16276383 ] liyunzhang commented on SPARK-22660: when running spark sql on the above package, exc

[jira] [Updated] (SPARK-22675) Refactoring PropagateTypes in TypeCoercion

2017-12-03 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-22675: Summary: Refactoring PropagateTypes in TypeCoercion (was: Deduplicate PropagateTypes in TypeCoercion)

[jira] [Commented] (SPARK-22674) PySpark breaks serialization of namedtuple subclasses

2017-12-03 Thread Jonas Amrich (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276375#comment-16276375 ] Jonas Amrich commented on SPARK-22674: I don't think that this cloudpickle fix would

[jira] [Assigned] (SPARK-22274) User-defined aggregation functions with pandas udf

2017-12-03 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22274: Assignee: Apache Spark

[jira] [Commented] (SPARK-22274) User-defined aggregation functions with pandas udf

2017-12-03 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276358#comment-16276358 ] Apache Spark commented on SPARK-22274: User 'icexelloss' has created a pull request

[jira] [Assigned] (SPARK-22274) User-defined aggregation functions with pandas udf

2017-12-03 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22274: Assignee: (was: Apache Spark)

[jira] [Commented] (SPARK-22670) Not able to create table in HIve with SparkSession when JavaSparkContext is already initialized.

2017-12-03 Thread Naresh Meena (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276330#comment-16276330 ] Naresh Meena commented on SPARK-22670: In my application, the context is already created

[jira] [Updated] (SPARK-22670) Not able to create table in HIve with SparkSession when JavaSparkContext is already initialized.

2017-12-03 Thread Naresh Meena (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh Meena updated SPARK-22670: Description: Not able to create table in Hive with SparkSession when SparkContext is already initialized

[jira] [Commented] (SPARK-22365) Spark UI executors empty list with 500 error

2017-12-03 Thread bruce xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276300#comment-16276300 ] bruce xu commented on SPARK-22365: @Jakub Dubovsky I reproduced the same issue as yours

[jira] [Updated] (SPARK-22365) Spark UI executors empty list with 500 error

2017-12-03 Thread bruce xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bruce xu updated SPARK-22365: Attachment: spark-executor-500error.png

[jira] [Commented] (SPARK-22674) PySpark breaks serialization of namedtuple subclasses

2017-12-03 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276287#comment-16276287 ] Hyukjin Kwon commented on SPARK-22674: I am aware of a fix for this in cloudpickle

[jira] [Assigned] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-12-03 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20392: Assignee: Apache Spark (was: Liang-Chi Hsieh)

[jira] [Assigned] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-12-03 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20392: Assignee: Liang-Chi Hsieh (was: Apache Spark)

[jira] [Created] (SPARK-22678) Schema Exception of SaveAsTable with SparkML Pipeline.transform Result DataFrame

2017-12-03 Thread geosmart (JIRA)
geosmart created SPARK-22678: Summary: Schema Exception of SaveAsTable with SparkML Pipeline.transform Result DataFrame Key: SPARK-22678 URL: https://issues.apache.org/jira/browse/SPARK-22678 Project: Spark

[jira] [Commented] (SPARK-20728) Make ORCFileFormat configurable between sql/hive and sql/core

2017-12-03 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276236#comment-16276236 ] Apache Spark commented on SPARK-20728: User 'dongjoon-hyun' has created a pull request

[jira] [Resolved] (SPARK-22478) Spark - Truncate date by Day / Hour

2017-12-03 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-22478. Resolution: Duplicate

[jira] [Commented] (SPARK-22478) Spark - Truncate date by Day / Hour

2017-12-03 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276218#comment-16276218 ] Hyukjin Kwon commented on SPARK-22478: Let me resolve this as a duplicate anyway.

[jira] [Commented] (SPARK-22478) Spark - Truncate date by Day / Hour

2017-12-03 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276209#comment-16276209 ] Hyukjin Kwon commented on SPARK-22478: I think it's a subset of SPARK-17174.

[jira] [Assigned] (SPARK-22665) Dataset API: .repartition() inconsistency / issue

2017-12-03 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22665: Assignee: (was: Apache Spark)

[jira] [Assigned] (SPARK-22665) Dataset API: .repartition() inconsistency / issue

2017-12-03 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22665: Assignee: Apache Spark

[jira] [Commented] (SPARK-22665) Dataset API: .repartition() inconsistency / issue

2017-12-03 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276140#comment-16276140 ] Apache Spark commented on SPARK-22665: User 'mgaido91' has created a pull request for

[jira] [Commented] (SPARK-22672) Move OrcTest to `sql/core`

2017-12-03 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276119#comment-16276119 ] Apache Spark commented on SPARK-22672: User 'dongjoon-hyun' has created a pull request

[jira] [Commented] (SPARK-21791) ORC should support column names with dot

2017-12-03 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276097#comment-16276097 ] Dongjoon Hyun commented on SPARK-21791: This is resolved in Spark 2.3.0.

[jira] [Updated] (SPARK-21791) ORC should support column names with dot

2017-12-03 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-21791: Fix Version/s: 2.3.0

[jira] [Assigned] (SPARK-22677) cleanup whole stage codegen for hash aggregate

2017-12-03 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22677: Assignee: Apache Spark (was: Wenchen Fan)

[jira] [Assigned] (SPARK-22677) cleanup whole stage codegen for hash aggregate

2017-12-03 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22677: Assignee: Wenchen Fan (was: Apache Spark)

[jira] [Commented] (SPARK-22677) cleanup whole stage codegen for hash aggregate

2017-12-03 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275986#comment-16275986 ] Apache Spark commented on SPARK-22677: User 'cloud-fan' has created a pull request

[jira] [Created] (SPARK-22677) cleanup whole stage codegen for hash aggregate

2017-12-03 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-22677: Summary: cleanup whole stage codegen for hash aggregate Key: SPARK-22677 URL: https://issues.apache.org/jira/browse/SPARK-22677 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-22478) Spark - Truncate date by Day / Hour

2017-12-03 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275985#comment-16275985 ] Sean Owen commented on SPARK-22478: Yeah the issue is that there are a hundred useful

[jira] [Commented] (SPARK-22478) Spark - Truncate date by Day / Hour

2017-12-03 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275979#comment-16275979 ] Marco Gaido commented on SPARK-22478: [~Davidhod] if other people agree that we should

[jira] [Resolved] (SPARK-22626) Wrong Hive table statistics may trigger OOM if enables CBO

2017-12-03 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-22626. Resolution: Fixed Assignee: Yuming Wang Fix Version/s: 2.3.0

[jira] [Commented] (SPARK-22478) Spark - Truncate date by Day / Hour

2017-12-03 Thread david hodeffi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275968#comment-16275968 ] david hodeffi commented on SPARK-22478: It should be part of the Spark functions package

[jira] [Assigned] (SPARK-22669) Avoid unnecessary function calls in code generation

2017-12-03 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-22669: Assignee: Marco Gaido

[jira] [Resolved] (SPARK-22669) Avoid unnecessary function calls in code generation

2017-12-03 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-22669. Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 19860 [https://githu

[jira] [Assigned] (SPARK-15474) ORC data source fails to write and read back empty dataframe

2017-12-03 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-15474: Assignee: Dongjoon Hyun

[jira] [Resolved] (SPARK-15474) ORC data source fails to write and read back empty dataframe

2017-12-03 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-15474. Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 19651 [https://githu

[jira] [Assigned] (SPARK-20682) Add new ORCFileFormat based on Apache ORC

2017-12-03 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-20682: Assignee: Dongjoon Hyun

[jira] [Resolved] (SPARK-20682) Add new ORCFileFormat based on Apache ORC

2017-12-03 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-20682. Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 19651 [https://githu

[jira] [Commented] (SPARK-22663) Spark DataSet to case class mapping mismatches

2017-12-03 Thread Sajeev Ramakrishnan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275952#comment-16275952 ] Sajeev Ramakrishnan commented on SPARK-22663: I agree that this is a bug. Bu

[jira] [Comment Edited] (SPARK-22663) Spark DataSet to case class mapping mismatches

2017-12-03 Thread Sajeev Ramakrishnan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275952#comment-16275952 ] Sajeev Ramakrishnan edited comment on SPARK-22663 at 12/3/17 2:06 PM:

[jira] [Comment Edited] (SPARK-20299) NullPointerException when null and string are in a tuple while encoding Dataset

2017-12-03 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274586#comment-16274586 ] Marco Gaido edited comment on SPARK-20299 at 12/3/17 9:21 AM: