[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-02 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351305#comment-16351305 ] Xiao Li commented on SPARK-21658: - Yes. We should revert this. It is risky. > Adds the

[jira] [Assigned] (SPARK-23327) Update the description of three external API or functions

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23327: Assignee: Xiao Li (was: Apache Spark) > Update the description of three external API or f

[jira] [Assigned] (SPARK-23327) Update the description of three external API or functions

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23327: Assignee: Apache Spark (was: Xiao Li) > Update the description of three external API or f

[jira] [Commented] (SPARK-23327) Update the description of three external API or functions

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351297#comment-16351297 ] Apache Spark commented on SPARK-23327: -- User 'gatorsmile' has created a pull request

[jira] [Created] (SPARK-23327) Update the description of three external API or functions

2018-02-02 Thread Xiao Li (JIRA)
Xiao Li created SPARK-23327: --- Summary: Update the description of three external API or functions Key: SPARK-23327 URL: https://issues.apache.org/jira/browse/SPARK-23327 Project: Spark Issue Type: D

[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-02 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351260#comment-16351260 ] Reynold Xin commented on SPARK-21658: - I'd revert this one first. I'd even consider t

[jira] [Resolved] (SPARK-23317) rename ContinuousReader.setOffset to setStartOffset

2018-02-02 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-23317. - Resolution: Fixed Fix Version/s: 2.3.0 > rename ContinuousReader.setOffset to setStartOffset > ---

[jira] [Assigned] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23310: Assignee: (was: Apache Spark) > Perf regression introduced by SPARK-21113 > --

[jira] [Assigned] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23310: Assignee: Apache Spark > Perf regression introduced by SPARK-21113 > -

[jira] [Commented] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351207#comment-16351207 ] Apache Spark commented on SPARK-23310: -- User 'sitalkedia' has created a pull request

[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-02 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351196#comment-16351196 ] Hyukjin Kwon commented on SPARK-21658: -- [~rxin], this JIRA fixes the signature of an

[jira] [Commented] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-02 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351188#comment-16351188 ] Felix Cheung commented on SPARK-23314: -- Thanks. I have isolated this to a different

[jira] [Commented] (SPARK-23081) Add colRegex API to PySpark

2018-02-02 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351186#comment-16351186 ] Reynold Xin commented on SPARK-23081: - Scala and Python actually. Sorry I was only co

[jira] [Commented] (SPARK-23081) Add colRegex API to PySpark

2018-02-02 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351166#comment-16351166 ] Hyukjin Kwon commented on SPARK-23081: -- Do you mean both Scala and Python APIs or Py

[jira] [Commented] (SPARK-20090) Add StructType.fieldNames to Python API

2018-02-02 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351159#comment-16351159 ] Hyukjin Kwon commented on SPARK-20090: -- I don't object to add an alias in Scala side

[jira] [Commented] (SPARK-20090) Add StructType.fieldNames to Python API

2018-02-02 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351155#comment-16351155 ] Hyukjin Kwon commented on SPARK-20090: -- I think we deprecated this roughly to rename

[jira] [Commented] (SPARK-23064) Add documentation for stream-stream joins

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351136#comment-16351136 ] Apache Spark commented on SPARK-23064: -- User 'tdas' has created a pull request for t

[jira] [Commented] (SPARK-23326) "Scheduler Delay" of a task is confusing

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351118#comment-16351118 ] Apache Spark commented on SPARK-23326: -- User 'zsxwing' has created a pull request fo

[jira] [Assigned] (SPARK-23326) "Scheduler Delay" of a task is confusing

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23326: Assignee: Apache Spark > "Scheduler Delay" of a task is confusing > --

[jira] [Assigned] (SPARK-23326) "Scheduler Delay" of a task is confusing

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23326: Assignee: (was: Apache Spark) > "Scheduler Delay" of a task is confusing > ---

[jira] [Updated] (SPARK-23326) "Scheduler Delay" of a task is confusing

2018-02-02 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-23326: - Description: Run the following code and check the UI {code} sc.makeRDD(1 to 1, 1).foreach { i =>

[jira] [Updated] (SPARK-23326) "Scheduler Delay" of a task is confusing

2018-02-02 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-23326: - Environment: (was: Run the following code and check the UI {code} sc.makeRDD(1 to 1, 1).foreac

[jira] [Created] (SPARK-23326) "Scheduler Delay" of a task is confusing

2018-02-02 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-23326: Summary: "Scheduler Delay" of a task is confusing Key: SPARK-23326 URL: https://issues.apache.org/jira/browse/SPARK-23326 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-21113) Support for read ahead input stream to amortize disk IO cost in the Spill reader

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351096#comment-16351096 ] Apache Spark commented on SPARK-21113: -- User 'sitalkedia' has created a pull request

[jira] [Commented] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-02 Thread Sital Kedia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351097#comment-16351097 ] Sital Kedia commented on SPARK-23310: - https://github.com/apache/spark/pull/20492 >

[jira] [Commented] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-02 Thread Nicolas Poggi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351071#comment-16351071 ] Nicolas Poggi commented on SPARK-23310: --- [~sitalke...@gmail.com] we have found arou

[jira] [Commented] (SPARK-23324) Announce new Kubernetes back-end for 2.3 release notes

2018-02-02 Thread Sameer Agarwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351060#comment-16351060 ] Sameer Agarwal commented on SPARK-23324: Thanks [~eje], this is definitely goi

[jira] [Commented] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-02 Thread Sameer Agarwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351048#comment-16351048 ] Sameer Agarwal commented on SPARK-23310: [~sitalke...@gmail.com] it'd be great if

[jira] [Commented] (SPARK-23325) DataSourceV2 readers should always produce InternalRow.

2018-02-02 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351043#comment-16351043 ] Ryan Blue commented on SPARK-23325: --- [~cloud_fan], FYI. > DataSourceV2 readers should

[jira] [Updated] (SPARK-23324) Announce new Kubernetes back-end for 2.3 release notes

2018-02-02 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-23324: -- Labels: documentation kubernetes releasenotes (was: documentation kubernetes release_notes) The relea

[jira] [Created] (SPARK-23325) DataSourceV2 readers should always produce InternalRow.

2018-02-02 Thread Ryan Blue (JIRA)
Ryan Blue created SPARK-23325: - Summary: DataSourceV2 readers should always produce InternalRow. Key: SPARK-23325 URL: https://issues.apache.org/jira/browse/SPARK-23325 Project: Spark Issue Type:

[jira] [Commented] (SPARK-23324) Announce new Kubernetes back-end for 2.3 release notes

2018-02-02 Thread Erik Erlandson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351020#comment-16351020 ] Erik Erlandson commented on SPARK-23324: cc [~sameer], [~foxish] > Announce new

[jira] [Created] (SPARK-23324) Announce new Kubernetes back-end for 2.3 release notes

2018-02-02 Thread Erik Erlandson (JIRA)
Erik Erlandson created SPARK-23324: -- Summary: Announce new Kubernetes back-end for 2.3 release notes Key: SPARK-23324 URL: https://issues.apache.org/jira/browse/SPARK-23324 Project: Spark Is

[jira] [Commented] (SPARK-23323) DataSourceV2 should use the output commit coordinator.

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351015#comment-16351015 ] Apache Spark commented on SPARK-23323: -- User 'rdblue' has created a pull request for

[jira] [Assigned] (SPARK-23323) DataSourceV2 should use the output commit coordinator.

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23323: Assignee: (was: Apache Spark) > DataSourceV2 should use the output commit coordinator.

[jira] [Assigned] (SPARK-23323) DataSourceV2 should use the output commit coordinator.

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23323: Assignee: Apache Spark > DataSourceV2 should use the output commit coordinator. >

[jira] [Created] (SPARK-23323) DataSourceV2 should use the output commit coordinator.

2018-02-02 Thread Ryan Blue (JIRA)
Ryan Blue created SPARK-23323: - Summary: DataSourceV2 should use the output commit coordinator. Key: SPARK-23323 URL: https://issues.apache.org/jira/browse/SPARK-23323 Project: Spark Issue Type:

[jira] [Updated] (SPARK-23321) DataSourceV2 should apply some validation when writing.

2018-02-02 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue updated SPARK-23321: -- Issue Type: Sub-task (was: Bug) Parent: SPARK-22386 > DataSourceV2 should apply some validatio

[jira] [Created] (SPARK-23322) Launcher handles can miss application updates if application finishes too quickly

2018-02-02 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-23322: -- Summary: Launcher handles can miss application updates if application finishes too quickly Key: SPARK-23322 URL: https://issues.apache.org/jira/browse/SPARK-23322

[jira] [Updated] (SPARK-23053) taskBinarySerialization and task partitions calculate in DagScheduler.submitMissingTasks should keep the same RDD checkpoint status

2018-02-02 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid updated SPARK-23053: - Component/s: Scheduler > taskBinarySerialization and task partitions calculate in > DagScheduler

[jira] [Assigned] (SPARK-22820) Spark 2.3 SQL API audit

2018-02-02 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-22820: --- Assignee: Xiao Li > Spark 2.3 SQL API audit > --- > > Key: SPARK

[jira] [Updated] (SPARK-22820) Spark 2.3 SQL API audit

2018-02-02 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-22820: Priority: Blocker (was: Major) > Spark 2.3 SQL API audit > --- > > Key

[jira] [Commented] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-02 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350971#comment-16350971 ] Li Jin commented on SPARK-23314: Hi [~felixcheung] Thanks for the information. However,

[jira] [Assigned] (SPARK-23321) DataSourceV2 should apply some validation when writing.

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23321: Assignee: (was: Apache Spark) > DataSourceV2 should apply some validation when writing

[jira] [Commented] (SPARK-23321) DataSourceV2 should apply some validation when writing.

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350967#comment-16350967 ] Apache Spark commented on SPARK-23321: -- User 'rdblue' has created a pull request for

[jira] [Assigned] (SPARK-23321) DataSourceV2 should apply some validation when writing.

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23321: Assignee: Apache Spark > DataSourceV2 should apply some validation when writing. > ---

[jira] [Updated] (SPARK-23321) DataSourceV2 should apply some validation when writing.

2018-02-02 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue updated SPARK-23321: -- Summary: DataSourceV2 should apply some validation when writing. (was: DataSourceV2 should apply prepr

[jira] [Created] (SPARK-23321) DataSourceV2 should apply preprocess rules for inserts.

2018-02-02 Thread Ryan Blue (JIRA)
Ryan Blue created SPARK-23321: - Summary: DataSourceV2 should apply preprocess rules for inserts. Key: SPARK-23321 URL: https://issues.apache.org/jira/browse/SPARK-23321 Project: Spark Issue Type:

[jira] [Commented] (SPARK-23139) Read eventLog file with mixed encodings

2018-02-02 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350933#comment-16350933 ] Imran Rashid commented on SPARK-23139: -- Apologies if this is a really silly question

[jira] [Comment Edited] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350901#comment-16350901 ] Thomas Graves edited comment on SPARK-23309 at 2/2/18 8:29 PM:

[jira] [Commented] (SPARK-23290) inadvertent change in handling of DateType when converting to pandas dataframe

2018-02-02 Thread Andre Menck (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350917#comment-16350917 ] Andre Menck commented on SPARK-23290: - Hey [~ueshin] apologies, I tried to come up wi

[jira] [Comment Edited] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350813#comment-16350813 ] Thomas Graves edited comment on SPARK-23309 at 2/2/18 8:15 PM:

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350901#comment-16350901 ] Thomas Graves commented on SPARK-23309: --- I should ask is there a log statement or q

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350900#comment-16350900 ] Thomas Graves commented on SPARK-23309: --- So the last test I did was spark 2.3 with

[jira] [Commented] (SPARK-20425) Support an extended display mode to print a column data per line

2018-02-02 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350876#comment-16350876 ] Reynold Xin commented on SPARK-20425: - Hey so I don't think we should be doing multip

[jira] [Commented] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-02 Thread Sital Kedia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350875#comment-16350875 ] Sital Kedia commented on SPARK-23310: - [~yhuai] - Sorry about introducing the regress

[jira] [Commented] (SPARK-21852) Empty Parquet Files created as a result of spark jobs fail when read

2018-02-02 Thread Ravi Chittilla (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350871#comment-16350871 ] Ravi Chittilla commented on SPARK-21852: +1 > Empty Parquet Files created as a r

[jira] [Commented] (SPARK-20090) Add StructType.fieldNames to Python API

2018-02-02 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350861#comment-16350861 ] Reynold Xin commented on SPARK-20090: - Why would we deprecate this? I'd probably add

[jira] [Commented] (SPARK-23290) inadvertent change in handling of DateType when converting to pandas dataframe

2018-02-02 Thread Sameer Agarwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350860#comment-16350860 ] Sameer Agarwal commented on SPARK-23290: [~amenck] [~aash] any updates here? > i

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350857#comment-16350857 ] Xiao Li commented on SPARK-23309: - Based on my understanding about what [~tgraves] said ab

[jira] [Commented] (SPARK-23081) Add colRegex API to PySpark

2018-02-02 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350852#comment-16350852 ] Reynold Xin commented on SPARK-23081: - Sorry why are we adding things like this? I se

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350846#comment-16350846 ] Dongjoon Hyun commented on SPARK-23309: --- To sum up, the same Hive code (old Hive pa

[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-02 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350848#comment-16350848 ] Reynold Xin commented on SPARK-21658: - Sorry but I object to this change. Why would w

[jira] [Comment Edited] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350846#comment-16350846 ] Dongjoon Hyun edited comment on SPARK-23309 at 2/2/18 7:35 PM:

[jira] [Commented] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-02 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350819#comment-16350819 ] Felix Cheung commented on SPARK-23314: -- I'm running Python 2, Pandas 0.22.0, PyArrow 0.

[jira] [Reopened] (SPARK-17859) persist should not impede with spark's ability to perform a broadcast join.

2018-02-02 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fernando Pereira reopened SPARK-17859: -- This bug persists {code:java} SPARK version 2.2.1 SparkSession available as 'spark'. In [

[jira] [Comment Edited] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350813#comment-16350813 ] Thomas Graves edited comment on SPARK-23309 at 2/2/18 7:04 PM:

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350813#comment-16350813 ] Thomas Graves commented on SPARK-23309: --- I'm still seeing spark 2.3 slower by about

[jira] [Updated] (SPARK-23288) Incorrect number of written records in structured streaming

2018-02-02 Thread Yuriy Bondaruk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuriy Bondaruk updated SPARK-23288: --- Component/s: SQL > Incorrect number of written records in structured streaming >

[jira] [Commented] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-02 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350599#comment-16350599 ] Li Jin commented on SPARK-23314: [~felixcheung], what's the version of pandas you are usi

[jira] [Resolved] (SPARK-23295) Exclude Waring message when generating versions in make-distribution.sh

2018-02-02 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-23295. --- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 20469 [https://github.co

[jira] [Assigned] (SPARK-23295) Exclude Waring message when generating versions in make-distribution.sh

2018-02-02 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-23295: - Assignee: Kent Yao > Exclude Waring message when generating versions in make-distribution.sh >

[jira] [Commented] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-02 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350568#comment-16350568 ] Li Jin commented on SPARK-23314: I am taking a look at this > Pandas grouped udf on data

[jira] [Updated] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-02 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-23314: --- Issue Type: Sub-task (was: Bug) Parent: SPARK-22216 > Pandas grouped udf on dataset with timestamp c

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350533#comment-16350533 ] Thomas Graves commented on SPARK-23309: --- Note the schema of "something" here is a "

[jira] [Assigned] (SPARK-23253) Only write shuffle temporary index file when there is not an existing one

2018-02-02 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reassigned SPARK-23253: Assignee: Kent Yao > Only write shuffle temporary index file when there is not an existing

[jira] [Resolved] (SPARK-23253) Only write shuffle temporary index file when there is not an existing one

2018-02-02 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid resolved SPARK-23253. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 20422 [https://git

[jira] [Resolved] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-23304. --- Resolution: Invalid > Spark SQL coalesce() against hive not working > ---

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350440#comment-16350440 ] Thomas Graves commented on SPARK-23304: --- ok so I guess by that logic then the coale

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350423#comment-16350423 ] Thomas Graves commented on SPARK-23304: --- it doesn't look like sql("xyz").rdd.partit

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350428#comment-16350428 ] Thomas Graves commented on SPARK-23304: --- well I guess that gives you end # of partit

[jira] [Resolved] (SPARK-23312) add a config to turn off vectorized cache reader

2018-02-02 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-23312. - Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20483 [https://githu

[jira] [Updated] (SPARK-23320) RANDOM pseudo environment variable has low resolution under Windows

2018-02-02 Thread Olivier Sannier (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olivier Sannier updated SPARK-23320: Description: Under Windows, spark-submit.bat calls spark-class2.cmd which then runs org.ap

[jira] [Created] (SPARK-23320) RANDOM pseudo environment variable has low resolution under Windows

2018-02-02 Thread Olivier Sannier (JIRA)
Olivier Sannier created SPARK-23320: --- Summary: RANDOM pseudo environment variable has low resolution under Windows Key: SPARK-23320 URL: https://issues.apache.org/jira/browse/SPARK-23320 Project: Sp

[jira] [Commented] (SPARK-23319) Skip PySpark tests for old Pandas and old PyArrow

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350365#comment-16350365 ] Apache Spark commented on SPARK-23319: -- User 'HyukjinKwon' has created a pull reques

[jira] [Assigned] (SPARK-23319) Skip PySpark tests for old Pandas and old PyArrow

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23319: Assignee: Apache Spark > Skip PySpark tests for old Pandas and old PyArrow > -

[jira] [Assigned] (SPARK-23319) Skip PySpark tests for old Pandas and old PyArrow

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23319: Assignee: (was: Apache Spark) > Skip PySpark tests for old Pandas and old PyArrow > --

[jira] [Created] (SPARK-23319) Skip PySpark tests for old Pandas and old PyArrow

2018-02-02 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-23319: Summary: Skip PySpark tests for old Pandas and old PyArrow Key: SPARK-23319 URL: https://issues.apache.org/jira/browse/SPARK-23319 Project: Spark Issue Type:

[jira] [Commented] (SPARK-23269) FP-growth: Provide last transaction for each detected frequent pattern

2018-02-02 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350308#comment-16350308 ] Sean Owen commented on SPARK-23269: --- Doesn't this incur similar overhead for every call

[jira] [Commented] (SPARK-23318) FP-growth: WARN FPGrowth: Input data is not cached

2018-02-02 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350309#comment-16350309 ] Sean Owen commented on SPARK-23318: --- Yes, a similar change sounds fine. > FP-growth: W

[jira] [Comment Edited] (SPARK-12297) Add work-around for Parquet/Hive int96 timestamp bug.

2018-02-02 Thread Zoltan Ivanfi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350240#comment-16350240 ] Zoltan Ivanfi edited comment on SPARK-12297 at 2/2/18 12:35 PM: ---

[jira] [Commented] (SPARK-12297) Add work-around for Parquet/Hive int96 timestamp bug.

2018-02-02 Thread Zoltan Ivanfi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350240#comment-16350240 ] Zoltan Ivanfi commented on SPARK-12297: --- Hive already has a workaround based on a t

[jira] [Created] (SPARK-23318) FP-growth: WARN FPGrowth: Input data is not cached

2018-02-02 Thread Arseniy Tashoyan (JIRA)
Arseniy Tashoyan created SPARK-23318: Summary: FP-growth: WARN FPGrowth: Input data is not cached Key: SPARK-23318 URL: https://issues.apache.org/jira/browse/SPARK-23318 Project: Spark Is

[jira] [Assigned] (SPARK-23317) rename ContinuousReader.setOffset to setStartOffset

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23317: Assignee: Apache Spark (was: Wenchen Fan) > rename ContinuousReader.setOffset to setStart

[jira] [Commented] (SPARK-23317) rename ContinuousReader.setOffset to setStartOffset

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350145#comment-16350145 ] Apache Spark commented on SPARK-23317: -- User 'cloud-fan' has created a pull request

[jira] [Assigned] (SPARK-23317) rename ContinuousReader.setOffset to setStartOffset

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23317: Assignee: Wenchen Fan (was: Apache Spark) > rename ContinuousReader.setOffset to setStart

[jira] [Created] (SPARK-23317) rename ContinuousReader.setOffset to setStartOffset

2018-02-02 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-23317: --- Summary: rename ContinuousReader.setOffset to setStartOffset Key: SPARK-23317 URL: https://issues.apache.org/jira/browse/SPARK-23317 Project: Spark Issue Type:

[jira] [Commented] (SPARK-23316) AnalysisException after max iteration reached for IN query

2018-02-02 Thread Bogdan Raducanu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350196#comment-16350196 ] Bogdan Raducanu commented on SPARK-23316: - I'll work on a fix > AnalysisExceptio

[jira] [Updated] (SPARK-23316) AnalysisException after max iteration reached for IN query

2018-02-02 Thread Bogdan Raducanu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bogdan Raducanu updated SPARK-23316: Affects Version/s: 2.4.0 > AnalysisException after max iteration reached for IN query > ---
