[jira] [Commented] (SPARK-20898) spark.blacklist.killBlacklistedExecutors doesn't work in YARN

2017-05-30 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030710#comment-16030710 ] Saisai Shao commented on SPARK-20898: - [~tgraves], I addressed this issue in https:/

[jira] [Created] (SPARK-20934) Task is hung at inner join, would work with other kind of joins

2017-05-30 Thread Mohamed Elagamy (JIRA)
Mohamed Elagamy created SPARK-20934: --- Summary: Task is hung at inner join, would work with other kind of joins Key: SPARK-20934 URL: https://issues.apache.org/jira/browse/SPARK-20934 Project: Spark

[jira] [Assigned] (SPARK-20933) when the input parameter is float type for ‘round’ or ‘bround’, it can't work well

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20933: Assignee: (was: Apache Spark) > when the input parameter is float type for ‘round’ or

[jira] [Assigned] (SPARK-20933) when the input parameter is float type for ‘round’ or ‘bround’, it can't work well

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20933: Assignee: Apache Spark > when the input parameter is float type for ‘round’ or ‘bround’

[jira] [Commented] (SPARK-20933) when the input parameter is float type for ‘round’ or ‘bround’, it can't work well

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030706#comment-16030706 ] Apache Spark commented on SPARK-20933: -- User '10110346' has created a pull request f

[jira] [Updated] (SPARK-20933) when the input parameter is float type for ‘round’ or ‘bround’, it can't work well

2017-05-30 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-20933: Description: spark-sql>select round(cast(3.1415 as float), 3); spark-sql>3.141 For this case, the result we

[jira] [Created] (SPARK-20933) when the input parameter is float type for ‘round’ or ‘bround’, it can't work well

2017-05-30 Thread liuxian (JIRA)
liuxian created SPARK-20933: --- Summary: when the input parameter is float type for ‘round’ or ‘bround’, it can't work well Key: SPARK-20933 URL: https://issues.apache.org/jira/browse/SPARK-20933 Project: S

[jira] [Resolved] (SPARK-20865) caching dataset throws "Queries with streaming sources must be executed with writeStream.start()"

2017-05-30 Thread Jacek Laskowski (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacek Laskowski resolved SPARK-20865. - Resolution: Won't Fix Fix Version/s: 2.3.0 2.2.0 {{cache}} is n

[jira] [Assigned] (SPARK-20877) Investigate if tests will time out on CRAN

2017-05-30 Thread Shivaram Venkataraman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman reassigned SPARK-20877: - Assignee: Felix Cheung > Investigate if tests will time out on CRAN > --

[jira] [Resolved] (SPARK-20877) Investigate if tests will time out on CRAN

2017-05-30 Thread Shivaram Venkataraman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-20877. --- Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request

[jira] [Updated] (SPARK-20854) extend hint syntax to support any expression, not just identifiers or strings

2017-05-30 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-20854: Issue Type: Improvement (was: Bug) > extend hint syntax to support any expression, not just identifiers or

[jira] [Assigned] (SPARK-20854) extend hint syntax to support any expression, not just identifiers or strings

2017-05-30 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-20854: --- Assignee: Bogdan Raducanu Target Version/s: 2.2.0 Priority: Blocker (was: Ma

[jira] [Comment Edited] (SPARK-20199) GradientBoostedTreesModel doesn't have featureSubsetStrategy parameter

2017-05-30 Thread pralabhkumar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030629#comment-16030629 ] pralabhkumar edited comment on SPARK-20199 at 5/31/17 4:56 AM:

[jira] [Commented] (SPARK-20199) GradientBoostedTreesModel doesn't have featureSubsetStrategy parameter

2017-05-30 Thread pralabhkumar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030629#comment-16030629 ] pralabhkumar commented on SPARK-20199: -- please review the pull request . https://gi

[jira] [Updated] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-20392: Target Version/s: 2.3.0 > Slow performance when calling fit on ML pipeline for dataset with many >

[jira] [Updated] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-20392: Priority: Blocker (was: Major) > Slow performance when calling fit on ML pipeline for dataset with

[jira] [Updated] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-20392: Issue Type: Improvement (was: Bug) > Slow performance when calling fit on ML pipeline for dataset

[jira] [Reopened] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reopened SPARK-20392: - will re-merge it at the end of Spark 2.3, to avoid conflicts when backporting analyzer related PRs t

[jira] [Updated] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-20392: Fix Version/s: (was: 2.3.0) > Slow performance when calling fit on ML pipeline for dataset with

[jira] [Commented] (SPARK-20876) If the input parameter is float type for ceil or floor, the result is not what we expected

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030604#comment-16030604 ] Apache Spark commented on SPARK-20876: -- User '10110346' has created a pull request f

[jira] [Assigned] (SPARK-20932) CountVectorizer support handle persistence

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20932: Assignee: (was: Apache Spark) > CountVectorizer support handle persistence > -

[jira] [Assigned] (SPARK-20932) CountVectorizer support handle persistence

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20932: Assignee: Apache Spark > CountVectorizer support handle persistence >

[jira] [Commented] (SPARK-20932) CountVectorizer support handle persistence

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030601#comment-16030601 ] Apache Spark commented on SPARK-20932: -- User 'zhengruifeng' has created a pull reque

[jira] [Created] (SPARK-20932) CountVectorizer support handle persistence

2017-05-30 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-20932: Summary: CountVectorizer support handle persistence Key: SPARK-20932 URL: https://issues.apache.org/jira/browse/SPARK-20932 Project: Spark Issue Type: Improv

[jira] [Issue Comment Deleted] (SPARK-20931) Built-in SQL Function - ABS support string type

2017-05-30 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-20931: Comment: was deleted (was: I'm working on.) > Built-in SQL Function - ABS support string type > --

[jira] [Assigned] (SPARK-20931) Built-in SQL Function - ABS support string type

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20931: Assignee: Apache Spark > Built-in SQL Function - ABS support string type > ---

[jira] [Assigned] (SPARK-20931) Built-in SQL Function - ABS support string type

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20931: Assignee: (was: Apache Spark) > Built-in SQL Function - ABS support string type >

[jira] [Commented] (SPARK-20931) Built-in SQL Function - ABS support string type

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030577#comment-16030577 ] Apache Spark commented on SPARK-20931: -- User 'wangyum' has created a pull request fo

[jira] [Resolved] (SPARK-20275) HistoryServer page shows incorrect complete date of inprogress apps

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-20275. - Resolution: Fixed Assignee: Saisai Shao Fix Version/s: 2.2.0 2.

[jira] [Created] (SPARK-20931) Built-in SQL Function - ABS support string type

2017-05-30 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-20931: --- Summary: Built-in SQL Function - ABS support string type Key: SPARK-20931 URL: https://issues.apache.org/jira/browse/SPARK-20931 Project: Spark Issue Type: Sub

[jira] [Commented] (SPARK-20931) Built-in SQL Function - ABS support string type

2017-05-30 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030562#comment-16030562 ] Yuming Wang commented on SPARK-20931: - I'm working on. > Built-in SQL Function - ABS

[jira] [Assigned] (SPARK-20930) Destroy broadcasted centers after computing cost

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20930: Assignee: Apache Spark > Destroy broadcasted centers after computing cost > -

[jira] [Assigned] (SPARK-20930) Destroy broadcasted centers after computing cost

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20930: Assignee: (was: Apache Spark) > Destroy broadcasted centers after computing cost > --

[jira] [Commented] (SPARK-20930) Destroy broadcasted centers after computing cost

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030559#comment-16030559 ] Apache Spark commented on SPARK-20930: -- User 'zhengruifeng' has created a pull reque

[jira] [Created] (SPARK-20930) Destroy broadcasted centers after computing cost

2017-05-30 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-20930: Summary: Destroy broadcasted centers after computing cost Key: SPARK-20930 URL: https://issues.apache.org/jira/browse/SPARK-20930 Project: Spark Issue Type:

[jira] [Assigned] (SPARK-20213) DataFrameWriter operations do not show up in SQL tab

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-20213: --- Assignee: Wenchen Fan > DataFrameWriter operations do not show up in SQL tab > -

[jira] [Resolved] (SPARK-20213) DataFrameWriter operations do not show up in SQL tab

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-20213. - Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 18064 [https://githu

[jira] [Assigned] (SPARK-20929) LinearSVC should not use shared Param HasThresholds

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20929: Assignee: Apache Spark (was: Joseph K. Bradley) > LinearSVC should not use shared Param H

[jira] [Assigned] (SPARK-20929) LinearSVC should not use shared Param HasThresholds

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20929: Assignee: Joseph K. Bradley (was: Apache Spark) > LinearSVC should not use shared Param H

[jira] [Commented] (SPARK-20929) LinearSVC should not use shared Param HasThresholds

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030437#comment-16030437 ] Apache Spark commented on SPARK-20929: -- User 'jkbradley' has created a pull request

[jira] [Updated] (SPARK-20929) LinearSVC should not use shared Param HasThresholds

2017-05-30 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-20929: -- Priority: Minor (was: Major) > LinearSVC should not use shared Param HasThresholds > -

[jira] [Created] (SPARK-20929) LinearSVC should not use shared Param HasThresholds

2017-05-30 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-20929: - Summary: LinearSVC should not use shared Param HasThresholds Key: SPARK-20929 URL: https://issues.apache.org/jira/browse/SPARK-20929 Project: Spark

[jira] [Commented] (SPARK-20928) Continuous Processing Mode for Structured Streaming

2017-05-30 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030379#comment-16030379 ] Nan Zhu commented on SPARK-20928: - Hi, is there any description on what does it mean? >

[jira] [Resolved] (SPARK-20651) Speed up the new app state listener

2017-05-30 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-20651. Resolution: Won't Do I've done some perf work to make sure live applications don't regress,

[jira] [Closed] (SPARK-2183) Avoid loading/shuffling data twice in self-join query

2017-05-30 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-2183. -- Resolution: Fixed Assignee: Reynold Xin This shouldn't be an issue anymore with reuse exchange in

[jira] [Commented] (SPARK-20178) Improve Scheduler fetch failures

2017-05-30 Thread Sital Kedia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030284#comment-16030284 ] Sital Kedia commented on SPARK-20178: - https://github.com/apache/spark/pull/18150 >

[jira] [Resolved] (SPARK-20883) Improve StateStore APIs for efficiency

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-20883. -- Resolution: Fixed Fix Version/s: 2.3.0 > Improve StateStore APIs for efficiency > --

[jira] [Commented] (SPARK-19753) Remove all shuffle files on a host in case of slave lost or fetch failure

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030270#comment-16030270 ] Apache Spark commented on SPARK-19753: -- User 'sitalkedia' has created a pull request

[jira] [Assigned] (SPARK-20894) Error while checkpointing to HDFS

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20894: Assignee: (was: Apache Spark) > Error while checkpointing to HDFS > --

[jira] [Assigned] (SPARK-20894) Error while checkpointing to HDFS

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20894: Assignee: Apache Spark > Error while checkpointing to HDFS > -

[jira] [Commented] (SPARK-20894) Error while checkpointing to HDFS

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030261#comment-16030261 ] Apache Spark commented on SPARK-20894: -- User 'zsxwing' has created a pull request fo

[jira] [Assigned] (SPARK-20926) Exposure to Guava libraries by directly accessing tableRelationCache in SessionCatalog caused failures

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20926: Assignee: Apache Spark > Exposure to Guava libraries by directly accessing tableRelationCa

[jira] [Commented] (SPARK-20926) Exposure to Guava libraries by directly accessing tableRelationCache in SessionCatalog caused failures

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030257#comment-16030257 ] Apache Spark commented on SPARK-20926: -- User 'rezasafi' has created a pull request f

[jira] [Assigned] (SPARK-20926) Exposure to Guava libraries by directly accessing tableRelationCache in SessionCatalog caused failures

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20926: Assignee: (was: Apache Spark) > Exposure to Guava libraries by directly accessing tabl

[jira] [Commented] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Jeffrey Quinn (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030197#comment-16030197 ] Jeffrey Quinn commented on SPARK-20925: --- Apologies, will move to the mailing list n

[jira] [Created] (SPARK-20928) Continuous Processing Mode for Structured Streaming

2017-05-30 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-20928: Summary: Continuous Processing Mode for Structured Streaming Key: SPARK-20928 URL: https://issues.apache.org/jira/browse/SPARK-20928 Project: Spark I

[jira] [Commented] (SPARK-19236) Add createOrReplaceGlobalTempView

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030165#comment-16030165 ] Apache Spark commented on SPARK-19236: -- User 'gatorsmile' has created a pull request

[jira] [Commented] (SPARK-20894) Error while checkpointing to HDFS (similar to JIRA SPARK-19268)

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030160#comment-16030160 ] Shixiong Zhu commented on SPARK-20894: -- The root issue here is the driver uses the l

[jira] [Updated] (SPARK-20894) Error while checkpointing to HDFS

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-20894: - Summary: Error while checkpointing to HDFS (was: Error while checkpointing to HDFS (similar to J

[jira] [Updated] (SPARK-20894) Error while checkpointing to HDFS (similar to JIRA SPARK-19268)

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-20894: - Issue Type: Improvement (was: Bug) > Error while checkpointing to HDFS (similar to JIRA SPARK-19

[jira] [Reopened] (SPARK-20894) Error while checkpointing to HDFS (similar to JIRA SPARK-19268)

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reopened SPARK-20894: -- > Error while checkpointing to HDFS (similar to JIRA SPARK-19268) > ---

[jira] [Resolved] (SPARK-20924) Unable to call the function registered in the not-current database

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-20924. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 18146 [https://githu

[jira] [Commented] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030108#comment-16030108 ] Sean Owen commented on SPARK-20925: --- This is better for the mailing list. Spark allocat

[jira] [Commented] (SPARK-19732) DataFrame.fillna() does not work for bools in PySpark

2017-05-30 Thread Ruben Berenguel (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030102#comment-16030102 ] Ruben Berenguel commented on SPARK-19732: - I'll give this a go! > DataFrame.fill

[jira] [Commented] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Jeffrey Quinn (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030089#comment-16030089 ] Jeffrey Quinn commented on SPARK-20925: --- Thanks Sean, Sorry to continue to comment

[jira] [Resolved] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20925. --- Resolution: Not A Problem That doesn't mean the JVM is out of memory; it kind of means the opposite.

[jira] [Commented] (SPARK-20923) TaskMetrics._updatedBlockStatuses uses a lot of memory

2017-05-30 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030059#comment-16030059 ] Thomas Graves commented on SPARK-20923: --- taking a quick look at the history of the

[jira] [Comment Edited] (SPARK-18881) Spark never finishes jobs and stages, JobProgressListener fails

2017-05-30 Thread Mathieu D (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030025#comment-16030025 ] Mathieu D edited comment on SPARK-18881 at 5/30/17 7:52 PM: J

[jira] [Commented] (SPARK-18881) Spark never finishes jobs and stages, JobProgressListener fails

2017-05-30 Thread Mathieu D (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030025#comment-16030025 ] Mathieu D commented on SPARK-18881: --- Just to mention a workaround for those experiencin

[jira] [Commented] (SPARK-19044) PySpark dropna() can fail with AnalysisException

2017-05-30 Thread Ruben Berenguel (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030021#comment-16030021 ] Ruben Berenguel commented on SPARK-19044: - Oh, there's a typo in the "equivalent

[jira] [Commented] (SPARK-20803) KernelDensity.estimate in pyspark.mllib.stat.KernelDensity throws net.razorvine.pickle.PickleException when input data is normally distributed (no error when data is n

2017-05-30 Thread Bettadapura Srinath Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030004#comment-16030004 ] Bettadapura Srinath Sharma commented on SPARK-20803: In Java, the (co

[jira] [Created] (SPARK-20927) Add cache operator to Unsupported Operations in Structured Streaming

2017-05-30 Thread Jacek Laskowski (JIRA)
Jacek Laskowski created SPARK-20927: --- Summary: Add cache operator to Unsupported Operations in Structured Streaming Key: SPARK-20927 URL: https://issues.apache.org/jira/browse/SPARK-20927 Project:

[jira] [Commented] (SPARK-20926) Exposure to Guava libraries by directly accessing tableRelationCache in SessionCatalog caused failures

2017-05-30 Thread Reza Safi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029996#comment-16029996 ] Reza Safi commented on SPARK-20926: --- I will post a pull request for this issue soon, by

[jira] [Updated] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Jeffrey Quinn (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Quinn updated SPARK-20925: -- Description: Observed under the following conditions: Spark Version: Spark 2.1.0 Hadoop Versio

[jira] [Commented] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Jeffrey Quinn (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029993#comment-16029993 ] Jeffrey Quinn commented on SPARK-20925: --- Hi Sean, Sorry for not providing adequate

[jira] [Created] (SPARK-20926) Exposure to Guava libraries by directly accessing tableRelationCache in SessionCatalog caused failures

2017-05-30 Thread Reza Safi (JIRA)
Reza Safi created SPARK-20926: - Summary: Exposure to Guava libraries by directly accessing tableRelationCache in SessionCatalog caused failures Key: SPARK-20926 URL: https://issues.apache.org/jira/browse/SPARK-20926

[jira] [Commented] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029984#comment-16029984 ] Sean Owen commented on SPARK-20925: --- Not enough info here -- is the JVM running out of

[jira] [Updated] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Jeffrey Quinn (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Quinn updated SPARK-20925: -- Description: Observed under the following conditions: Spark Version: Spark 2.1.0 Hadoop Versio

[jira] [Created] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Jeffrey Quinn (JIRA)
Jeffrey Quinn created SPARK-20925: - Summary: Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy Key: SPARK-20925 URL: https://issues.apache.org/jira/browse/SPARK-20925 Project:

[jira] [Commented] (SPARK-20923) TaskMetrics._updatedBlockStatuses uses a lot of memory

2017-05-30 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029962#comment-16029962 ] Josh Rosen commented on SPARK-20923: It doesn't seem to be used, as far as I can tell

[jira] [Assigned] (SPARK-20333) Fix HashPartitioner in DAGSchedulerSuite

2017-05-30 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reassigned SPARK-20333: Assignee: jin xing > Fix HashPartitioner in DAGSchedulerSuite > --

[jira] [Commented] (SPARK-20802) kolmogorovSmirnovTest in pyspark.mllib.stat.Statistics throws net.razorvine.pickle.PickleException when input data is normally distributed (no error when data is not n

2017-05-30 Thread Bettadapura Srinath Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029950#comment-16029950 ] Bettadapura Srinath Sharma commented on SPARK-20802: In Java, (Correc

[jira] [Resolved] (SPARK-20333) Fix HashPartitioner in DAGSchedulerSuite

2017-05-30 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid resolved SPARK-20333. -- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 17634 [https://git

[jira] [Commented] (SPARK-20923) TaskMetrics._updatedBlockStatuses uses a lot of memory

2017-05-30 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029917#comment-16029917 ] Thomas Graves commented on SPARK-20923: --- [~joshrosen] [~zsxwing] [~eseyfe] I think

[jira] [Closed] (SPARK-15905) Driver hung while writing to console progress bar

2017-05-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil closed SPARK-15905. --- Resolution: Cannot Reproduce > Driver hung while writing to console progress bar > --

[jira] [Commented] (SPARK-20923) TaskMetrics._updatedBlockStatuses uses a lot of memory

2017-05-30 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029847#comment-16029847 ] Ryan Blue commented on SPARK-20923: --- I didn't look at the code path up to writing histo

[jira] [Commented] (SPARK-15905) Driver hung while writing to console progress bar

2017-05-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029846#comment-16029846 ] Tejas Patil commented on SPARK-15905: - I haven't seen this in a while with Spark 2.0.

[jira] [Updated] (SPARK-20597) KafkaSourceProvider falls back on path as synonym for topic

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-20597: - Labels: starter (was: ) > KafkaSourceProvider falls back on path as synonym for topic >

[jira] [Updated] (SPARK-20599) ConsoleSink should work with write (batch)

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-20599: - Labels: starter (was: ) > ConsoleSink should work with write (batch) > -

[jira] [Updated] (SPARK-20919) Simplificaiton of CachedKafkaConsumer using guava cache.

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-20919: - Affects Version/s: (was: 2.3.0) 2.2.0 Target Version/s: 2.3.0 > S

[jira] [Updated] (SPARK-20919) Simplificaiton of CachedKafkaConsumer using guava cache.

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-20919: - Issue Type: Improvement (was: Bug) > Simplificaiton of CachedKafkaConsumer using guava cache. >

[jira] [Assigned] (SPARK-20924) Unable to call the function registered in the not-current database

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20924: Assignee: Apache Spark (was: Xiao Li) > Unable to call the function registered in the not

[jira] [Assigned] (SPARK-20924) Unable to call the function registered in the not-current database

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20924: Assignee: Xiao Li (was: Apache Spark) > Unable to call the function registered in the not

[jira] [Commented] (SPARK-20924) Unable to call the function registered in the not-current database

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029815#comment-16029815 ] Apache Spark commented on SPARK-20924: -- User 'gatorsmile' has created a pull request

[jira] [Created] (SPARK-20924) Unable to call the function registered in the not-current database

2017-05-30 Thread Xiao Li (JIRA)
Xiao Li created SPARK-20924: --- Summary: Unable to call the function registered in the not-current database Key: SPARK-20924 URL: https://issues.apache.org/jira/browse/SPARK-20924 Project: Spark Iss

[jira] [Commented] (SPARK-20832) Standalone master should explicitly inform drivers of worker deaths and invalidate external shuffle service outputs

2017-05-30 Thread Jiang Xingbo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029778#comment-16029778 ] Jiang Xingbo commented on SPARK-20832: -- I'm working on this. > Standalone master sh

[jira] [Updated] (SPARK-20899) PySpark supports stringIndexerOrderType in RFormula

2017-05-30 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-20899: Component/s: ML > PySpark supports stringIndexerOrderType in RFormula > ---

[jira] [Resolved] (SPARK-20899) PySpark supports stringIndexerOrderType in RFormula

2017-05-30 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang resolved SPARK-20899. - Resolution: Fixed Assignee: Wayne Zhang Fix Version/s: 2.3.0 > PySpark supports s

[jira] [Commented] (SPARK-20923) TaskMetrics._updatedBlockStatuses uses a lot of memory

2017-05-30 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029701#comment-16029701 ] Thomas Graves commented on SPARK-20923: --- [~rdblue] with SPARK-20084, did you see a

[jira] [Created] (SPARK-20923) TaskMetrics._updatedBlockStatuses uses a lot of memory

2017-05-30 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-20923: - Summary: TaskMetrics._updatedBlockStatuses uses a lot of memory Key: SPARK-20923 URL: https://issues.apache.org/jira/browse/SPARK-20923 Project: Spark Issu
