[jira] [Assigned] (SPARK-26780) Improve shuffle read using ReadAheadInputStream

2019-01-29 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26780: Assignee: Apache Spark > Improve shuffle read using ReadAheadInputStream >

[jira] [Assigned] (SPARK-26780) Improve shuffle read using ReadAheadInputStream

2019-01-29 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26780: Assignee: (was: Apache Spark) > Improve shuffle read using ReadAheadInputStream > -

[jira] [Created] (SPARK-26780) Improve shuffle read using ReadAheadInputStream

2019-01-29 Thread liuxian (JIRA)
liuxian created SPARK-26780: --- Summary: Improve shuffle read using ReadAheadInputStream Key: SPARK-26780 URL: https://issues.apache.org/jira/browse/SPARK-26780 Project: Spark Issue Type: Improveme

[jira] [Commented] (SPARK-24360) Support Hive 3.1 metastore

2019-01-29 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755770#comment-16755770 ] Dongjoon Hyun commented on SPARK-24360: --- I made a new PR for HMS 3.1. > Support H

[jira] [Comment Edited] (SPARK-25420) Dataset.count() every time is different.

2019-01-29 Thread huanghuai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755762#comment-16755762 ] huanghuai edited comment on SPARK-25420 at 1/30/19 7:13 AM:

[jira] [Resolved] (SPARK-26378) Queries of wide CSV/JSON data slowed after SPARK-26151

2019-01-29 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-26378. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23336 [https://gi

[jira] [Assigned] (SPARK-26378) Queries of wide CSV/JSON data slowed after SPARK-26151

2019-01-29 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-26378: Assignee: Bruce Robbins > Queries of wide CSV/JSON data slowed after SPARK-26151 > --

[jira] [Comment Edited] (SPARK-25420) Dataset.count() every time is different.

2019-01-29 Thread huanghuai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755762#comment-16755762 ] huanghuai edited comment on SPARK-25420 at 1/30/19 7:11 AM:

[jira] [Comment Edited] (SPARK-25420) Dataset.count() every time is different.

2019-01-29 Thread huanghuai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755762#comment-16755762 ] huanghuai edited comment on SPARK-25420 at 1/30/19 7:06 AM:

[jira] [Commented] (SPARK-25420) Dataset.count() every time is different.

2019-01-29 Thread huanghuai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755762#comment-16755762 ] huanghuai commented on SPARK-25420: --- --- the code to be

[jira] [Assigned] (SPARK-26768) Remove useless code in BlockManager

2019-01-29 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26768: Assignee: Apache Spark > Remove useless code in BlockManager > --

[jira] [Assigned] (SPARK-26768) Remove useless code in BlockManager

2019-01-29 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26768: Assignee: (was: Apache Spark) > Remove useless code in BlockManager > ---

[jira] [Commented] (SPARK-26749) spark streaming kafka verison for high version

2019-01-29 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755758#comment-16755758 ] Hyukjin Kwon commented on SPARK-26749: -- https://spark.apache.org/community.html >

[jira] [Commented] (SPARK-26699) Dataset column output discrepancies

2019-01-29 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755756#comment-16755756 ] Hyukjin Kwon commented on SPARK-26699: -- Please take a look at https://spark.apache.

[jira] [Commented] (SPARK-25420) Dataset.count() every time is different.

2019-01-29 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755757#comment-16755757 ] Jungtaek Lim commented on SPARK-25420: -- [~jeffrey.mak] Hmm... the result looks odd

[jira] [Commented] (SPARK-26777) SQL worked in 2.3.2 and fails in 2.4.0

2019-01-29 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755749#comment-16755749 ] Hyukjin Kwon commented on SPARK-26777: -- Please narrow down the problem, and describ

[jira] [Resolved] (SPARK-26777) SQL worked in 2.3.2 and fails in 2.4.0

2019-01-29 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-26777. -- Resolution: Incomplete > SQL worked in 2.3.2 and fails in 2.4.0 >

[jira] [Resolved] (SPARK-26779) NullPointerException when disable wholestage codegen

2019-01-29 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-26779. -- Resolution: Incomplete Please reopen this after filling sufficient details requested > NullPo

[jira] [Commented] (SPARK-26779) NullPointerException when disable wholestage codegen

2019-01-29 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755746#comment-16755746 ] Hyukjin Kwon commented on SPARK-26779: -- Please just don't copy and paste. Describe

[jira] [Created] (SPARK-26779) NullPointerException when disable wholestage codegen

2019-01-29 Thread Xiaoju Wu (JIRA)
Xiaoju Wu created SPARK-26779: - Summary: NullPointerException when disable wholestage codegen Key: SPARK-26779 URL: https://issues.apache.org/jira/browse/SPARK-26779 Project: Spark Issue Type: Bu

[jira] [Created] (SPARK-26778) Remove rule `FallbackOrcDataSourceV2` when catalog support of file data source v2 is finished

2019-01-29 Thread Gengliang Wang (JIRA)
Gengliang Wang created SPARK-26778: -- Summary: Remove rule `FallbackOrcDataSourceV2` when catalog support of file data source v2 is finished Key: SPARK-26778 URL: https://issues.apache.org/jira/browse/SPARK-26778

[jira] [Comment Edited] (SPARK-25420) Dataset.count() every time is different.

2019-01-29 Thread Jeffrey (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755709#comment-16755709 ] Jeffrey edited comment on SPARK-25420 at 1/30/19 5:59 AM: -- [~ka

[jira] [Updated] (SPARK-26778) Implement file source V2 partitioning

2019-01-29 Thread Gengliang Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-26778: --- Summary: Implement file source V2 partitioning (was: Remove rule `FallbackOrcDataSourceV2`

[jira] [Commented] (SPARK-25420) Dataset.count() every time is different.

2019-01-29 Thread Jeffrey (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755709#comment-16755709 ] Jeffrey commented on SPARK-25420: - [~kabhwan] I cannot share the dataset since it is own

[jira] [Created] (SPARK-26777) SQL worked in 2.3.2 and fails in 2.4.0

2019-01-29 Thread Yuri Budilov (JIRA)
Yuri Budilov created SPARK-26777: Summary: SQL worked in 2.3.2 and fails in 2.4.0 Key: SPARK-26777 URL: https://issues.apache.org/jira/browse/SPARK-26777 Project: Spark Issue Type: Bug

[jira] [Assigned] (SPARK-26776) Reduce Py4J communication cost in PySpark's execution barrier check

2019-01-29 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-26776: --- Assignee: Hyukjin Kwon > Reduce Py4J communication cost in PySpark's execution barrier chec

[jira] [Resolved] (SPARK-26776) Reduce Py4J communication cost in PySpark's execution barrier check

2019-01-29 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-26776. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23690 [https://gith

[jira] [Updated] (SPARK-26726) Synchronize the amount of memory used by the broadcast variable to the UI display

2019-01-29 Thread hantiantian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hantiantian updated SPARK-26726: Summary: Synchronize the amount of memory used by the broadcast variable to the UI display (was

[jira] [Assigned] (SPARK-26776) Reduce Py4J communication cost in PySpark's execution barrier check

2019-01-29 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26776: Assignee: Apache Spark > Reduce Py4J communication cost in PySpark's execution barrier ch

[jira] [Created] (SPARK-26776) Reduce Py4J communication cost in PySpark's execution barrier check

2019-01-29 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-26776: Summary: Reduce Py4J communication cost in PySpark's execution barrier check Key: SPARK-26776 URL: https://issues.apache.org/jira/browse/SPARK-26776 Project: Spark

[jira] [Commented] (SPARK-26776) Reduce Py4J communication cost in PySpark's execution barrier check

2019-01-29 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755605#comment-16755605 ] Apache Spark commented on SPARK-26776: -- User 'HyukjinKwon' has created a pull reque

[jira] [Assigned] (SPARK-26776) Reduce Py4J communication cost in PySpark's execution barrier check

2019-01-29 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26776: Assignee: (was: Apache Spark) > Reduce Py4J communication cost in PySpark's execution

[jira] [Commented] (SPARK-26776) Reduce Py4J communication cost in PySpark's execution barrier check

2019-01-29 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755602#comment-16755602 ] Apache Spark commented on SPARK-26776: -- User 'HyukjinKwon' has created a pull reque

[jira] [Commented] (SPARK-25420) Dataset.count() every time is different.

2019-01-29 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755583#comment-16755583 ] Jungtaek Lim commented on SPARK-25420: -- [~jeffrey.mak] Could you provide some data

[jira] [Commented] (SPARK-25420) Dataset.count() every time is different.

2019-01-29 Thread Jeffrey (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755546#comment-16755546 ] Jeffrey commented on SPARK-25420: - [~mgaido] Could you elaborate more why this is not a

[jira] [Commented] (SPARK-26732) Flaky test: SparkContextInfoSuite.getRDDStorageInfo only reports on RDDs that actually persist data

2019-01-29 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755515#comment-16755515 ] Takeshi Yamamuro commented on SPARK-26732: -- Thanks for pinging me, dongjoon! >

[jira] [Commented] (SPARK-26766) Remove the list of filesystems from HadoopDelegationTokenProvider.obtainDelegationTokens

2019-01-29 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755501#comment-16755501 ] Marcelo Vanzin commented on SPARK-26766: The only thing YARN-specific about {{ha

[jira] [Commented] (SPARK-26732) Flaky test: SparkContextInfoSuite.getRDDStorageInfo only reports on RDDs that actually persist data

2019-01-29 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755412#comment-16755412 ] Dongjoon Hyun commented on SPARK-26732: --- cc [~maropu]. > Flaky test: SparkContext

[jira] [Updated] (SPARK-26732) Flaky test: SparkContextInfoSuite.getRDDStorageInfo only reports on RDDs that actually persist data

2019-01-29 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-26732: -- Affects Version/s: 2.3.2 2.4.0 > Flaky test: SparkContextInfoSuite.getR

[jira] [Assigned] (SPARK-25035) Replicating disk-stored blocks should avoid memory mapping

2019-01-29 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25035: Assignee: (was: Apache Spark) > Replicating disk-stored blocks should avoid memory ma

[jira] [Assigned] (SPARK-25035) Replicating disk-stored blocks should avoid memory mapping

2019-01-29 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25035: Assignee: Apache Spark > Replicating disk-stored blocks should avoid memory mapping > ---

[jira] [Updated] (SPARK-26718) Fixed integer overflow in SS kafka rateLimit calculation

2019-01-29 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-26718: -- Fix Version/s: 2.4.1 > Fixed integer overflow in SS kafka rateLimit calculation >

[jira] [Updated] (SPARK-26775) Update Jenkins nodes to support local volumes for K8s integration tests

2019-01-29 Thread Stavros Kontopoulos (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-26775: Description: Current version of Minikube on test machines does not support properl

[jira] [Created] (SPARK-26775) Update Jenkins nodes to support local volumes for K8s integration tests

2019-01-29 Thread Stavros Kontopoulos (JIRA)
Stavros Kontopoulos created SPARK-26775: --- Summary: Update Jenkins nodes to support local volumes for K8s integration tests Key: SPARK-26775 URL: https://issues.apache.org/jira/browse/SPARK-26775

[jira] [Commented] (SPARK-25035) Replicating disk-stored blocks should avoid memory mapping

2019-01-29 Thread Attila Zsolt Piros (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755322#comment-16755322 ] Attila Zsolt Piros commented on SPARK-25035: I am working on this. > Replic

[jira] [Updated] (SPARK-26718) Fixed integer overflow in SS kafka rateLimit calculation

2019-01-29 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-26718: -- Summary: Fixed integer overflow in SS kafka rateLimit calculation (was: structured streaming

[jira] [Assigned] (SPARK-26718) structured streaming fetched wrong current offset from kafka

2019-01-29 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-26718: - Assignee: Ryne Yang > structured streaming fetched wrong current offset from kafka > --

[jira] [Resolved] (SPARK-26718) Fixed integer overflow in SS kafka rateLimit calculation

2019-01-29 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-26718. --- Resolution: Fixed Fix Version/s: 3.0.0 This is resolved via https://github.com/apache

[jira] [Commented] (SPARK-25994) SPIP: Property Graphs, Cypher Queries, and Algorithms

2019-01-29 Thread Saikat Kanjilal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755272#comment-16755272 ] Saikat Kanjilal commented on SPARK-25994: - [~mju] I would like to help out on th

[jira] [Created] (SPARK-26774) Document threading concerns in TaskSchedulerImpl

2019-01-29 Thread Imran Rashid (JIRA)
Imran Rashid created SPARK-26774: Summary: Document threading concerns in TaskSchedulerImpl Key: SPARK-26774 URL: https://issues.apache.org/jira/browse/SPARK-26774 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-26765) Avro: Validate input and output schema

2019-01-29 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-26765. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23684 [https://gith

[jira] [Assigned] (SPARK-26765) Avro: Validate input and output schema

2019-01-29 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-26765: --- Assignee: Gengliang Wang > Avro: Validate input and output schema > ---

[jira] [Commented] (SPARK-26752) Multiple aggregate methods in the same column in DataFrame

2019-01-29 Thread Guilherme Beltramini (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755156#comment-16755156 ] Guilherme Beltramini commented on SPARK-26752: -- Thanks for the input! I als

[jira] [Updated] (SPARK-26772) YARNHadoopDelegationTokenManager should load ServiceCredentialProviders independently

2019-01-29 Thread Gabor Somogyi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Somogyi updated SPARK-26772: -- Description: YARNHadoopDelegationTokenManager now loads ServiceCredentialProviders in one ste

[jira] [Resolved] (SPARK-26702) Create a test trait for Parquet and Orc test

2019-01-29 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-26702. --- Resolution: Fixed Assignee: Liang-Chi Hsieh Fix Version/s: 3.0.0 This is res

[jira] [Created] (SPARK-26773) Consider alternative base images for Kubernetes

2019-01-29 Thread Ondrej Kokes (JIRA)
Ondrej Kokes created SPARK-26773: Summary: Consider alternative base images for Kubernetes Key: SPARK-26773 URL: https://issues.apache.org/jira/browse/SPARK-26773 Project: Spark Issue Type: I

[jira] [Resolved] (SPARK-26763) Using fileStatus cache when filterPartitions

2019-01-29 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-26763. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23683 [https://gith

[jira] [Resolved] (SPARK-11215) Add multiple columns support to StringIndexer

2019-01-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-11215. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 20146 [https://github.c

[jira] [Updated] (SPARK-26771) Make .unpersist(), .destroy() consistently non-blocking by default

2019-01-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-26771: -- Docs Text: The RDD and DataFrame .unpersist() method, and Broadcast .destroy() method, take an optiona

[jira] [Updated] (SPARK-11215) Add multiple columns support to StringIndexer

2019-01-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-11215: -- Docs Text: When specifying frequencyDesc or frequencyAsc as stringOrderType param in StringIn

[jira] [Assigned] (SPARK-26763) Using fileStatus cache when filterPartitions

2019-01-29 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-26763: --- Assignee: Xianyang Liu > Using fileStatus cache when filterPartitions > ---

[jira] [Commented] (SPARK-24959) Do not invoke the CSV/JSON parser for empty schema

2019-01-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755088#comment-16755088 ] Sean Owen commented on SPARK-24959: --- Looks like this may need to be reverted: https:/

[jira] [Assigned] (SPARK-26771) Make .unpersist(), .destroy() consistently non-blocking by default

2019-01-29 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26771: Assignee: Apache Spark (was: Sean Owen) > Make .unpersist(), .destroy() consistently non

[jira] [Assigned] (SPARK-26772) YARNHadoopDelegationTokenManager should load ServiceCredentialProviders independently

2019-01-29 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26772: Assignee: Apache Spark > YARNHadoopDelegationTokenManager should load ServiceCredentialPr

[jira] [Assigned] (SPARK-26772) YARNHadoopDelegationTokenManager should load ServiceCredentialProviders independently

2019-01-29 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26772: Assignee: (was: Apache Spark) > YARNHadoopDelegationTokenManager should load ServiceC

[jira] [Updated] (SPARK-26772) YARNHadoopDelegationTokenManager should load ServiceCredentialProviders independently

2019-01-29 Thread Gabor Somogyi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Somogyi updated SPARK-26772: -- Component/s: (was: Spark Core) YARN > YARNHadoopDelegationTokenManager sh

[jira] [Assigned] (SPARK-26771) Make .unpersist(), .destroy() consistently non-blocking by default

2019-01-29 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26771: Assignee: Sean Owen (was: Apache Spark) > Make .unpersist(), .destroy() consistently non

[jira] [Created] (SPARK-26772) YARNHadoopDelegationTokenManager should load ServiceCredentialProviders independently

2019-01-29 Thread Gabor Somogyi (JIRA)
Gabor Somogyi created SPARK-26772: - Summary: YARNHadoopDelegationTokenManager should load ServiceCredentialProviders independently Key: SPARK-26772 URL: https://issues.apache.org/jira/browse/SPARK-26772

[jira] [Resolved] (SPARK-26728) Make rdd.unpersist blocking configurable

2019-01-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-26728. --- Resolution: Won't Fix Closing this in favor of https://issues.apache.org/jira/browse/SPARK-26728 >

[jira] [Comment Edited] (SPARK-26728) Make rdd.unpersist blocking configurable

2019-01-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755069#comment-16755069 ] Sean Owen edited comment on SPARK-26728 at 1/29/19 2:45 PM:

[jira] [Created] (SPARK-26771) Make .unpersist(), .destroy() consistently non-blocking by default

2019-01-29 Thread Sean Owen (JIRA)
Sean Owen created SPARK-26771: - Summary: Make .unpersist(), .destroy() consistently non-blocking by default Key: SPARK-26771 URL: https://issues.apache.org/jira/browse/SPARK-26771 Project: Spark

[jira] [Commented] (SPARK-26727) CREATE OR REPLACE VIEW query fails with TableAlreadyExistsException

2019-01-29 Thread Bela Kovacs (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755065#comment-16755065 ] Bela Kovacs commented on SPARK-26727: - I could reproduce it with databricks, althoug

[jira] [Updated] (SPARK-26739) Standardized Join Types for DataFrames

2019-01-29 Thread Skyler Lehan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Skyler Lehan updated SPARK-26739: - Description: h3. *Q1.* What are you trying to do? Articulate your objectives using absolutely n

[jira] [Commented] (SPARK-26739) Standardized Join Types for DataFrames

2019-01-29 Thread Skyler Lehan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755056#comment-16755056 ] Skyler Lehan commented on SPARK-26739: -- While constants are possible, they're not i

[jira] [Updated] (SPARK-26766) Remove the list of filesystems from HadoopDelegationTokenProvider.obtainDelegationTokens

2019-01-29 Thread Gabor Somogyi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Somogyi updated SPARK-26766: -- Priority: Minor (was: Major) > Remove the list of filesystems from > HadoopDelegationTokenPr

[jira] [Created] (SPARK-26770) Misleading/unhelpful error message when wrapping a null in an Option

2019-01-29 Thread sam (JIRA)
sam created SPARK-26770: --- Summary: Misleading/unhelpful error message when wrapping a null in an Option Key: SPARK-26770 URL: https://issues.apache.org/jira/browse/SPARK-26770 Project: Spark Issue Typ

[jira] [Resolved] (SPARK-26708) Incorrect result caused by inconsistency between a SQL cache's cached RDD and its physical plan

2019-01-29 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-26708. -- Resolution: Fixed Fix Version/s: 3.0.0 2.4.1 Resolved by htt

[jira] [Created] (SPARK-26769) partition prunning in inner join

2019-01-29 Thread nhufas (JIRA)
nhufas created SPARK-26769: -- Summary: partition prunning in inner join Key: SPARK-26769 URL: https://issues.apache.org/jira/browse/SPARK-26769 Project: Spark Issue Type: Improvement Compon

[jira] [Updated] (SPARK-26765) Avro: Validate input and output schema

2019-01-29 Thread Gengliang Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-26765: --- Summary: Avro: Validate input and output schema (was: Implement supportDataType API in Avro

[jira] [Updated] (SPARK-26765) Avro: Validate input and output schema

2019-01-29 Thread Gengliang Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-26765: --- Description: The API supportDataType in FileFormat helps to validate the output/input schema

[jira] [Updated] (SPARK-26768) Remove useless code in BlockManager

2019-01-29 Thread liupengcheng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liupengcheng updated SPARK-26768: - Description: Recently, when I was reading some code of `BlockManager.getBlockData`, I found tha

[jira] [Updated] (SPARK-26768) Remove useless code in BlockManager

2019-01-29 Thread liupengcheng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liupengcheng updated SPARK-26768: - Attachment: Selection_037.jpg > Remove useless code in BlockManager > --

[jira] [Created] (SPARK-26768) Remove useless code in BlockManager

2019-01-29 Thread liupengcheng (JIRA)
liupengcheng created SPARK-26768: Summary: Remove useless code in BlockManager Key: SPARK-26768 URL: https://issues.apache.org/jira/browse/SPARK-26768 Project: Spark Issue Type: Improvement

[jira] [Comment Edited] (SPARK-26767) Filter on a dropDuplicates dataframe gives inconsistency result

2019-01-29 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754882#comment-16754882 ] Marco Gaido edited comment on SPARK-26767 at 1/29/19 11:13 AM: ---

[jira] [Commented] (SPARK-26767) Filter on a dropDuplicates dataframe gives inconsistency result

2019-01-29 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754882#comment-16754882 ] Marco Gaido commented on SPARK-26767: - IIRC there was a similar JIRA reported. May y

[jira] [Updated] (SPARK-26767) Filter on a dropDuplicates dataframe gives inconsistency result

2019-01-29 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-26767: - Component/s: (was: Build) SQL > Filter on a dropDuplicates dataframe gives

[jira] [Updated] (SPARK-26767) Filter on a dropDuplicates dataframe gives inconsistency result

2019-01-29 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-26767: - Priority: Major (was: Blocker) > Filter on a dropDuplicates dataframe gives inconsistency resul

[jira] [Commented] (SPARK-26767) Filter on a dropDuplicates dataframe gives inconsistency result

2019-01-29 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754880#comment-16754880 ] Hyukjin Kwon commented on SPARK-26767: -- Please avoid to set Critical+ which is usua

[jira] [Commented] (SPARK-26766) Remove the list of filesystems from HadoopDelegationTokenProvider.obtainDelegationTokens

2019-01-29 Thread Gabor Somogyi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754865#comment-16754865 ] Gabor Somogyi commented on SPARK-26766: --- [~vanzin] I was thinking about your [sug

[jira] [Updated] (SPARK-26767) Filter on a dropDuplicates dataframe gives inconsistency result

2019-01-29 Thread Jeffrey (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey updated SPARK-26767: Description: o repeat the problem, (1) create a csv file with records holding same values for a subset of

[jira] [Updated] (SPARK-26767) Filter on a dropDuplicates dataframe gives inconsistency result

2019-01-29 Thread Jeffrey (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey updated SPARK-26767: Description: To repeat the problem, (1) create a csv file with records holding same values for a subset o

[jira] [Created] (SPARK-26767) Filter on a dropDuplicates dataframe gives inconsistency result

2019-01-29 Thread Jeffrey (JIRA)
Jeffrey created SPARK-26767: --- Summary: Filter on a dropDuplicates dataframe gives inconsistency result Key: SPARK-26767 URL: https://issues.apache.org/jira/browse/SPARK-26767 Project: Spark Issue

[jira] [Created] (SPARK-26766) Remove the list of filesystems from HadoopDelegationTokenProvider.obtainDelegationTokens

2019-01-29 Thread Gabor Somogyi (JIRA)
Gabor Somogyi created SPARK-26766: - Summary: Remove the list of filesystems from HadoopDelegationTokenProvider.obtainDelegationTokens Key: SPARK-26766 URL: https://issues.apache.org/jira/browse/SPARK-26766

[jira] [Created] (SPARK-26765) Implement supportDataType API in Avro data source

2019-01-29 Thread Gengliang Wang (JIRA)
Gengliang Wang created SPARK-26765: -- Summary: Implement supportDataType API in Avro data source Key: SPARK-26765 URL: https://issues.apache.org/jira/browse/SPARK-26765 Project: Spark Issue T

[jira] [Assigned] (SPARK-26765) Implement supportDataType API in Avro data source

2019-01-29 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26765: Assignee: (was: Apache Spark) > Implement supportDataType API in Avro data source > -

[jira] [Assigned] (SPARK-26765) Implement supportDataType API in Avro data source

2019-01-29 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26765: Assignee: Apache Spark > Implement supportDataType API in Avro data source >

[jira] [Created] (SPARK-26764) [SPIP] Spark Relational Cache

2019-01-29 Thread Adrian Wang (JIRA)
Adrian Wang created SPARK-26764: --- Summary: [SPIP] Spark Relational Cache Key: SPARK-26764 URL: https://issues.apache.org/jira/browse/SPARK-26764 Project: Spark Issue Type: New Feature

[jira] [Updated] (SPARK-26764) [SPIP] Spark Relational Cache

2019-01-29 Thread Adrian Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-26764: Attachment: Relational+Cache+SPIP.pdf > [SPIP] Spark Relational Cache > --

[jira] [Resolved] (SPARK-26752) Multiple aggregate methods in the same column in DataFrame

2019-01-29 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-26752. -- Resolution: Won't Fix > Multiple aggregate methods in the same column in DataFrame > -

[jira] [Updated] (SPARK-26760) [Spark Incorrect display in SPARK UI Executor Tab when number of cores is 4 and Active Task display as 5 in Executor Tab of SPARK UI]

2019-01-29 Thread ABHISHEK KUMAR GUPTA (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ABHISHEK KUMAR GUPTA updated SPARK-26760: - Summary: [Spark Incorrect display in SPARK UI Executor Tab when number of cores
