[jira] [Created] (SPARK-27291) File source V2: Ignore empty files in load

2019-03-27 Thread Gengliang Wang (JIRA)
Gengliang Wang created SPARK-27291: Summary: File source V2: Ignore empty files in load Key: SPARK-27291 URL: https://issues.apache.org/jira/browse/SPARK-27291 Project: Spark Issue Type: Bu

[jira] [Created] (SPARK-27292) Spark Job Fails with Unknown Error writing to S3 from AWS EMR

2019-03-27 Thread Olalekan Elesin (JIRA)
Olalekan Elesin created SPARK-27292: Summary: Spark Job Fails with Unknown Error writing to S3 from AWS EMR Key: SPARK-27292 URL: https://issues.apache.org/jira/browse/SPARK-27292 Project: Spark

[jira] [Commented] (SPARK-27052) Using PySpark udf in transform yields NULL values

2019-03-27 Thread Artem Rybin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802563#comment-16802563 ] Artem Rybin commented on SPARK-27052: [~ueshin], as I understand it, you had implemen

[jira] [Commented] (SPARK-27282) Spark incorrect results when using UNION with GROUP BY clause

2019-03-27 Thread Sofia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802581#comment-16802581 ] Sofia commented on SPARK-27282: Thanks [~mgaido] for the info! [~hyukjin.kwon] The Too

[jira] [Resolved] (SPARK-27288) Pruning nested field in complex map key from object serializers

2019-03-27 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-27288. Resolution: Fixed Assignee: Liang-Chi Hsieh Fix Version/s: 3.0.0 Resol

[jira] [Created] (SPARK-27293) I am interested in finding out if there is a bug in the implementation of RandomForests. The Issue is when applying a seed and getting different results than other peopl

2019-03-27 Thread Martin Skauen (JIRA)
Martin Skauen created SPARK-27293: Summary: I am interested in finding out if there is a bug in the implementation of RandomForests. The Issue is when applying a seed and getting different results than other people from my class when applying it

[jira] [Created] (SPARK-27294) Multi-cluster Kafka delegation token support

2019-03-27 Thread Gabor Somogyi (JIRA)
Gabor Somogyi created SPARK-27294: Summary: Multi-cluster Kafka delegation token support Key: SPARK-27294 URL: https://issues.apache.org/jira/browse/SPARK-27294 Project: Spark Issue Type: Im

[jira] [Commented] (SPARK-18262) JSON.org license is now CatX

2019-03-27 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802719#comment-16802719 ] Yuming Wang commented on SPARK-18262: [~srowen] I'm not sure whether we need to exclude {{o

[jira] [Commented] (SPARK-27294) Multi-cluster Kafka delegation token support

2019-03-27 Thread Gabor Somogyi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802720#comment-16802720 ] Gabor Somogyi commented on SPARK-27294: [~vanzin] [~zsxwing] [~kabhwan] what do y

[jira] [Commented] (SPARK-18262) JSON.org license is now CatX

2019-03-27 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802736#comment-16802736 ] Sean Owen commented on SPARK-18262: I don't see org.json:json in the output of mvn de

[jira] [Updated] (SPARK-27294) Multi-cluster Kafka delegation token support

2019-03-27 Thread Gabor Somogyi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Somogyi updated SPARK-27294: Description: Kafka delegation token only supports single cluster at the moment. I've created a

[jira] [Commented] (SPARK-18262) JSON.org license is now CatX

2019-03-27 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802752#comment-16802752 ] Yuming Wang commented on SPARK-18262: Yes. It only happens when running the Hive te

[jira] [Updated] (SPARK-27291) File source V2: Ignore empty files in load

2019-03-27 Thread Gengliang Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-27291: Description: In https://github.com/apache/spark/pull/23130, all empty files are excluded fr

[jira] [Commented] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code

2019-03-27 Thread Maxime Nannan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802887#comment-16802887 ] Maxime Nannan commented on SPARK-26365: I had a similar problem and I've created

[jira] [Commented] (SPARK-20656) Incremental parsing of event logs in SHS

2019-03-27 Thread shahid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802945#comment-16802945 ] shahid commented on SPARK-20656: I would like to work on it. > Incremental parsing of

[jira] [Resolved] (SPARK-27279) Reuse subquery should compare child node only

2019-03-27 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-27279. Resolution: Fixed Assignee: Adrian Wang Fix Version/s: 3.0.0 > Reuse subquery should com

[jira] [Commented] (SPARK-27292) Spark Job Fails with Unknown Error writing to S3 from AWS EMR

2019-03-27 Thread Eugene Koifman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803033#comment-16803033 ] Eugene Koifman commented on SPARK-27292: It's probably better to ask this on the

[jira] [Commented] (SPARK-27290) remove unneed sort under Aggregate

2019-03-27 Thread Eugene Koifman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803057#comment-16803057 ] Eugene Koifman commented on SPARK-27290: In general implementation of aggregatio

[jira] [Comment Edited] (SPARK-27290) remove unneed sort under Aggregate

2019-03-27 Thread Eugene Koifman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803057#comment-16803057 ] Eugene Koifman edited comment on SPARK-27290 at 3/27/19 4:56 PM:

[jira] [Assigned] (SPARK-27291) File source V2: Ignore empty files in load

2019-03-27 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-27291: Assignee: Gengliang Wang > File source V2: Ignore empty files in load

[jira] [Resolved] (SPARK-27291) File source V2: Ignore empty files in load

2019-03-27 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-27291. Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24227 [https://gith

[jira] [Created] (SPARK-27295) Provision to provide initial values for each source node in personalised page rank - Graphx

2019-03-27 Thread Eshwar S R (JIRA)
Eshwar S R created SPARK-27295: Summary: Provision to provide initial values for each source node in personalised page rank - Graphx Key: SPARK-27295 URL: https://issues.apache.org/jira/browse/SPARK-27295

[jira] [Updated] (SPARK-27295) Provision to provide initial values for each source node in personalised page rank - Graphx

2019-03-27 Thread Eshwar S R (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshwar S R updated SPARK-27295: Priority: Major (was: Minor) > Provision to provide initial values for each source node in personal

[jira] [Resolved] (SPARK-24902) Add integration tests for PVs

2019-03-27 Thread shane knapp (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shane knapp resolved SPARK-24902. Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23514 [https://gith

[jira] [Assigned] (SPARK-24902) Add integration tests for PVs

2019-03-27 Thread shane knapp (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shane knapp reassigned SPARK-24902: Assignee: Stavros Kontopoulos > Add integration tests for PVs

[jira] [Updated] (SPARK-27288) Pruning nested field in complex map key from object serializers

2019-03-27 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27288: Issue Type: Sub-task (was: Improvement) Parent: SPARK-25603 > Pruning nested field in

[jira] [Commented] (SPARK-27288) Pruning nested field in complex map key from object serializers

2019-03-27 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803367#comment-16803367 ] Dongjoon Hyun commented on SPARK-27288: [~viirya] and [~maropu]. To give more vis

[jira] [Created] (SPARK-27296) User Defined Aggregating Functions (UDAFs) have a major efficiency problem

2019-03-27 Thread Erik Erlandson (JIRA)
Erik Erlandson created SPARK-27296: Summary: User Defined Aggregating Functions (UDAFs) have a major efficiency problem Key: SPARK-27296 URL: https://issues.apache.org/jira/browse/SPARK-27296 Projec

[jira] [Created] (SPARK-27297) Add higher order functions to org.apache.spark.sql.functions

2019-03-27 Thread Nikolas Vanderhoof (JIRA)
Nikolas Vanderhoof created SPARK-27297: Summary: Add higher order functions to org.apache.spark.sql.functions Key: SPARK-27297 URL: https://issues.apache.org/jira/browse/SPARK-27297 Project: Spa

[jira] [Updated] (SPARK-27297) Add higher order functions to org.apache.spark.sql.functions

2019-03-27 Thread Nikolas Vanderhoof (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolas Vanderhoof updated SPARK-27297: Description: There is currently no existing Scala API equivalent for the higher orde

[jira] [Updated] (SPARK-27297) Add higher order functions to Scala API

2019-03-27 Thread Nikolas Vanderhoof (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolas Vanderhoof updated SPARK-27297: Summary: Add higher order functions to Scala API (was: Add higher order functions t

[jira] [Updated] (SPARK-27297) Add higher order functions to Scala API

2019-03-27 Thread Nikolas Vanderhoof (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolas Vanderhoof updated SPARK-27297: Description: There is currently no existing Scala API equivalent for the higher orde

[jira] [Created] (SPARK-27298) Dataset except operation gives different results(dataset count) on Spark 2.3.0 Windows and Spark 2.3.0 Linux environment

2019-03-27 Thread Mahima Khatri (JIRA)
Mahima Khatri created SPARK-27298: Summary: Dataset except operation gives different results(dataset count) on Spark 2.3.0 Windows and Spark 2.3.0 Linux environment Key: SPARK-27298 URL: https://issues.apache.org

[jira] [Updated] (SPARK-27298) Dataset except operation gives different results(dataset count) on Spark 2.3.0 Windows and Spark 2.3.0 Linux environment

2019-03-27 Thread Mahima Khatri (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahima Khatri updated SPARK-27298: Attachment: customer.csv > Dataset except operation gives different results(dataset count) on

[jira] [Updated] (SPARK-27298) Dataset except operation gives different results(dataset count) on Spark 2.3.0 Windows and Spark 2.3.0 Linux environment

2019-03-27 Thread Mahima Khatri (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahima Khatri updated SPARK-27298: Attachment: Console-Result-Windows.txt > Dataset except operation gives different results(data

[jira] [Updated] (SPARK-27298) Dataset except operation gives different results(dataset count) on Spark 2.3.0 Windows and Spark 2.3.0 Linux environment

2019-03-27 Thread Mahima Khatri (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahima Khatri updated SPARK-27298: Attachment: console-result-LinuxonVM.txt > Dataset except operation gives different results(da

[jira] [Updated] (SPARK-27298) Dataset except operation gives different results(dataset count) on Spark 2.3.0 Windows and Spark 2.3.0 Linux environment

2019-03-27 Thread Mahima Khatri (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahima Khatri updated SPARK-27298: Attachment: pom.xml > Dataset except operation gives different results(dataset count) on Spark

[jira] [Updated] (SPARK-27298) Dataset except operation gives different results(dataset count) on Spark 2.3.0 Windows and Spark 2.3.0 Linux environment

2019-03-27 Thread Mahima Khatri (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahima Khatri updated SPARK-27298: Priority: Critical (was: Major) > Dataset except operation gives different results(dataset co