[jira] [Created] (SPARK-15727) Add UPSERT/MERGE mode to DataFrameWriter

2016-06-01 Thread Ian Hellstrom (JIRA)
Ian Hellstrom created SPARK-15727: - Summary: Add UPSERT/MERGE mode to DataFrameWriter Key: SPARK-15727 URL: https://issues.apache.org/jira/browse/SPARK-15727 Project: Spark Issue Type: Wish

[jira] [Commented] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode

2016-06-01 Thread Brett Randall (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311801#comment-15311801 ] Brett Randall commented on SPARK-15685: --- I found an interesting commit for this top

[jira] [Updated] (SPARK-13484) Filter outer joined result using a non-nullable column from the right table

2016-06-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13484: --- Assignee: Takeshi Yamamuro > Filter outer joined result using a non-nullable column from the right ta

[jira] [Resolved] (SPARK-13484) Filter outer joined result using a non-nullable column from the right table

2016-06-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-13484. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13290 [https://github.

[jira] [Commented] (SPARK-15617) Clarify that fMeasure in MulticlassMetrics and MulticlassClassificationEvaluator is "micro" f1_score

2016-06-01 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311752#comment-15311752 ] zhengruifeng commented on SPARK-15617: -- Agreed. In {{MulticlassClassificationEvaluat

[jira] [Assigned] (SPARK-14959) Problem Reading partitioned ORC or Parquet files

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14959: Assignee: (was: Apache Spark) > Problem Reading partitioned ORC or Parquet files > --

[jira] [Commented] (SPARK-14959) Problem Reading partitioned ORC or Parquet files

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311748#comment-15311748 ] Apache Spark commented on SPARK-14959: -- User 'xwu0226' has created a pull request fo

[jira] [Assigned] (SPARK-14959) Problem Reading partitioned ORC or Parquet files

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14959: Assignee: Apache Spark > Problem Reading partitioned ORC or Parquet files > -

[jira] [Updated] (SPARK-15620) Dataset.map creates a dataset that can't be self-joined

2016-06-01 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-15620: Assignee: Saisai Shao > Dataset.map creates a dataset that can't be self-joined > -

[jira] [Resolved] (SPARK-15620) Dataset.map creates a dataset that can't be self-joined

2016-06-01 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-15620. - Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13399 [https://githu

[jira] [Commented] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name

2016-06-01 Thread Brett Randall (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311717#comment-15311717 ] Brett Randall commented on SPARK-15723: --- https://github.com/apache/spark/pull/13462

[jira] [Commented] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name

2016-06-01 Thread Brett Randall (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311701#comment-15311701 ] Brett Randall commented on SPARK-15723: --- I will propose a fix. Note that to reprod

[jira] [Assigned] (SPARK-15721) Make DefaultParamsReadable,Writable public APIs

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15721: Assignee: Joseph K. Bradley (was: Apache Spark) > Make DefaultParamsReadable,Writable pub

[jira] [Commented] (SPARK-15721) Make DefaultParamsReadable,Writable public APIs

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311687#comment-15311687 ] Apache Spark commented on SPARK-15721: -- User 'jkbradley' has created a pull request

[jira] [Assigned] (SPARK-15721) Make DefaultParamsReadable,Writable public APIs

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15721: Assignee: Apache Spark (was: Joseph K. Bradley) > Make DefaultParamsReadable,Writable pub

[jira] [Assigned] (SPARK-15615) Support for creating a dataframe from JSON in Dataset[String]

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15615: Assignee: (was: Apache Spark) > Support for creating a dataframe from JSON in Dataset[

[jira] [Commented] (SPARK-15615) Support for creating a dataframe from JSON in Dataset[String]

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311686#comment-15311686 ] Apache Spark commented on SPARK-15615: -- User 'pjfanning' has created a pull request

[jira] [Assigned] (SPARK-15615) Support for creating a dataframe from JSON in Dataset[String]

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15615: Assignee: Apache Spark > Support for creating a dataframe from JSON in Dataset[String] >

[jira] [Commented] (SPARK-14804) Graph vertexRDD/EdgeRDD checkpoint results ClassCastException:

2016-06-01 Thread Anderson de Andrade (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311681#comment-15311681 ] Anderson de Andrade commented on SPARK-14804: - The proposed PR is hacky. It d

[jira] [Commented] (SPARK-5484) Pregel should checkpoint periodically to avoid StackOverflowError

2016-06-01 Thread Anderson de Andrade (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311679#comment-15311679 ] Anderson de Andrade commented on SPARK-5484: How is this not a priority? If we

[jira] [Updated] (SPARK-15726) Make DatasetBenchmark fairer among Dataset, DataFrame and RDD

2016-06-01 Thread Hiroshi Inoue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroshi Inoue updated SPARK-15726: -- Description: DatasetBenchmark compares the performances of RDD, DataFrame and Dataset while ru

[jira] [Commented] (SPARK-15726) Make DatasetBenchmark fairer among Dataset, DataFrame and RDD

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311659#comment-15311659 ] Apache Spark commented on SPARK-15726: -- User 'inouehrs' has created a pull request f

[jira] [Assigned] (SPARK-15726) Make DatasetBenchmark fairer among Dataset, DataFrame and RDD

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15726: Assignee: (was: Apache Spark) > Make DatasetBenchmark fairer among Dataset, DataFrame

[jira] [Assigned] (SPARK-15726) Make DatasetBenchmark fairer among Dataset, DataFrame and RDD

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15726: Assignee: Apache Spark > Make DatasetBenchmark fairer among Dataset, DataFrame and RDD > -

[jira] [Assigned] (SPARK-15717) Cannot perform RDD operations on a checkpointed VertexRDD.

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15717: Assignee: Apache Spark > Cannot perform RDD operations on a checkpointed VertexRDD. >

[jira] [Assigned] (SPARK-15717) Cannot perform RDD operations on a checkpointed VertexRDD.

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15717: Assignee: (was: Apache Spark) > Cannot perform RDD operations on a checkpointed Vertex

[jira] [Commented] (SPARK-15717) Cannot perform RDD operations on a checkpointed VertexRDD.

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311655#comment-15311655 ] Apache Spark commented on SPARK-15717: -- User 'adeandrade' has created a pull request

[jira] [Commented] (SPARK-15663) SparkSession.catalog.listFunctions shouldn't include the list of built-in functions

2016-06-01 Thread Sandeep Singh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311653#comment-15311653 ] Sandeep Singh commented on SPARK-15663: --- I'm adding tests to my PR. > SparkSession

[jira] [Updated] (SPARK-15726) Make DatasetBenchmark fairer among Dataset, DataFrame and RDD

2016-06-01 Thread Hiroshi Inoue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroshi Inoue updated SPARK-15726: -- Description: DatasetBenchmark compares the performances of RDD, DataFrame and Dataset while ru

[jira] [Commented] (SPARK-13587) Support virtualenv in PySpark

2016-06-01 Thread Jeff Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311649#comment-15311649 ] Jeff Zhang commented on SPARK-13587: Yes, I focus on yarn mode, I did some test on lo

[jira] [Updated] (SPARK-15717) Cannot perform RDD operations on a checkpointed VertexRDD.

2016-06-01 Thread Anderson de Andrade (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anderson de Andrade updated SPARK-15717: Summary: Cannot perform RDD operations on a checkpointed VertexRDD. (was: Cannot c

[jira] [Commented] (SPARK-15720) MLLIB Word2Vec loading large number of vectors in the model results in java.lang.NegativeArraySizeException

2016-06-01 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311647#comment-15311647 ] yuhao yang commented on SPARK-15720: This can only happen when creating a Word2VecMod

[jira] [Commented] (SPARK-13587) Support virtualenv in PySpark

2016-06-01 Thread Jeff Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311646#comment-15311646 ] Jeff Zhang commented on SPARK-13587: Thanks [~gbow...@fastmail.co.uk] In my POC, I im

[jira] [Updated] (SPARK-15726) Make DatasetBenchmark fairer among Dataset, DataFrame and RDD

2016-06-01 Thread Hiroshi Inoue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroshi Inoue updated SPARK-15726: -- Summary: Make DatasetBenchmark fairer among Dataset, DataFrame and RDD (was: Make DatasetBench

[jira] [Created] (SPARK-15726) Make DatasetBenchmark more fairer among Dataset, DataFrame and RDD

2016-06-01 Thread Hiroshi Inoue (JIRA)
Hiroshi Inoue created SPARK-15726: - Summary: Make DatasetBenchmark more fairer among Dataset, DataFrame and RDD Key: SPARK-15726 URL: https://issues.apache.org/jira/browse/SPARK-15726 Project: Spark

[jira] [Updated] (SPARK-15725) Dynamic allocation hangs YARN app when executors time out

2016-06-01 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue updated SPARK-15725: -- Description: We've had a problem with a dynamic allocation and YARN (since 1.6) where a large stage wi

[jira] [Commented] (SPARK-15725) Dynamic allocation hangs YARN app when executors time out

2016-06-01 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311567#comment-15311567 ] Ryan Blue commented on SPARK-15725: --- I'm linking to a work-around that ensures the AM t

[jira] [Created] (SPARK-15725) Dynamic allocation hangs YARN app when executors time out

2016-06-01 Thread Ryan Blue (JIRA)
Ryan Blue created SPARK-15725: - Summary: Dynamic allocation hangs YARN app when executors time out Key: SPARK-15725 URL: https://issues.apache.org/jira/browse/SPARK-15725 Project: Spark Issue Typ

[jira] [Assigned] (SPARK-15722) Wrong data when CTAS specifies schema

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15722: Assignee: Andrew Or (was: Apache Spark) > Wrong data when CTAS specifies schema > ---

[jira] [Commented] (SPARK-15722) Wrong data when CTAS specifies schema

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311528#comment-15311528 ] Apache Spark commented on SPARK-15722: -- User 'andrewor14' has created a pull request

[jira] [Assigned] (SPARK-15722) Wrong data when CTAS specifies schema

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15722: Assignee: Apache Spark (was: Andrew Or) > Wrong data when CTAS specifies schema > ---

[jira] [Updated] (SPARK-15646) When spark.sql.hive.convertCTAS is true, we may still convert the table to a parquet table when TEXTFILE or SEQUENCEFILE is specified.

2016-06-01 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15646: -- Assignee: Yin Huai > When spark.sql.hive.convertCTAS is true, we may still convert the table to a > pa

[jira] [Resolved] (SPARK-15646) When spark.sql.hive.convertCTAS is true, we may still convert the table to a parquet table when TEXTFILE or SEQUENCEFILE is specified.

2016-06-01 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15646. --- Resolution: Fixed Fix Version/s: 2.0.0 > When spark.sql.hive.convertCTAS is true, we may still

[jira] [Commented] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name

2016-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311511#comment-15311511 ] Sean Owen commented on SPARK-15723: --- Is the fix to change the timezone used in the test

[jira] [Assigned] (SPARK-15724) Add benchmarks for performance over wide schemas

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15724: Assignee: Apache Spark > Add benchmarks for performance over wide schemas > --

[jira] [Assigned] (SPARK-15724) Add benchmarks for performance over wide schemas

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15724: Assignee: (was: Apache Spark) > Add benchmarks for performance over wide schemas > ---

[jira] [Commented] (SPARK-15724) Add benchmarks for performance over wide schemas

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311507#comment-15311507 ] Apache Spark commented on SPARK-15724: -- User 'ericl' has created a pull request for

[jira] [Created] (SPARK-15724) Add benchmarks for performance over wide schemas

2016-06-01 Thread Eric Liang (JIRA)
Eric Liang created SPARK-15724: -- Summary: Add benchmarks for performance over wide schemas Key: SPARK-15724 URL: https://issues.apache.org/jira/browse/SPARK-15724 Project: Spark Issue Type: Test

[jira] [Updated] (SPARK-15724) Add benchmarks for performance over wide schemas

2016-06-01 Thread Eric Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Liang updated SPARK-15724: --- Affects Version/s: 2.0.0 > Add benchmarks for performance over wide schemas > ---

[jira] [Updated] (SPARK-15724) Add benchmarks for performance over wide schemas

2016-06-01 Thread Eric Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Liang updated SPARK-15724: --- Description: There are some reported degradations in 2.0 when querying over very wide/nested schemas;

[jira] [Updated] (SPARK-15724) Add benchmarks for performance over wide schemas

2016-06-01 Thread Eric Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Liang updated SPARK-15724: --- Description: There are some reported degradations in 2.0 when querying over very wide / deeply nested

[jira] [Commented] (SPARK-15393) Writing empty Dataframes doesn't save any _metadata files

2016-06-01 Thread Jie Huang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311488#comment-15311488 ] Jie Huang commented on SPARK-15393: --- Yes. you are right. it really depends on the use c

[jira] [Updated] (SPARK-15724) Add benchmarks for performance over wide schemas

2016-06-01 Thread Eric Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Liang updated SPARK-15724: --- Component/s: SQL > Add benchmarks for performance over wide schemas > ---

[jira] [Created] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name

2016-06-01 Thread Brett Randall (JIRA)
Brett Randall created SPARK-15723: - Summary: SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name Key: SPARK-15723 URL: https://issues.apache.org/jira/browse/SPARK-15723

[jira] [Created] (SPARK-15722) Wrong data when CTAS specifies schema

2016-06-01 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15722: - Summary: Wrong data when CTAS specifies schema Key: SPARK-15722 URL: https://issues.apache.org/jira/browse/SPARK-15722 Project: Spark Issue Type: Bug Com

[jira] [Updated] (SPARK-15692) Improves the explain output of several physical plans by displaying embedded logical plan in tree style

2016-06-01 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-15692: Assignee: Sean Zhong > Improves the explain output of several physical plans by displaying embedded

[jira] [Resolved] (SPARK-15692) Improves the explain output of several physical plans by displaying embedded logical plan in tree style

2016-06-01 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-15692. - Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13433 [https://githu

[jira] [Commented] (SPARK-14959) Problem Reading partitioned ORC or Parquet files

2016-06-01 Thread Xin Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311409#comment-15311409 ] Xin Wu commented on SPARK-14959: I can recreate the problem with hdfs location. and I hav

[jira] [Commented] (SPARK-11153) Turns off Parquet filter push-down for string and binary columns

2016-06-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311363#comment-15311363 ] Cheng Lian commented on SPARK-11153: Yea, right. Can we do it later on master to mini

[jira] [Resolved] (SPARK-15441) dataset outer join seems to return incorrect result

2016-06-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15441. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13425 [https://github.

[jira] [Created] (SPARK-15721) Make DefaultParamsReadable,Writable public APIs

2016-06-01 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-15721: - Summary: Make DefaultParamsReadable,Writable public APIs Key: SPARK-15721 URL: https://issues.apache.org/jira/browse/SPARK-15721 Project: Spark Iss

[jira] [Reopened] (SPARK-9876) Upgrade parquet-mr to 1.8.1

2016-06-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reopened SPARK-9876: --- Re-opened this since we just reverted 1.8.1 upgrade for branch-2.0. https://github.com/apache/spark/pull/

[jira] [Resolved] (SPARK-15269) Creating external table leaves empty directory under warehouse directory

2016-06-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15269. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13270 [https://github.

[jira] [Updated] (SPARK-15720) MLLIB Word2Vec loading large number of vectors in the model results in java.lang.NegativeArraySizeException

2016-06-01 Thread Rohan G Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohan G Patil updated SPARK-15720: -- Description: While loading a large number of pre-trained vectors into Spark MLLIB's Word2Vec m

[jira] [Created] (SPARK-15720) MLLIB Word2Vec loading large number of vectors in the model results in java.lang.NegativeArraySizeException

2016-06-01 Thread Rohan G Patil (JIRA)
Rohan G Patil created SPARK-15720: - Summary: MLLIB Word2Vec loading large number of vectors in the model results in java.lang.NegativeArraySizeException Key: SPARK-15720 URL: https://issues.apache.org/jira/browse/

[jira] [Updated] (SPARK-15719) Disable writing Parquet summary files by default

2016-06-01 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-15719: - Labels: release_notes releasenotes (was: ) > Disable writing Parquet summary files by default >

[jira] [Updated] (SPARK-15712) Proper temp table support

2016-06-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15712: --- Description: For proper temp table support, I am proposing to create a temp dir for every {{SparkSes

[jira] [Assigned] (SPARK-15719) Disable writing Parquet summary files by default

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15719: Assignee: Apache Spark (was: Cheng Lian) > Disable writing Parquet summary files by defau

[jira] [Assigned] (SPARK-15719) Disable writing Parquet summary files by default

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15719: Assignee: Cheng Lian (was: Apache Spark) > Disable writing Parquet summary files by defau

[jira] [Commented] (SPARK-15719) Disable writing Parquet summary files by default

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311272#comment-15311272 ] Apache Spark commented on SPARK-15719: -- User 'liancheng' has created a pull request

[jira] [Commented] (SPARK-15713) Exception using Kafka Streaming: java.lang.NoSuchMethodError: kafka.message.MessageAndMetadata.(Ljava/lang/String;ILkafka/message/Message;JLkafka/serializer/Dec

2016-06-01 Thread Vaibhav Khanduja (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311261#comment-15311261 ] Vaibhav Khanduja commented on SPARK-15713: -- Thanks the problem was with the vers

[jira] [Updated] (SPARK-15140) encoder should make sure input object is not null

2016-06-01 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-15140: Summary: encoder should make sure input object is not null (was: encoder should support null input

[jira] [Updated] (SPARK-15140) encoder should make sure input object is not null

2016-06-01 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-15140: Issue Type: Sub-task (was: Improvement) Parent: SPARK-15631 > encoder should make sure inp

[jira] [Updated] (SPARK-15717) Cannot collect a checkpointed VertexRDD.

2016-06-01 Thread Anderson de Andrade (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anderson de Andrade updated SPARK-15717: Description: A checkpointed (materialized) VertexRDD throws the following exception

[jira] [Commented] (SPARK-11153) Turns off Parquet filter push-down for string and binary columns

2016-06-01 Thread Mark Hamstra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311252#comment-15311252 ] Mark Hamstra commented on SPARK-11153: -- If I am not mistaken, Parquet 1.8.1 and filt

[jira] [Commented] (SPARK-14146) Imported implicits can't be found in Spark REPL in some cases

2016-06-01 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311242#comment-15311242 ] Shixiong Zhu commented on SPARK-14146: -- [~cloud_fan] https://issues.scala-lang.org/

[jira] [Created] (SPARK-15719) Disable writing Parquet summary files by default

2016-06-01 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-15719: -- Summary: Disable writing Parquet summary files by default Key: SPARK-15719 URL: https://issues.apache.org/jira/browse/SPARK-15719 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-11153) Turns off Parquet filter push-down for string and binary columns

2016-06-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311231#comment-15311231 ] Cheng Lian commented on SPARK-11153: Unfortunately we just decided to revert Parquet

[jira] [Commented] (SPARK-14146) Imported implicits can't be found in Spark REPL in some cases

2016-06-01 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311223#comment-15311223 ] Shixiong Zhu commented on SPARK-14146: -- Another failure case: {code} scala> val x =

[jira] [Commented] (SPARK-15617) Clarify that fMeasure in MulticlassMetrics and MulticlassClassificationEvaluator is "micro" f1_score

2016-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311218#comment-15311218 ] Sean Owen commented on SPARK-15617: --- Personally I support that. I think that the only m

[jira] [Assigned] (SPARK-15714) Fix Flaky Test: o.a.s.scheduler.BlacklistIntegrationSuite

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15714: Assignee: Apache Spark (was: Imran Rashid) > Fix Flaky Test: o.a.s.scheduler.BlacklistInt

[jira] [Resolved] (SPARK-15713) Exception using Kafka Streaming: java.lang.NoSuchMethodError: kafka.message.MessageAndMetadata.(Ljava/lang/String;ILkafka/message/Message;JLkafka/serializer/Deco

2016-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-15713. --- Resolution: Not A Problem This means you have mismatched versions of Kafka in your classpath. That is

[jira] [Commented] (SPARK-15714) Fix Flaky Test: o.a.s.scheduler.BlacklistIntegrationSuite

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311216#comment-15311216 ] Apache Spark commented on SPARK-15714: -- User 'squito' has created a pull request for

[jira] [Assigned] (SPARK-15714) Fix Flaky Test: o.a.s.scheduler.BlacklistIntegrationSuite

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15714: Assignee: Imran Rashid (was: Apache Spark) > Fix Flaky Test: o.a.s.scheduler.BlacklistInt

[jira] [Commented] (SPARK-15716) Memory usage keep growing up in Spark Streaming

2016-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311210#comment-15311210 ] Sean Owen commented on SPARK-15716: --- Unless you can show what's taking up memory, I don

[jira] [Assigned] (SPARK-15715) Altering partition storage information doesn't work in Hive

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15715: Assignee: Apache Spark (was: Andrew Or) > Altering partition storage information doesn't

[jira] [Commented] (SPARK-15715) Altering partition storage information doesn't work in Hive

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311201#comment-15311201 ] Apache Spark commented on SPARK-15715: -- User 'andrewor14' has created a pull request

[jira] [Assigned] (SPARK-15715) Altering partition storage information doesn't work in Hive

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15715: Assignee: Andrew Or (was: Apache Spark) > Altering partition storage information doesn't

[jira] [Updated] (SPARK-15716) Memory usage keep growing up in Spark Streaming

2016-06-01 Thread Yan Chen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Chen updated SPARK-15716: - Description: Code: {code:java} import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text

[jira] [Updated] (SPARK-15716) Memory usage keep growing up in Spark Streaming

2016-06-01 Thread Yan Chen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Chen updated SPARK-15716: - Priority: Major (was: Critical) > Memory usage keep growing up in Spark Streaming >

[jira] [Updated] (SPARK-15717) Cannot collect a checkpointed VertexRDD.

2016-06-01 Thread Anderson de Andrade (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anderson de Andrade updated SPARK-15717: Description: A checkpointed (materialized) VertexRDD throws the following exception

[jira] [Assigned] (SPARK-15718) better error message for writing bucketing data

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15718: Assignee: Apache Spark (was: Wenchen Fan) > better error message for writing bucketing da

[jira] [Assigned] (SPARK-15718) better error message for writing bucketing data

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15718: Assignee: Wenchen Fan (was: Apache Spark) > better error message for writing bucketing da

[jira] [Commented] (SPARK-15718) better error message for writing bucketing data

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311192#comment-15311192 ] Apache Spark commented on SPARK-15718: -- User 'cloud-fan' has created a pull request

[jira] [Updated] (SPARK-15716) Memory usage keep growing up in Spark Streaming

2016-06-01 Thread Yan Chen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Chen updated SPARK-15716: - Description: Code: {code:java} import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text

[jira] [Created] (SPARK-15718) better error message for writing bucketing data

2016-06-01 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-15718: --- Summary: better error message for writing bucketing data Key: SPARK-15718 URL: https://issues.apache.org/jira/browse/SPARK-15718 Project: Spark Issue Type: Imp

[jira] [Updated] (SPARK-15717) Cannot collect a checkpointed VertexRDD.

2016-06-01 Thread Anderson de Andrade (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anderson de Andrade updated SPARK-15717: Description: A checkpointed (materialized) VertexRDD throws the following exception

[jira] [Updated] (SPARK-15717) Cannot collect a checkpointed VertexRDD.

2016-06-01 Thread Anderson de Andrade (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anderson de Andrade updated SPARK-15717: Description: A checkpointed (materialized) VertexRDD throws the following exception

[jira] [Updated] (SPARK-15716) Memory usage keep growing up in Spark Streaming

2016-06-01 Thread Yan Chen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Chen updated SPARK-15716: - Description: Code: {code:java} import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text

[jira] [Updated] (SPARK-15717) Cannot collect a checkpointed VertexRDD.

2016-06-01 Thread Anderson de Andrade (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anderson de Andrade updated SPARK-15717: Description: A checkpointed (materialized) VertexRDD throws the following exception
