[jira] [Updated] (SPARK-15693) Write schema definition out for file-based data sources to avoid schema inference

2016-06-01 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-15693: Summary: Write schema definition out for file-based data sources to avoid schema inference (was: W

[jira] [Commented] (SPARK-15691) Refactor and improve Hive support

2016-06-01 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309438#comment-15309438 ] Yin Huai commented on SPARK-15691: -- I'd add removing HiveMetastoreCatalog as part of the

[jira] [Closed] (SPARK-14998) SparkSQL throw java.lang.ArrayIndexOutOfBoundsException when use Transformation

2016-06-01 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-14998. --- Resolution: Cannot Reproduce Closing this as cannot reproduce. [~liyuance] if you can shed more ligh

[jira] [Updated] (SPARK-15691) Refactor and improve Hive support

2016-06-01 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-15691: Description: Hive support is important to Spark SQL, as many Spark users use it to read from Hive.

[jira] [Updated] (SPARK-15691) Refactor and improve Hive support

2016-06-01 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-15691: Description: Hive support is important to Spark SQL, as many Spark users use it to read from Hive.

[jira] [Commented] (SPARK-15691) Refactor and improve Hive support

2016-06-01 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309487#comment-15309487 ] Reynold Xin commented on SPARK-15691: - Updated. > Refactor and improve Hive support

[jira] [Created] (SPARK-15694) Implement ScriptTransformation in sql/core

2016-06-01 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-15694: --- Summary: Implement ScriptTransformation in sql/core Key: SPARK-15694 URL: https://issues.apache.org/jira/browse/SPARK-15694 Project: Spark Issue Type: Sub-task

[jira] [Commented] (SPARK-15694) Implement ScriptTransformation in sql/core

2016-06-01 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309844#comment-15309844 ] Reynold Xin commented on SPARK-15694: - cc [~tejasp] > Implement ScriptTransformation

[jira] [Updated] (SPARK-15453) Improve join planning for bucketed / sorted tables

2016-06-01 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-15453: Target Version/s: 2.1.0 > Improve join planning for bucketed / sorted tables >

[jira] [Created] (SPARK-15695) Add option of codegen for explain of Dataset

2016-06-01 Thread Jeff Zhang (JIRA)
Jeff Zhang created SPARK-15695: -- Summary: Add option of codegen for explain of Dataset Key: SPARK-15695 URL: https://issues.apache.org/jira/browse/SPARK-15695 Project: Spark Issue Type: Improvem

[jira] [Commented] (SPARK-15604) Spark-SQL: Get com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException when runing query_1.sql of TPC-DS

2016-06-01 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309857#comment-15309857 ] Reynold Xin commented on SPARK-15604: - I think [~sameerag] has found the problem, and

[jira] [Assigned] (SPARK-15695) Add option of codegen for explain of Dataset

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15695: Assignee: (was: Apache Spark) > Add option of codegen for explain of Dataset > ---

[jira] [Commented] (SPARK-15695) Add option of codegen for explain of Dataset

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309865#comment-15309865 ] Apache Spark commented on SPARK-15695: -- User 'zjffdu' has created a pull request for

[jira] [Assigned] (SPARK-15695) Add option of codegen for explain of Dataset

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15695: Assignee: Apache Spark > Add option of codegen for explain of Dataset > --

[jira] [Closed] (SPARK-15695) Add option of codegen for explain of Dataset

2016-06-01 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-15695. --- Resolution: Won't Fix Closing as won't fix. You can already get it via queryExecution. > Add option

[jira] [Commented] (SPARK-15695) Add option of codegen for explain of Dataset

2016-06-01 Thread Jeff Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309892#comment-15309892 ] Jeff Zhang commented on SPARK-15695: Thanks, found the api. > Add option of codegen

[jira] [Created] (SPARK-15696) Improve `crosstab` to have a consistent column order

2016-06-01 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-15696: - Summary: Improve `crosstab` to have a consistent column order Key: SPARK-15696 URL: https://issues.apache.org/jira/browse/SPARK-15696 Project: Spark Issue

[jira] [Assigned] (SPARK-15696) Improve `crosstab` to have a consistent column order

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15696: Assignee: (was: Apache Spark) > Improve `crosstab` to have a consistent column order

[jira] [Commented] (SPARK-15696) Improve `crosstab` to have a consistent column order

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309921#comment-15309921 ] Apache Spark commented on SPARK-15696: -- User 'dongjoon-hyun' has created a pull requ

[jira] [Assigned] (SPARK-15696) Improve `crosstab` to have a consistent column order

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15696: Assignee: Apache Spark > Improve `crosstab` to have a consistent column order > -

[jira] [Created] (SPARK-15697) [SPARK REPL] unblock some of the useful repl commands.

2016-06-01 Thread Prashant Sharma (JIRA)
Prashant Sharma created SPARK-15697: --- Summary: [SPARK REPL] unblock some of the useful repl commands. Key: SPARK-15697 URL: https://issues.apache.org/jira/browse/SPARK-15697 Project: Spark

[jira] [Updated] (SPARK-15697) [SPARK REPL] unblock some of the useful repl commands.

2016-06-01 Thread Prashant Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Sharma updated SPARK-15697: Description: "implicits", "javap", "power", "type", "kind", "reset" commands in repl are b

[jira] [Commented] (SPARK-15697) [SPARK REPL] unblock some of the useful repl commands.

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309947#comment-15309947 ] Apache Spark commented on SPARK-15697: -- User 'ScrapCodes' has created a pull request

[jira] [Assigned] (SPARK-15697) [SPARK REPL] unblock some of the useful repl commands.

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15697: Assignee: (was: Apache Spark) > [SPARK REPL] unblock some of the useful repl commands.

[jira] [Commented] (SPARK-15573) Backwards-compatible persistence for spark.ml

2016-06-01 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309946#comment-15309946 ] yuhao yang commented on SPARK-15573: IMO, this looks more like a release task rather

[jira] [Assigned] (SPARK-15697) [SPARK REPL] unblock some of the useful repl commands.

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15697: Assignee: Apache Spark > [SPARK REPL] unblock some of the useful repl commands. >

[jira] [Commented] (SPARK-15393) Writing empty Dataframes doesn't save any _metadata files

2016-06-01 Thread Jie Huang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309952#comment-15309952 ] Jie Huang commented on SPARK-15393: --- Is it possible not to write anything (including _

[jira] [Commented] (SPARK-15573) Backwards-compatible persistence for spark.ml

2016-06-01 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309962#comment-15309962 ] yuhao yang commented on SPARK-15573: Also, we perhaps can have a place to store the g

[jira] [Commented] (SPARK-15393) Writing empty Dataframes doesn't save any _metadata files

2016-06-01 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309966#comment-15309966 ] Hyukjin Kwon commented on SPARK-15393: -- But if it does not write anything, it will l

[jira] [Commented] (SPARK-15393) Writing empty Dataframes doesn't save any _metadata files

2016-06-01 Thread Jie Huang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309986#comment-15309986 ] Jie Huang commented on SPARK-15393: --- If there is no folder, we should not scan that fo

[jira] [Comment Edited] (SPARK-15393) Writing empty Dataframes doesn't save any _metadata files

2016-06-01 Thread Jie Huang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309986#comment-15309986 ] Jie Huang edited comment on SPARK-15393 at 6/1/16 9:20 AM: --- if

[jira] [Commented] (SPARK-15684) Not mask startsWith and endsWith in R

2016-06-01 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309989#comment-15309989 ] Yanbo Liang commented on SPARK-15684: - [~wm624] Yes, this one is similar to read.cs

[jira] [Comment Edited] (SPARK-15684) Not mask startsWith and endsWith in R

2016-06-01 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309989#comment-15309989 ] Yanbo Liang edited comment on SPARK-15684 at 6/1/16 9:23 AM: -

[jira] [Created] (SPARK-15698) Ability to remove old metadata for structure streaming MetadataLog

2016-06-01 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-15698: --- Summary: Ability to remove old metadata for structure streaming MetadataLog Key: SPARK-15698 URL: https://issues.apache.org/jira/browse/SPARK-15698 Project: Spark

[jira] [Commented] (SPARK-15393) Writing empty Dataframes doesn't save any _metadata files

2016-06-01 Thread Jie Huang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310001#comment-15310001 ] Jie Huang commented on SPARK-15393: --- E.g., in hive, if we add a location without any pa

[jira] [Commented] (SPARK-15393) Writing empty Dataframes doesn't save any _metadata files

2016-06-01 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310023#comment-15310023 ] Hyukjin Kwon commented on SPARK-15393: -- Yes but how can we read the schema back if t

[jira] [Commented] (SPARK-15393) Writing empty Dataframes doesn't save any _metadata files

2016-06-01 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310033#comment-15310033 ] Hyukjin Kwon commented on SPARK-15393: -- Hive would be okay because the schemas are s

[jira] [Commented] (SPARK-15582) Support for Groovy closures

2016-06-01 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310096#comment-15310096 ] Steve Loughran commented on SPARK-15582: you are way ahead of me in your groovy s

[jira] [Commented] (SPARK-15582) Support for Groovy closures

2016-06-01 Thread Catalin Alexandru Zamfir (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310135#comment-15310135 ] Catalin Alexandru Zamfir commented on SPARK-15582: -- Thanks for the ACCUM

[jira] [Commented] (SPARK-15666) Join on two tables generated from a same table throwing query analyzer issue

2016-06-01 Thread Manish Kumar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310134#comment-15310134 ] Manish Kumar commented on SPARK-15666: -- Can someone please look into this issue? As

[jira] [Commented] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode

2016-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310204#comment-15310204 ] Sean Owen commented on SPARK-15685: --- I don't think we can install a SecurityManager, si

[jira] [Resolved] (SPARK-15604) Spark-SQL: Get com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException when runing query_1.sql of TPC-DS

2016-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-15604. --- Resolution: Duplicate Target Version/s: (was: 2.0.0) I think this is a duplicate of one o

[jira] [Resolved] (SPARK-15664) Replace FileSystem.get(conf) with path.getFileSystem(conf) when removing CheckpointFile in MLlib

2016-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-15664. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13408 [https://github.co

[jira] [Resolved] (SPARK-15579) SparkUI: Storage page is empty even if things are cached

2016-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-15579. --- Resolution: Not A Problem Target Version/s: (was: 2.0.0) [~andrewor14] obviously reopen t

[jira] [Resolved] (SPARK-15671) performance regression CoalesceRDD large # partitions

2016-06-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-15671. --- Resolution: Duplicate dup of SPARK-15659 > performance regression CoalesceRDD large # partit

[jira] [Updated] (SPARK-15659) Ensure FileSystem is gotten from path in InMemoryCatalog

2016-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-15659: -- Assignee: Saisai Shao > Ensure FileSystem is gotten from path in InMemoryCatalog >

[jira] [Updated] (SPARK-15664) Replace FileSystem.get(conf) with path.getFileSystem(conf) when removing CheckpointFile in MLlib

2016-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-15664: -- Assignee: Lianhui Wang > Replace FileSystem.get(conf) with path.getFileSystem(conf) when removing > Ch

[jira] [Reopened] (SPARK-15671) performance regression CoalesceRDD large # partitions

2016-06-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reopened SPARK-15671: --- > performance regression CoalesceRDD large # partitions > ---

[jira] [Resolved] (SPARK-15659) Ensure FileSystem is gotten from path in InMemoryCatalog

2016-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-15659. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13405 [https://github.co

[jira] [Resolved] (SPARK-15683) spark sql local FS spark.sql.warehouse.dir throws on YARN

2016-06-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-15683. --- Resolution: Duplicate > spark sql local FS spark.sql.warehouse.dir throws on YARN > -

[jira] [Resolved] (SPARK-15600) Make local mode as default mode

2016-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-15600. --- Resolution: Won't Fix > Make local mode as default mode > --- > >

[jira] [Resolved] (SPARK-14343) Dataframe operations on a partitioned dataset (using partition discovery) return invalid results

2016-06-01 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-14343. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13431 [https://github.

[jira] [Created] (SPARK-15699) Add chi-squared test statistic as a split quality metric for decision trees

2016-06-01 Thread Erik Erlandson (JIRA)
Erik Erlandson created SPARK-15699: -- Summary: Add chi-squared test statistic as a split quality metric for decision trees Key: SPARK-15699 URL: https://issues.apache.org/jira/browse/SPARK-15699 Proje

[jira] [Created] (SPARK-15700) Spark 2.0 dataframes using more driver memory (reading/writing parquet)

2016-06-01 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-15700: - Summary: Spark 2.0 dataframes using more driver memory (reading/writing parquet) Key: SPARK-15700 URL: https://issues.apache.org/jira/browse/SPARK-15700 Project: Sp

[jira] [Commented] (SPARK-15699) Add chi-squared test statistic as a split quality metric for decision trees

2016-06-01 Thread Erik Erlandson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310416#comment-15310416 ] Erik Erlandson commented on SPARK-15699: Proposed PR: https://github.com/apache/s

[jira] [Created] (SPARK-15701) Constant ColumnVector only needs to prepare one capacity

2016-06-01 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-15701: --- Summary: Constant ColumnVector only needs to prepare one capacity Key: SPARK-15701 URL: https://issues.apache.org/jira/browse/SPARK-15701 Project: Spark

[jira] [Assigned] (SPARK-15701) Constant ColumnVector only needs to prepare one capacity

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15701: Assignee: Apache Spark > Constant ColumnVector only needs to prepare one capacity > --

[jira] [Assigned] (SPARK-15701) Constant ColumnVector only needs to prepare one capacity

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15701: Assignee: (was: Apache Spark) > Constant ColumnVector only needs to prepare one capaci

[jira] [Commented] (SPARK-15701) Constant ColumnVector only needs to prepare one capacity

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310428#comment-15310428 ] Apache Spark commented on SPARK-15701: -- User 'viirya' has created a pull request for

[jira] [Commented] (SPARK-15671) performance regression CoalesceRDD large # partitions

2016-06-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310487#comment-15310487 ] Thomas Graves commented on SPARK-15671: --- Note the performance impact is in the 10's

[jira] [Created] (SPARK-15702) Update document programming-guide accumulator section

2016-06-01 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-15702: -- Summary: Update document programming-guide accumulator section Key: SPARK-15702 URL: https://issues.apache.org/jira/browse/SPARK-15702 Project: Spark Issue Type:

[jira] [Comment Edited] (SPARK-15699) Add chi-squared test statistic as a split quality metric for decision trees

2016-06-01 Thread Erik Erlandson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310416#comment-15310416 ] Erik Erlandson edited comment on SPARK-15699 at 6/1/16 3:38 PM: ---

[jira] [Commented] (SPARK-15699) Add chi-squared test statistic as a split quality metric for decision trees

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310499#comment-15310499 ] Apache Spark commented on SPARK-15699: -- User 'erikerlandson' has created a pull requ

[jira] [Assigned] (SPARK-15699) Add chi-squared test statistic as a split quality metric for decision trees

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15699: Assignee: (was: Apache Spark) > Add chi-squared test statistic as a split quality metr

[jira] [Assigned] (SPARK-15699) Add chi-squared test statistic as a split quality metric for decision trees

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15699: Assignee: Apache Spark > Add chi-squared test statistic as a split quality metric for deci

[jira] [Commented] (SPARK-15702) Update document programming-guide accumulator section

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310509#comment-15310509 ] Apache Spark commented on SPARK-15702: -- User 'WeichenXu123' has created a pull reque

[jira] [Commented] (SPARK-15654) Reading gzipped files results in duplicate rows

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310508#comment-15310508 ] Apache Spark commented on SPARK-15654: -- User 'maropu' has created a pull request for

[jira] [Assigned] (SPARK-15702) Update document programming-guide accumulator section

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15702: Assignee: Apache Spark > Update document programming-guide accumulator section > -

[jira] [Assigned] (SPARK-15702) Update document programming-guide accumulator section

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15702: Assignee: (was: Apache Spark) > Update document programming-guide accumulator section

[jira] [Assigned] (SPARK-15654) Reading gzipped files results in duplicate rows

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15654: Assignee: (was: Apache Spark) > Reading gzipped files results in duplicate rows >

[jira] [Assigned] (SPARK-15654) Reading gzipped files results in duplicate rows

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15654: Assignee: Apache Spark > Reading gzipped files results in duplicate rows > ---

[jira] [Assigned] (SPARK-15671) performance regression CoalesceRDD large # partitions

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15671: Assignee: Apache Spark > performance regression CoalesceRDD large # partitions > -

[jira] [Assigned] (SPARK-15671) performance regression CoalesceRDD large # partitions

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15671: Assignee: (was: Apache Spark) > performance regression CoalesceRDD large # partitions

[jira] [Closed] (SPARK-15212) CSV file reader when read file with first line schema do not filter blank in schema column name

2016-06-01 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu closed SPARK-15212. -- Resolution: Won't Fix > CSV file reader when read file with first line schema do not filter blank in >

[jira] [Commented] (SPARK-15671) performance regression CoalesceRDD large # partitions

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310524#comment-15310524 ] Apache Spark commented on SPARK-15671: -- User 'tgravescs' has created a pull request

[jira] [Assigned] (SPARK-15530) Partitioning discovery logic HadoopFsRelation should use a higher setting of parallelism

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15530: Assignee: (was: Apache Spark) > Partitioning discovery logic HadoopFsRelation should u

[jira] [Commented] (SPARK-15530) Partitioning discovery logic HadoopFsRelation should use a higher setting of parallelism

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310541#comment-15310541 ] Apache Spark commented on SPARK-15530: -- User 'maropu' has created a pull request for

[jira] [Assigned] (SPARK-15530) Partitioning discovery logic HadoopFsRelation should use a higher setting of parallelism

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15530: Assignee: Apache Spark > Partitioning discovery logic HadoopFsRelation should use a higher

[jira] [Commented] (SPARK-15699) Add chi-squared test statistic as a split quality metric for decision trees

2016-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310554#comment-15310554 ] Sean Owen commented on SPARK-15699: --- (if you title your PR with the JIRA number it will

[jira] [Updated] (SPARK-15699) Add chi-squared test statistic as a split quality metric for decision trees

2016-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-15699: -- Target Version/s: (was: 2.0.0) > Add chi-squared test statistic as a split quality metric for decisio

[jira] [Commented] (SPARK-15702) Update document programming-guide accumulator section

2016-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310557#comment-15310557 ] Sean Owen commented on SPARK-15702: --- [~WeichenXu123] we have a big problem with people

[jira] [Resolved] (SPARK-15609) Can't access spark web UI with given randon port

2016-06-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-15609. --- Resolution: Not A Problem > Can't access spark web UI with given randon port > -

[jira] [Commented] (SPARK-15684) Not mask startsWith and endsWith in R

2016-06-01 Thread Shivaram Venkataraman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310566#comment-15310566 ] Shivaram Venkataraman commented on SPARK-15684: --- We just need to match the

[jira] [Commented] (SPARK-15663) SparkSession.catalog.listFunctions shouldn't include the list of built-in functions

2016-06-01 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310570#comment-15310570 ] Takeshi Yamamuro commented on SPARK-15663: -- Is it okay to take this? Anybody stil

[jira] [Commented] (SPARK-9876) Upgrade parquet-mr to 1.8.1

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310573#comment-15310573 ] Apache Spark commented on SPARK-9876: - User 'yhuai' has created a pull request for thi

[jira] [Commented] (SPARK-15663) SparkSession.catalog.listFunctions shouldn't include the list of built-in functions

2016-06-01 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310577#comment-15310577 ] Reynold Xin commented on SPARK-15663: - Please go for it. Thanks. > SparkSession.c

[jira] [Created] (SPARK-15703) Spark UI doesn't show all tasks as completed when they are

2016-06-01 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-15703: - Summary: Spark UI doesn't show all tasks as completed when they are Key: SPARK-15703 URL: https://issues.apache.org/jira/browse/SPARK-15703 Project: Spark

[jira] [Updated] (SPARK-15703) Spark UI doesn't show all tasks as completed when they are

2016-06-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-15703: -- Attachment: Screen Shot 2016-06-01 at 11.23.48 AM.png Screen Shot 2016-06-01 at

[jira] [Updated] (SPARK-15495) Improve the output of explain for aggregate operator

2016-06-01 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-15495: Assignee: Sean Zhong > Improve the output of explain for aggregate operator > -

[jira] [Issue Comment Deleted] (SPARK-15699) Add chi-squared test statistic as a split quality metric for decision trees

2016-06-01 Thread Erik Erlandson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Erlandson updated SPARK-15699: --- Comment: was deleted (was: Proposed PR: https://github.com/apache/spark/pull/13440) > Add ch

[jira] [Closed] (SPARK-15275) CatalogTable should store sort ordering for sorted columns

2016-06-01 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-15275. --- Resolution: Later Closing as later. See discussion on pull request for more details. > CatalogTable

[jira] [Created] (SPARK-15704) TungstenAggregate crashes

2016-06-01 Thread Hiroshi Inoue (JIRA)
Hiroshi Inoue created SPARK-15704: - Summary: TungstenAggregate crashes Key: SPARK-15704 URL: https://issues.apache.org/jira/browse/SPARK-15704 Project: Spark Issue Type: Bug Compon

[jira] [Updated] (SPARK-15495) Improve the output of explain for aggregate operator

2016-06-01 Thread Sean Zhong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Zhong updated SPARK-15495: --- Description: We should improves the explain output of Aggregator operator to make it more readable.

[jira] [Resolved] (SPARK-15495) Improve the output of explain for aggregate operator

2016-06-01 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-15495. - Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13363 [https://githu

[jira] [Assigned] (SPARK-15704) TungstenAggregate crashes

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15704: Assignee: Apache Spark > TungstenAggregate crashes > -- > >

[jira] [Commented] (SPARK-15704) TungstenAggregate crashes

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310650#comment-15310650 ] Apache Spark commented on SPARK-15704: -- User 'inouehrs' has created a pull request f

[jira] [Assigned] (SPARK-15704) TungstenAggregate crashes

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15704: Assignee: (was: Apache Spark) > TungstenAggregate crashes > -

[jira] [Commented] (SPARK-15619) spark builds filling up /tmp

2016-06-01 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310654#comment-15310654 ] Josh Rosen commented on SPARK-15619: What if we overrode {{java.io.tmpdir}} to point

[jira] [Updated] (SPARK-15700) Spark 2.0 dataframes using more memory (reading/writing parquet)

2016-06-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-15700: -- Summary: Spark 2.0 dataframes using more memory (reading/writing parquet) (was: Spark 2.0 data

[jira] [Commented] (SPARK-15700) Spark 2.0 dataframes using more driver memory (reading/writing parquet)

2016-06-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310657#comment-15310657 ] Thomas Graves commented on SPARK-15700: --- It looks like executors are also requiring
