[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-08-12 Thread huangyu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419798#comment-15419798 ] huangyu commented on SPARK-15044: - Hi, I know it isn't Spark's fault. However I think maybe it's better

[jira] [Closed] (SPARK-16716) calling cache on joined dataframe can lead to data being blanked

2016-08-12 Thread PJ Fanning (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning closed SPARK-16716. -- Resolution: Duplicate This looks like it was fixed by SPARK-16664 > calling cache on joined dataframe

[jira] [Commented] (SPARK-17039) cannot read null dates from csv file

2016-08-12 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419768#comment-15419768 ] Barry Becker commented on SPARK-17039: -- I read the comments in SPARK-16462. It looks like it would

[jira] [Commented] (SPARK-16519) Handle SparkR RDD generics that create warnings in R CMD check

2016-08-12 Thread Shivaram Venkataraman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419762#comment-15419762 ] Shivaram Venkataraman commented on SPARK-16519: --- FYI [~aloknsingh] [~clarkfitzg] some of

[jira] [Commented] (SPARK-16975) Spark-2.0.0 unable to infer schema for parquet data written by Spark-1.6.2

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419755#comment-15419755 ] Apache Spark commented on SPARK-16975: -- User 'HyukjinKwon' has created a pull request for this

[jira] [Comment Edited] (SPARK-17039) cannot read null dates from csv file

2016-08-12 Thread Liwei Lin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419730#comment-15419730 ] Liwei Lin edited comment on SPARK-17039 at 8/13/16 12:36 AM: - Thanks

[jira] [Commented] (SPARK-17039) cannot read null dates from csv file

2016-08-12 Thread Liwei Lin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419730#comment-15419730 ] Liwei Lin commented on SPARK-17039: --- Thanks [~barrybecker4] for reporting this. Please also see

[jira] [Assigned] (SPARK-16519) Handle SparkR RDD generics that create warnings in R CMD check

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16519: Assignee: (was: Apache Spark) > Handle SparkR RDD generics that create warnings in R

[jira] [Commented] (SPARK-16519) Handle SparkR RDD generics that create warnings in R CMD check

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419716#comment-15419716 ] Apache Spark commented on SPARK-16519: -- User 'felixcheung' has created a pull request for this

[jira] [Assigned] (SPARK-16519) Handle SparkR RDD generics that create warnings in R CMD check

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16519: Assignee: Apache Spark > Handle SparkR RDD generics that create warnings in R CMD check >

[jira] [Commented] (SPARK-6305) Add support for log4j 2.x to Spark

2016-08-12 Thread Matt Sicker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419705#comment-15419705 ] Matt Sicker commented on SPARK-6305: Perhaps [composite

[jira] [Commented] (SPARK-17001) Enable standardScaler to standardize sparse vectors when withMean=True

2016-08-12 Thread Tobi Bosede (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419592#comment-15419592 ] Tobi Bosede commented on SPARK-17001: - This can be implemented in a similar fashion to scikit-learn's

[jira] [Commented] (SPARK-17038) StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch

2016-08-12 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419571#comment-15419571 ] Shixiong Zhu commented on SPARK-17038: -- Good catch. Could you submit a PR to fix it, please? >

[jira] [Commented] (SPARK-16922) Query with Broadcast Hash join fails due to executor OOM in Spark 2.0

2016-08-12 Thread Sital Kedia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419458#comment-15419458 ] Sital Kedia commented on SPARK-16922: - I am using the fix in

[jira] [Commented] (SPARK-16922) Query with Broadcast Hash join fails due to executor OOM in Spark 2.0

2016-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419438#comment-15419438 ] Davies Liu commented on SPARK-16922: I think it's fixed by

[jira] [Commented] (SPARK-16922) Query with Broadcast Hash join fails due to executor OOM in Spark 2.0

2016-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419434#comment-15419434 ] Davies Liu commented on SPARK-16922: [~sitalke...@gmail.com] There are two integer overflow bugs

[jira] [Commented] (SPARK-16716) calling cache on joined dataframe can lead to data being blanked

2016-08-12 Thread PJ Fanning (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419430#comment-15419430 ] PJ Fanning commented on SPARK-16716: I set up an equivalent notebook for spark 2.0 in Databricks

[jira] [Updated] (SPARK-16922) Query with Broadcast Hash join fails due to executor OOM in Spark 2.0

2016-08-12 Thread Sital Kedia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sital Kedia updated SPARK-16922: Summary: Query with Broadcast Hash join fails due to executor OOM in Spark 2.0 (was: Query

[jira] [Comment Edited] (SPARK-16922) Query failure due to executor OOM in Spark 2.0

2016-08-12 Thread Sital Kedia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419383#comment-15419383 ] Sital Kedia edited comment on SPARK-16922 at 8/12/16 8:06 PM: -- I found that

[jira] [Commented] (SPARK-16922) Query failure due to executor OOM in Spark 2.0

2016-08-12 Thread Sital Kedia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419383#comment-15419383 ] Sital Kedia commented on SPARK-16922: - I found that the regression was introduced in

[jira] [Assigned] (SPARK-17045) Moving Auto_Joins from HiveCompatibilitySuite to SQLQueryTestSuite

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17045: Assignee: Apache Spark > Moving Auto_Joins from HiveCompatibilitySuite to

[jira] [Assigned] (SPARK-17045) Moving Auto_Joins from HiveCompatibilitySuite to SQLQueryTestSuite

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17045: Assignee: (was: Apache Spark) > Moving Auto_Joins from HiveCompatibilitySuite to

[jira] [Commented] (SPARK-17045) Moving Auto_Joins from HiveCompatibilitySuite to SQLQueryTestSuite

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419320#comment-15419320 ] Apache Spark commented on SPARK-17045: -- User 'gatorsmile' has created a pull request for this issue:

[jira] [Created] (SPARK-17045) Moving Auto_Joins from HiveCompatibilitySuite to SQLQueryTestSuite

2016-08-12 Thread Xiao Li (JIRA)
Xiao Li created SPARK-17045: --- Summary: Moving Auto_Joins from HiveCompatibilitySuite to SQLQueryTestSuite Key: SPARK-17045 URL: https://issues.apache.org/jira/browse/SPARK-17045 Project: Spark

[jira] [Commented] (SPARK-17042) Repl-defined classes cannot be replicated

2016-08-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419267#comment-15419267 ] Sean Owen commented on SPARK-17042: --- Scala 2.10 or 2.11? I'm pretty sure this is a duplicate. >

[jira] [Resolved] (SPARK-17043) Cannot call zipWithIndex on RDD with more than 200 columns (get wrong result)

2016-08-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-17043. --- Resolution: Duplicate > Cannot call zipWithIndex on RDD with more than 200 columns (get wrong

[jira] [Assigned] (SPARK-17044) Add window function test in SQLQueryTestSuite

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17044: Assignee: (was: Apache Spark) > Add window function test in SQLQueryTestSuite >

[jira] [Commented] (SPARK-17044) Add window function test in SQLQueryTestSuite

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419238#comment-15419238 ] Apache Spark commented on SPARK-17044: -- User 'dongjoon-hyun' has created a pull request for this

[jira] [Assigned] (SPARK-17044) Add window function test in SQLQueryTestSuite

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17044: Assignee: Apache Spark > Add window function test in SQLQueryTestSuite >

[jira] [Created] (SPARK-17044) Add window function test in SQLQueryTestSuite

2016-08-12 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-17044: - Summary: Add window function test in SQLQueryTestSuite Key: SPARK-17044 URL: https://issues.apache.org/jira/browse/SPARK-17044 Project: Spark Issue Type:

[jira] [Updated] (SPARK-17042) Repl-defined classes cannot be replicated

2016-08-12 Thread Eric Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Liang updated SPARK-17042: --- Description: A simple fix is to erase the classTag when using the default serializer, since it's

[jira] [Created] (SPARK-17043) Cannot call zipWithIndex on RDD with more than 200 columns (get wrong result)

2016-08-12 Thread Barry Becker (JIRA)
Barry Becker created SPARK-17043: Summary: Cannot call zipWithIndex on RDD with more than 200 columns (get wrong result) Key: SPARK-17043 URL: https://issues.apache.org/jira/browse/SPARK-17043

[jira] [Created] (SPARK-17042) Repl-defined classes cannot be replicated

2016-08-12 Thread Eric Liang (JIRA)
Eric Liang created SPARK-17042: -- Summary: Repl-defined classes cannot be replicated Key: SPARK-17042 URL: https://issues.apache.org/jira/browse/SPARK-17042 Project: Spark Issue Type: Sub-task

[jira] [Resolved] (SPARK-17003) release-build.sh is missing hive-thriftserver for scala 2.11

2016-08-12 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-17003. -- Resolution: Fixed Fix Version/s: 1.6.3 Issue resolved by pull request 14586

[jira] [Commented] (SPARK-17041) Columns in schema are no longer case sensitive when reading csv file

2016-08-12 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419158#comment-15419158 ] Barry Becker commented on SPARK-17041: -- I'm not sure either. How can we find out? I think it would

[jira] [Commented] (SPARK-6235) Address various 2G limits

2016-08-12 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419151#comment-15419151 ] Herman van Hovell commented on SPARK-6235: -- [~gq] it might be a good idea to share some design

[jira] [Resolved] (SPARK-16771) Infinite recursion loop in org.apache.spark.sql.catalyst.trees.TreeNode when table name collides.

2016-08-12 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-16771. --- Resolution: Fixed Assignee: Dongjoon Hyun Fix Version/s: 2.1.0 >

[jira] [Commented] (SPARK-17041) Columns in schema are no longer case sensitive when reading csv file

2016-08-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419133#comment-15419133 ] Sean Owen commented on SPARK-17041: --- Behavior changes across major versions. I'm not sure this is a bug

[jira] [Created] (SPARK-17041) Columns in schema are no longer case sensitive when reading csv file

2016-08-12 Thread Barry Becker (JIRA)
Barry Becker created SPARK-17041: Summary: Columns in schema are no longer case sensitive when reading csv file Key: SPARK-17041 URL: https://issues.apache.org/jira/browse/SPARK-17041 Project: Spark

[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-08-12 Thread Artur (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419079#comment-15419079 ] Artur commented on SPARK-15044: --- I don't know what we should do with this issue. The root cause is invalid

[jira] [Commented] (SPARK-17039) cannot read null dates from csv file

2016-08-12 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419073#comment-15419073 ] Barry Becker commented on SPARK-17039: -- There are literal ?'s in the datafile. The "nullValue"

[jira] [Updated] (SPARK-17039) cannot read null dates from csv file

2016-08-12 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-17039: - Description: I see this exact same bug as reported in this [stack overflow

[jira] [Commented] (SPARK-17039) cannot read null dates from csv file

2016-08-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419062#comment-15419062 ] Sean Owen commented on SPARK-17039: --- Oh right looked right past that. But if a date is null, converted

[jira] [Commented] (SPARK-17039) cannot read null dates from csv file

2016-08-12 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419040#comment-15419040 ] Barry Becker commented on SPARK-17039: -- I do specify a schema (.schema(dfSchema)), and it says that

[jira] [Commented] (SPARK-17039) cannot read null dates from csv file

2016-08-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419017#comment-15419017 ] Sean Owen commented on SPARK-17039: --- Hm, how are they being parsed as dates -- or is that the issue?

[jira] [Resolved] (SPARK-17040) cannot read null dates from csv file

2016-08-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-17040. --- Resolution: Duplicate > cannot read null dates from csv file >

[jira] [Created] (SPARK-17040) cannot read null dates from csv file

2016-08-12 Thread Barry Becker (JIRA)
Barry Becker created SPARK-17040: Summary: cannot read null dates from csv file Key: SPARK-17040 URL: https://issues.apache.org/jira/browse/SPARK-17040 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-17039) cannot read null dates from csv file

2016-08-12 Thread Barry Becker (JIRA)
Barry Becker created SPARK-17039: Summary: cannot read null dates from csv file Key: SPARK-17039 URL: https://issues.apache.org/jira/browse/SPARK-17039 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-17038) StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch

2016-08-12 Thread Oz Ben-Ami (JIRA)
Oz Ben-Ami created SPARK-17038: -- Summary: StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch Key: SPARK-17038 URL: https://issues.apache.org/jira/browse/SPARK-17038

[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-08-12 Thread saurabh paliwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418994#comment-15418994 ] saurabh paliwal commented on SPARK-15044: - Hi! so sorry I mixed it up. So the use case should be

[jira] [Updated] (SPARK-16955) Using ordinals in ORDER BY causes an analysis error when the query has a GROUP BY clause using ordinals

2016-08-12 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-16955: - Assignee: Peter Lee > Using ordinals in ORDER BY causes an analysis error when the query has a > GROUP

[jira] [Issue Comment Deleted] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-08-12 Thread saurabh paliwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] saurabh paliwal updated SPARK-15044: Comment: was deleted (was: Hi! I am so sorry, I mixed it up. So the premise is towards

[jira] [Comment Edited] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-08-12 Thread saurabh paliwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418935#comment-15418935 ] saurabh paliwal edited comment on SPARK-15044 at 8/12/16 2:45 PM: -- Hi! I

[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-08-12 Thread saurabh paliwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418935#comment-15418935 ] saurabh paliwal commented on SPARK-15044: - Hi! I am so sorry, I mixed it up. So the premise is

[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-08-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418876#comment-15418876 ] Sean Owen commented on SPARK-15044: --- The partition would still exist in this case, no? > spark-sql

[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-08-12 Thread saurabh paliwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418872#comment-15418872 ] saurabh paliwal commented on SPARK-15044: - Hi! I agree with Artur. So let's assume there is a

[jira] [Resolved] (SPARK-17037) distinct() operator fails on Dataframe with column names containing periods

2016-08-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-17037. --- Resolution: Duplicate > distinct() operator fails on Dataframe with column names containing periods

[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-08-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418862#comment-15418862 ] Sean Owen commented on SPARK-15044: --- That's the problem. If the semantics were, "query anything that

[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-08-12 Thread Artur (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418841#comment-15418841 ] Artur commented on SPARK-15044: --- Why does the result miss something? If the data doesn't exist in HDFS, it should

[jira] [Commented] (SPARK-17036) Hadoop config caching could lead to memory pressure and high CPU usage in thrift server

2016-08-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418828#comment-15418828 ] Sean Owen commented on SPARK-17036: --- [~rajesh.balamohan] please summarize the issue here. > Hadoop

[jira] [Created] (SPARK-17037) distinct() operator fails on Dataframe with column names containing periods

2016-08-12 Thread Michael Styles (JIRA)
Michael Styles created SPARK-17037: -- Summary: distinct() operator fails on Dataframe with column names containing periods Key: SPARK-17037 URL: https://issues.apache.org/jira/browse/SPARK-17037

[jira] [Commented] (SPARK-12920) Honor "spark.ui.retainedStages" to reduce mem-pressure

2016-08-12 Thread Rajesh Balamohan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418827#comment-15418827 ] Rajesh Balamohan commented on SPARK-12920: -- Thanks [~vanzin] . I have created SPARK-17036 for

[jira] [Created] (SPARK-17036) Hadoop config caching could lead to memory pressure and high CPU usage in thrift server

2016-08-12 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created SPARK-17036: Summary: Hadoop config caching could lead to memory pressure and high CPU usage in thrift server Key: SPARK-17036 URL: https://issues.apache.org/jira/browse/SPARK-17036

[jira] [Resolved] (SPARK-16955) Using ordinals in ORDER BY causes an analysis error when the query has a GROUP BY clause using ordinals

2016-08-12 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-16955. - Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 > Using ordinals in ORDER

[jira] [Commented] (SPARK-16955) Using ordinals in ORDER BY causes an analysis error when the query has a GROUP BY clause using ordinals

2016-08-12 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418817#comment-15418817 ] Wenchen Fan commented on SPARK-16955: - This bug is already fixed by

[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-08-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418788#comment-15418788 ] Sean Owen commented on SPARK-15044: --- That seems like worse behavior, because it silently 'succeeds'

[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-08-12 Thread Artur (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418784#comment-15418784 ] Artur commented on SPARK-15044: --- I know that you should not do this. But if someone does, Spark will not

[jira] [Commented] (SPARK-16917) Spark streaming kafka version compatibility.

2016-08-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418747#comment-15418747 ] Cody Koeninger commented on SPARK-16917: It sounds to me like the documentation is clear, because

[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-08-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418746#comment-15418746 ] Sean Owen commented on SPARK-15044: --- Is it really an error? The files were manually deleted and

[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-08-12 Thread Artur (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418735#comment-15418735 ] Artur commented on SPARK-15044: --- Tested on spark 2.0.X (master branch) (latest commit:

[jira] [Updated] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-08-12 Thread Artur (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur updated SPARK-15044: -- Affects Version/s: 2.0.0 > spark-sql will throw "input path does not exist" exception if it handles a >

[jira] [Updated] (SPARK-17035) Conversion of datetime.max to microseconds produces incorrect value

2016-08-12 Thread Michael Styles (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Styles updated SPARK-17035: --- Description: Conversion of datetime.max to microseconds produces incorrect value. For
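The kind of error this issue title describes can be reproduced in plain Python. The sketch below is not Spark's conversion code; it only assumes that the microsecond count is built by multiplying whole seconds by a float (`1e6`), which is one common way such a value goes wrong. The variable names are illustrative.

```python
import calendar
from datetime import datetime

dt = datetime.max  # 9999-12-31 23:59:59.999999, treated here as UTC

# Whole seconds since the epoch, computed without consulting the local time zone.
seconds = calendar.timegm(dt.utctimetuple())

# Float arithmetic: 1e6 is a float, so the sum is rounded to the nearest double.
float_micros = int(seconds * 1e6 + dt.microsecond)

# Integer arithmetic: Python ints are arbitrary precision, so nothing is rounded.
exact_micros = seconds * 1000000 + dt.microsecond

print(float_micros)
print(exact_micros)
print(float_micros - exact_micros)
```

At this magnitude adjacent doubles are 32 apart, so the float path cannot represent the trailing 999999 microseconds exactly, while the integer path can.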

[jira] [Updated] (SPARK-17035) Conversion of datetime.max to microseconds produces incorrect value

2016-08-12 Thread Michael Styles (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Styles updated SPARK-17035: --- Priority: Minor (was: Major) > Conversion of datetime.max to microseconds produces

[jira] [Created] (SPARK-17035) Conversion of datetime.max to microseconds produces incorrect value

2016-08-12 Thread Michael Styles (JIRA)
Michael Styles created SPARK-17035: -- Summary: Conversion of datetime.max to microseconds produces incorrect value Key: SPARK-17035 URL: https://issues.apache.org/jira/browse/SPARK-17035 Project:

[jira] [Assigned] (SPARK-17032) Add test cases for methods in ParserUtils

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17032: Assignee: Apache Spark > Add test cases for methods in ParserUtils >

[jira] [Commented] (SPARK-17032) Add test cases for methods in ParserUtils

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418677#comment-15418677 ] Apache Spark commented on SPARK-17032: -- User 'jiangxb1987' has created a pull request for this

[jira] [Assigned] (SPARK-17032) Add test cases for methods in ParserUtils

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17032: Assignee: (was: Apache Spark) > Add test cases for methods in ParserUtils >

[jira] [Assigned] (SPARK-17034) Ordinal in ORDER BY or GROUP BY should be treated as an unresolved expression

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17034: Assignee: (was: Apache Spark) > Ordinal in ORDER BY or GROUP BY should be treated as

[jira] [Commented] (SPARK-17034) Ordinal in ORDER BY or GROUP BY should be treated as an unresolved expression

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418623#comment-15418623 ] Apache Spark commented on SPARK-17034: -- User 'clockfly' has created a pull request for this issue:

[jira] [Assigned] (SPARK-17034) Ordinal in ORDER BY or GROUP BY should be treated as an unresolved expression

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17034: Assignee: Apache Spark > Ordinal in ORDER BY or GROUP BY should be treated as an

[jira] [Created] (SPARK-17034) Ordinal in ORDER BY or GROUP BY should be treated as an unresolved expression

2016-08-12 Thread Sean Zhong (JIRA)
Sean Zhong created SPARK-17034: -- Summary: Ordinal in ORDER BY or GROUP BY should be treated as an unresolved expression Key: SPARK-17034 URL: https://issues.apache.org/jira/browse/SPARK-17034 Project:

[jira] [Commented] (SPARK-15882) Discuss distributed linear algebra in spark.ml package

2016-08-12 Thread Jeff Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418585#comment-15418585 ] Jeff Zhang commented on SPARK-15882: I think it is better to keep RDD api underneath as I don't see

[jira] [Commented] (SPARK-14850) VectorUDT/MatrixUDT should take primitive arrays without boxing

2016-08-12 Thread 胡振宇 (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418526#comment-15418526 ] 胡振宇 commented on SPARK-14850: - I tried to run your code on Spark 1.6.1, but I found that "toDF" cannot be used in

[jira] [Resolved] (SPARK-8717) Update mllib-data-types docs to include missing "matrix" Python examples

2016-08-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-8717. -- Resolution: Duplicate This was actually a duplicate of an issue already fixed. Look at the docs in

[jira] [Resolved] (SPARK-16598) Added a test case for verifying the table identifier parsing

2016-08-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16598. --- Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14244

[jira] [Resolved] (SPARK-16985) SQL Output maybe overrided

2016-08-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16985. --- Resolution: Fixed Assignee: Hong Shen Fix Version/s: 2.1.0 Resolved by

[jira] [Resolved] (SPARK-16975) Spark-2.0.0 unable to infer schema for parquet data written by Spark-1.6.2

2016-08-12 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-16975. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Updated] (SPARK-16598) Added a test case for verifying the table identifier parsing

2016-08-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-16598: -- Assignee: Xiao Li Priority: Minor (was: Major) > Added a test case for verifying the table

[jira] [Comment Edited] (SPARK-14850) VectorUDT/MatrixUDT should take primitive arrays without boxing

2016-08-12 Thread 胡振宇 (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418547#comment-15418547 ] 胡振宇 edited comment on SPARK-14850 at 8/12/16 9:02 AM: -- /*code is for spark 1.6.1*/

[jira] [Commented] (SPARK-8717) Update mllib-data-types docs to include missing "matrix" Python examples

2016-08-12 Thread Jagadeesan A S (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418553#comment-15418553 ] Jagadeesan A S commented on SPARK-8717: --- I would like to raise PR for this issue. [~srowen] can u

[jira] [Commented] (SPARK-14850) VectorUDT/MatrixUDT should take primitive arrays without boxing

2016-08-12 Thread 胡振宇 (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418547#comment-15418547 ] 胡振宇 commented on SPARK-14850: - /*code is for spark 1.6.1*/ object Example{ def main (args:Array[String]){

[jira] [Commented] (SPARK-17033) GaussianMixture should use treeAggregate to improve performance

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418537#comment-15418537 ] Apache Spark commented on SPARK-17033: -- User 'yanboliang' has created a pull request for this issue:

[jira] [Commented] (SPARK-17027) PolynomialExpansion.choose is prone to integer overflow

2016-08-12 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418071#comment-15418071 ] Maciej Szymkiewicz commented on SPARK-17027: Yes, this is exactly the problem. {code}

[jira] [Assigned] (SPARK-17033) GaussianMixture should use treeAggregate to improve performance

2016-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17033: Assignee: (was: Apache Spark) > GaussianMixture should use treeAggregate to improve

[jira] [Updated] (SPARK-17033) GaussianMixture should use treeAggregate to improve performance

2016-08-12 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-17033: Description: {{GaussianMixture}} should use {{treeAggregate}} rather than {{aggregate}} to improve

[jira] [Updated] (SPARK-17033) GaussianMixture should use treeAggregate to improve performance

2016-08-12 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-17033: Description: {{GaussianMixture}} should use {{treeAggregate}} rather than {{aggregate}} to improve

[jira] [Updated] (SPARK-17033) GaussianMixture should use treeAggregate to improve performance

2016-08-12 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-17033: Description: {{GaussianMixture}} should use {{treeAggregate}} rather than {{aggregate}} to improve

[jira] [Created] (SPARK-17033) GaussianMixture should use treeAggregate to improve performance

2016-08-12 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-17033: --- Summary: GaussianMixture should use treeAggregate to improve performance Key: SPARK-17033 URL: https://issues.apache.org/jira/browse/SPARK-17033 Project: Spark

[jira] [Commented] (SPARK-14850) VectorUDT/MatrixUDT should take primitive arrays without boxing

2016-08-12 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418534#comment-15418534 ] Wenchen Fan commented on SPARK-14850: - format your code please, it's unreadable >

[jira] [Updated] (SPARK-17033) GaussianMixture should use treeAggregate to improve performance

2016-08-12 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-17033: Component/s: MLlib ML > GaussianMixture should use treeAggregate to improve
