[jira] [Assigned] (SPARK-12760) inaccurate description for difference between local vs cluster mode in closure handling

2016-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12760: Assignee: (was: Apache Spark) > inaccurate description for difference between local vs

[jira] [Assigned] (SPARK-12760) inaccurate description for difference between local vs cluster mode in closure handling

2016-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12760: Assignee: Apache Spark > inaccurate description for difference between local vs cluster mo

[jira] [Commented] (SPARK-12760) inaccurate description for difference between local vs cluster mode in closure handling

2016-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110814#comment-15110814 ] Apache Spark commented on SPARK-12760: -- User 'srowen' has created a pull request for

[jira] [Commented] (SPARK-12932) Bad error message with trying to create Dataset from RDD of Java objects that are not bean-compliant

2016-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110796#comment-15110796 ] Apache Spark commented on SPARK-12932: -- User 'andygrove' has created a pull request

[jira] [Assigned] (SPARK-12932) Bad error message with trying to create Dataset from RDD of Java objects that are not bean-compliant

2016-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12932: Assignee: (was: Apache Spark) > Bad error message with trying to create Dataset from R

[jira] [Assigned] (SPARK-12932) Bad error message with trying to create Dataset from RDD of Java objects that are not bean-compliant

2016-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12932: Assignee: Apache Spark > Bad error message with trying to create Dataset from RDD of Java

[jira] [Commented] (SPARK-10262) Add @Since annotation to ml.attribute

2016-01-21 Thread Tommy Yu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110743#comment-15110743 ] Tommy Yu commented on SPARK-10262: -- Hi Xiangrui Meng, I took a look at all classes under ml.

[jira] [Updated] (SPARK-12932) Bad error message with trying to create Dataset from RDD of Java objects that are not bean-compliant

2016-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-12932: -- Priority: Minor (was: Major) Issue Type: Improvement (was: Bug) > Bad error message with trying

[jira] [Commented] (SPARK-12932) Bad error message with trying to create Dataset from RDD of Java objects that are not bean-compliant

2016-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110786#comment-15110786 ] Sean Owen commented on SPARK-12932: --- OK, do you want to make a PR for that? > Bad erro

[jira] [Commented] (SPARK-12932) Bad error message with trying to create Dataset from RDD of Java objects that are not bean-compliant

2016-01-21 Thread Andy Grove (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110785#comment-15110785 ] Andy Grove commented on SPARK-12932: Here is a pull request to change the error messa

[jira] [Updated] (SPARK-12534) Document missing command line options to Spark properties mapping

2016-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-12534: -- Assignee: Felix Cheung (was: Apache Spark) > Document missing command line options to Spark properties

[jira] [Updated] (SPARK-12534) Document missing command line options to Spark properties mapping

2016-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-12534: -- Issue Type: Improvement (was: Bug) > Document missing command line options to Spark properties mapping

[jira] [Resolved] (SPARK-12534) Document missing command line options to Spark properties mapping

2016-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-12534. --- Resolution: Fixed Fix Version/s: 2.0.0 Resolved by https://github.com/apache/spark/pull/10491

[jira] [Commented] (SPARK-12932) Bad error message with trying to create Dataset from RDD of Java objects that are not bean-compliant

2016-01-21 Thread Andy Grove (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110772#comment-15110772 ] Andy Grove commented on SPARK-12932: After reviewing the code for this, I think it is

[jira] [Comment Edited] (SPARK-12741) DataFrame count method return wrong size.

2016-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110762#comment-15110762 ] Sean Owen edited comment on SPARK-12741 at 1/21/16 3:26 PM: O

[jira] [Commented] (SPARK-12741) DataFrame count method return wrong size.

2016-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110762#comment-15110762 ] Sean Owen commented on SPARK-12741: --- OK, that's what you wrote at the outset though. Th

[jira] [Comment Edited] (SPARK-12843) Spark should avoid scanning all partitions when limit is set

2016-01-21 Thread dileep (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110709#comment-15110709 ] dileep edited comment on SPARK-12843 at 1/21/16 3:08 PM: - When I

[jira] [Commented] (SPARK-12843) Spark should avoid scanning all partitions when limit is set

2016-01-21 Thread dileep (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110709#comment-15110709 ] dileep commented on SPARK-12843: Please see the above Code. We need to make use of cachin

[jira] [Commented] (SPARK-12843) Spark should avoid scanning all partitions when limit is set

2016-01-21 Thread dileep (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110724#comment-15110724 ] dileep commented on SPARK-12843: It's not selecting entire records when I put Limit after

[jira] [Comment Edited] (SPARK-12843) Spark should avoid scanning all partitions when limit is set

2016-01-21 Thread dileep (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110712#comment-15110712 ] dileep edited comment on SPARK-12843 at 1/21/16 2:59 PM: - Its a c

[jira] [Comment Edited] (SPARK-12843) Spark should avoid scanning all partitions when limit is set

2016-01-21 Thread dileep (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110709#comment-15110709 ] dileep edited comment on SPARK-12843 at 1/21/16 2:57 PM: - Please

[jira] [Comment Edited] (SPARK-12843) Spark should avoid scanning all partitions when limit is set

2016-01-21 Thread dileep (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110712#comment-15110712 ] dileep edited comment on SPARK-12843 at 1/21/16 2:57 PM: - Its a c

[jira] [Commented] (SPARK-12843) Spark should avoid scanning all partitions when limit is set

2016-01-21 Thread dileep (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110712#comment-15110712 ] dileep commented on SPARK-12843: It's a caching issue, while scanning the table need to ca

[jira] [Issue Comment Deleted] (SPARK-12843) Spark should avoid scanning all partitions when limit is set

2016-01-21 Thread dileep (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dileep updated SPARK-12843: --- Comment: was deleted (was: public class JavaSparkSQL { public static class Person implements Seria

[jira] [Comment Edited] (SPARK-12741) DataFrame count method return wrong size.

2016-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110676#comment-15110676 ] Sean Owen edited comment on SPARK-12741 at 1/21/16 2:40 PM: W

[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-21 Thread Emlyn Corrin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110688#comment-15110688 ] Emlyn Corrin commented on SPARK-9740: - How do you use FIRST/LAST from the Java API wit

[jira] [Commented] (SPARK-12843) Spark should avoid scanning all partitions when limit is set

2016-01-21 Thread dileep (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110704#comment-15110704 ] dileep commented on SPARK-12843: public class JavaSparkSQL { public static cla

[jira] [Commented] (SPARK-12741) DataFrame count method return wrong size.

2016-01-21 Thread Sasi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110701#comment-15110701 ] Sasi commented on SPARK-12741: -- That's not what I meant. I just set an example for each case

[jira] [Comment Edited] (SPARK-12741) DataFrame count method return wrong size.

2016-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110676#comment-15110676 ] Sean Owen edited comment on SPARK-12741 at 1/21/16 2:40 PM: W

[jira] [Commented] (SPARK-12741) DataFrame count method return wrong size.

2016-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110676#comment-15110676 ] Sean Owen commented on SPARK-12741: --- Wait, is this what you mean? "select count(*) ..."

[jira] [Commented] (SPARK-2309) Generalize the binary logistic regression into multinomial logistic regression

2016-01-21 Thread Daniel Darabos (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110628#comment-15110628 ] Daniel Darabos commented on SPARK-2309: --- https://github.com/apache/spark/blob/v1.6.0

[jira] [Commented] (SPARK-12741) DataFrame count method return wrong size.

2016-01-21 Thread Sasi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110613#comment-15110613 ] Sasi commented on SPARK-12741: -- Additional update: If I use the following code, then I get t

[jira] [Commented] (SPARK-12954) pyspark API 1.3.0 how we can patitionning by columns

2016-01-21 Thread malouke (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110603#comment-15110603 ] malouke commented on SPARK-12954: - hi sean, where i can ask question ? > pyspark API 1.3

[jira] [Commented] (SPARK-12954) pyspark API 1.3.0 how we can patitionning by columns

2016-01-21 Thread malouke (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110596#comment-15110596 ] malouke commented on SPARK-12954: - ok sorry, > pyspark API 1.3.0 how we can patitionnin

[jira] [Commented] (SPARK-12741) DataFrame count method return wrong size.

2016-01-21 Thread Sasi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110584#comment-15110584 ] Sasi commented on SPARK-12741: -- I checked my DB which is Aerospike, and I got the same resul

[jira] [Commented] (SPARK-12741) DataFrame count method return wrong size.

2016-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110572#comment-15110572 ] Sean Owen commented on SPARK-12741: --- I can't reproduce this. I always get the same coun

[jira] [Commented] (SPARK-12741) DataFrame count method return wrong size.

2016-01-21 Thread Sasi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110563#comment-15110563 ] Sasi commented on SPARK-12741: -- If I'm running the following code: {code} dataFrame.where(".

[jira] [Commented] (SPARK-12741) DataFrame count method return wrong size.

2016-01-21 Thread Sasi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110553#comment-15110553 ] Sasi commented on SPARK-12741: -- Creating a new DataFrame didn't resolve the issue. I still think

[jira] [Resolved] (SPARK-12954) pyspark API 1.3.0 how we can patitionning by columns

2016-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-12954. --- Resolution: Invalid Target Version/s: (was: 1.3.0) [~Malouke] a lot is wrong with this. P

[jira] [Commented] (SPARK-12741) DataFrame count method return wrong size.

2016-01-21 Thread Sasi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110523#comment-15110523 ] Sasi commented on SPARK-12741: -- I changed the way I used the DataFrame from my last ticket.

[jira] [Created] (SPARK-12954) pyspark API 1.3.0 how we can patitionning by columns

2016-01-21 Thread malouke (JIRA)
malouke created SPARK-12954: --- Summary: pyspark API 1.3.0 how we can patitionning by columns Key: SPARK-12954 URL: https://issues.apache.org/jira/browse/SPARK-12954 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-12741) DataFrame count method return wrong size.

2016-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110511#comment-15110511 ] Sean Owen commented on SPARK-12741: --- I recall from other JIRAs that you're not updating

[jira] [Commented] (SPARK-12843) Spark should avoid scanning all partitions when limit is set

2016-01-21 Thread dileep (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110512#comment-15110512 ] dileep commented on SPARK-12843: I will look in to this issue > Spark should avoid scan

[jira] [Commented] (SPARK-12741) DataFrame count method return wrong size.

2016-01-21 Thread Sasi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110503#comment-15110503 ] Sasi commented on SPARK-12741: -- I updated the report, can you verify it again. Thanks! Sasi

[jira] [Updated] (SPARK-12741) DataFrame count method return wrong size.

2016-01-21 Thread Sasi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sasi updated SPARK-12741: - Description: Hi, I'm updating my report. I'm working with Spark 1.5.2, (used to be 1.5.0), I have a DataFrame an

[jira] [Updated] (SPARK-12741) DataFrame count method return wrong size.

2016-01-21 Thread Sasi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sasi updated SPARK-12741: - Description: Hi, I'm updating my report. I'm working with Spark 1.5.2, (used to be 1.5.0), I have a DataFrame an

[jira] [Resolved] (SPARK-12906) LongSQLMetricValue cause memory leak on Spark 1.5.1

2016-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-12906. --- Resolution: Duplicate > LongSQLMetricValue cause memory leak on Spark 1.5.1 > ---

[jira] [Commented] (SPARK-6817) DataFrame UDFs in R

2016-01-21 Thread Sun Rui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110359#comment-15110359 ] Sun Rui commented on SPARK-6817: for dapply(), user can call repartition() to set an appro

[jira] [Commented] (SPARK-12906) LongSQLMetricValue cause memory leak on Spark 1.5.1

2016-01-21 Thread Sasi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110354#comment-15110354 ] Sasi commented on SPARK-12906: -- Looks like fixed on 1.5.2. Thanks! > LongSQLMetricValue cau

[jira] [Commented] (SPARK-12953) RDDRelation write set mode will be better to avoid error "pair.parquet already exists"

2016-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110301#comment-15110301 ] Apache Spark commented on SPARK-12953: -- User 'shijinkui' has created a pull request

[jira] [Assigned] (SPARK-12953) RDDRelation write set mode will be better to avoid error "pair.parquet already exists"

2016-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12953: Assignee: Apache Spark > RDDRelation write set mode will be better to avoid error "pair.pa

[jira] [Assigned] (SPARK-12953) RDDRelation write set mode will be better to avoid error "pair.parquet already exists"

2016-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12953: Assignee: (was: Apache Spark) > RDDRelation write set mode will be better to avoid err

[jira] [Updated] (SPARK-12953) RDDRelation write set mode will be better to avoid error "pair.parquet already exists"

2016-01-21 Thread shijinkui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shijinkui updated SPARK-12953: -- Component/s: (was: SQL) Examples > RDDRelation write set mode will be better to av

[jira] [Commented] (SPARK-6817) DataFrame UDFs in R

2016-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110299#comment-15110299 ] Felix Cheung commented on SPARK-6817: - Thanks for putting together on the doc [~sunrui

[jira] [Created] (SPARK-12953) RDDRelation write set mode will be better to avoid error "pair.parquet already exists"

2016-01-21 Thread shijinkui (JIRA)
shijinkui created SPARK-12953: - Summary: RDDRelation write set mode will be better to avoid error "pair.parquet already exists" Key: SPARK-12953 URL: https://issues.apache.org/jira/browse/SPARK-12953 Proj

[jira] [Updated] (SPARK-12247) Documentation for spark.ml's ALS and collaborative filtering in general

2016-01-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-12247: --- Affects Version/s: (was: 1.5.2) 2.0.0 > Documentation for spark.ml

[jira] [Updated] (SPARK-12247) Documentation for spark.ml's ALS and collaborative filtering in general

2016-01-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-12247: --- Assignee: Benjamin Fradet > Documentation for spark.ml's ALS and collaborative filtering in g
