[jira] [Commented] (SPARK-17969) I think it's user unfriendly to process standard json file with DataFrame

2016-10-16 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581273#comment-15581273 ] Reynold Xin commented on SPARK-17969: - +1 It would be good to have a mode in which each file is a

[jira] [Updated] (SPARK-17969) I think it's user unfriendly to process standard json file with DataFrame

2016-10-16 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-17969: Assignee: (was: Reynold Xin) > I think it's user unfriendly to process standard json file with

[jira] [Assigned] (SPARK-17969) I think it's user unfriendly to process standard json file with DataFrame

2016-10-16 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin reassigned SPARK-17969: --- Assignee: Reynold Xin > I think it's user unfriendly to process standard json file with

[jira] [Comment Edited] (SPARK-11524) Support SparkR with Mesos cluster

2016-10-16 Thread Sun Rui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581268#comment-15581268 ] Sun Rui edited comment on SPARK-11524 at 10/17/16 5:31 AM: --- for cluster mode,

[jira] [Commented] (SPARK-11524) Support SparkR with Mesos cluster

2016-10-16 Thread Sun Rui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581268#comment-15581268 ] Sun Rui commented on SPARK-11524: - for cluster mode, the R script needs to be transferred to the slave node

[jira] [Commented] (SPARK-10590) Spark with YARN build is broken

2016-10-16 Thread Nirman Narang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581244#comment-15581244 ] Nirman Narang commented on SPARK-10590: --- Hi Sean, my FS is not encrypted. I upgraded Maven to 3.3.9

[jira] [Updated] (SPARK-17819) Specified database in JDBC URL is ignored when connecting to thriftserver

2016-10-16 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-17819: Fix Version/s: 2.0.2 > Specified database in JDBC URL is ignored when connecting to thriftserver >

[jira] [Comment Edited] (SPARK-17954) FetchFailedException executor cannot connect to another worker executor

2016-10-16 Thread Vitaly Gerasimov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581147#comment-15581147 ] Vitaly Gerasimov edited comment on SPARK-17954 at 10/17/16 4:50 AM: I

[jira] [Commented] (SPARK-10915) Add support for UDAFs in Python

2016-10-16 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581167#comment-15581167 ] Reynold Xin commented on SPARK-10915: - It is indeed very complicated to implement UDAF in Python.

[jira] [Comment Edited] (SPARK-17954) FetchFailedException executor cannot connect to another worker executor

2016-10-16 Thread Vitaly Gerasimov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581147#comment-15581147 ] Vitaly Gerasimov edited comment on SPARK-17954 at 10/17/16 4:12 AM: I

[jira] [Commented] (SPARK-17954) FetchFailedException executor cannot connect to another worker executor

2016-10-16 Thread Vitaly Gerasimov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581147#comment-15581147 ] Vitaly Gerasimov commented on SPARK-17954: -- I figured out this issue. The problem is spark

[jira] [Resolved] (SPARK-17947) Document the impact of `spark.sql.debug`

2016-10-16 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-17947. - Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 15494

[jira] [Updated] (SPARK-17947) Document the impact of `spark.sql.debug`

2016-10-16 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-17947: Assignee: Xiao Li > Document the impact of `spark.sql.debug` >

[jira] [Commented] (SPARK-17819) Specified database in JDBC URL is ignored when connecting to thriftserver

2016-10-16 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581130#comment-15581130 ] Apache Spark commented on SPARK-17819: -- User 'dongjoon-hyun' has created a pull request for this

[jira] [Commented] (SPARK-10915) Add support for UDAFs in Python

2016-10-16 Thread Tobi Bosede (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581127#comment-15581127 ] Tobi Bosede commented on SPARK-10915: - It is complicated to implement a UDAF in Python? If you read

[jira] [Commented] (SPARK-17930) The SerializerInstance instance used when deserializing a TaskResult is not reused

2016-10-16 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581119#comment-15581119 ] Guoqiang Li commented on SPARK-17930: - TPC-DS 2T data (Parquet) and the SQL(query 2) => {noformat}

[jira] [Commented] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

2016-10-16 Thread Low Chin Wei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581113#comment-15581113 ] Low Chin Wei commented on SPARK-13747: -- It is running on Akka, with forkjoin dispatcher. There are 2

[jira] [Commented] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

2016-10-16 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581094#comment-15581094 ] Shixiong Zhu commented on SPARK-13747: -- Could you post the full stack trace, please? It would be

[jira] [Commented] (SPARK-17963) Add examples (extend) in each function and improve documentation with arguments

2016-10-16 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581072#comment-15581072 ] Hyukjin Kwon commented on SPARK-17963: -- Thanks. Then, I will work on this. > Add examples (extend)

[jira] [Created] (SPARK-17969) I think it's user unfriendly to process standard json file with DataFrame

2016-10-16 Thread Jianfei Wang (JIRA)
Jianfei Wang created SPARK-17969: Summary: I think it's user unfriendly to process standard json file with DataFrame Key: SPARK-17969 URL: https://issues.apache.org/jira/browse/SPARK-17969 Project:

[jira] [Resolved] (SPARK-17819) Specified database in JDBC URL is ignored when connecting to thriftserver

2016-10-16 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-17819. - Resolution: Fixed Assignee: Dongjoon Hyun Fix Version/s: 2.1.0 > Specified

[jira] [Commented] (SPARK-17819) Specified database in JDBC URL is ignored when connecting to thriftserver

2016-10-16 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581049#comment-15581049 ] Dongjoon Hyun commented on SPARK-17819: --- Hi, [~rxin]. Could you review the PR? In fact, it was an

[jira] [Commented] (SPARK-17963) Add examples (extend) in each function and improve documentation with arguments

2016-10-16 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581022#comment-15581022 ] Reynold Xin commented on SPARK-17963: - It's definitely useful to do, but I don't think we need to

[jira] [Commented] (SPARK-10915) Add support for UDAFs in Python

2016-10-16 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581031#comment-15581031 ] Reynold Xin commented on SPARK-10915: - What's the use case? Is it not possible to just run

[jira] [Commented] (SPARK-11524) Support SparkR with Mesos cluster

2016-10-16 Thread Susan X. Huynh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581023#comment-15581023 ] Susan X. Huynh commented on SPARK-11524: Thanks for the advice and for breaking down the

[jira] [Created] (SPARK-17968) Support using 3rd-party R packages on Mesos

2016-10-16 Thread Sun Rui (JIRA)
Sun Rui created SPARK-17968: --- Summary: Support using 3rd-party R packages on Mesos Key: SPARK-17968 URL: https://issues.apache.org/jira/browse/SPARK-17968 Project: Spark Issue Type: Sub-task

[jira] [Commented] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

2016-10-16 Thread Low Chin Wei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580989#comment-15580989 ] Low Chin Wei commented on SPARK-13747: -- java.lang.IllegalArgumentException: spark.sql.execution.id

[jira] [Commented] (SPARK-11524) Support SparkR with Mesos cluster

2016-10-16 Thread Sun Rui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580960#comment-15580960 ] Sun Rui commented on SPARK-11524: - great, go ahead. Please look at the linking JIRA issues as references.

[jira] [Created] (SPARK-17967) Support for list or other types as an option for datasources

2016-10-16 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-17967: Summary: Support for list or other types as an option for datasources Key: SPARK-17967 URL: https://issues.apache.org/jira/browse/SPARK-17967 Project: Spark

[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources

2016-10-16 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580958#comment-15580958 ] Hyukjin Kwon commented on SPARK-17967: -- I am leaving SPARK-17878 as a related one but it does not

[jira] [Created] (SPARK-17966) Support Spark packages with R code on Mesos

2016-10-16 Thread Sun Rui (JIRA)
Sun Rui created SPARK-17966: --- Summary: Support Spark packages with R code on Mesos Key: SPARK-17966 URL: https://issues.apache.org/jira/browse/SPARK-17966 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-17965) Enable SparkR with Mesos cluster mode

2016-10-16 Thread Sun Rui (JIRA)
Sun Rui created SPARK-17965: --- Summary: Enable SparkR with Mesos cluster mode Key: SPARK-17965 URL: https://issues.apache.org/jira/browse/SPARK-17965 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-17964) Enable SparkR with Mesos client mode

2016-10-16 Thread Sun Rui (JIRA)
Sun Rui created SPARK-17964: --- Summary: Enable SparkR with Mesos client mode Key: SPARK-17964 URL: https://issues.apache.org/jira/browse/SPARK-17964 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-17963) Add examples (extend) in each function and improve documentation with arguments

2016-10-16 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-17963: - Description: Currently, it seems function documentation is inconsistent and does not have

[jira] [Updated] (SPARK-17963) Add examples (extend) in each function and improve documentation with arguments

2016-10-16 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-17963: - Description: Currently, it seems function documentation is inconsistent and does not have

[jira] [Commented] (SPARK-17963) Add examples (extend) in each function and improve documentation with arguments

2016-10-16 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580942#comment-15580942 ] Hyukjin Kwon commented on SPARK-17963: -- Hi [~rxin] and [~srowen], I guess the PR would be pretty

[jira] [Updated] (SPARK-17963) Add examples (extend) in each function and improve documentation with arguments

2016-10-16 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-17963: - Description: Currently, it seems function documentation is inconsistent and does not have

[jira] [Created] (SPARK-17963) Add examples (extend) in each function and improve documentation with arguments

2016-10-16 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-17963: Summary: Add examples (extend) in each function and improve documentation with arguments Key: SPARK-17963 URL: https://issues.apache.org/jira/browse/SPARK-17963

[jira] [Commented] (SPARK-10915) Add support for UDAFs in Python

2016-10-16 Thread Tobi Bosede (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580917#comment-15580917 ] Tobi Bosede commented on SPARK-10915: - Thoughts [~davies] and [~mgummelt]? Refer to

[jira] [Commented] (SPARK-17898) --repositories needs username and password

2016-10-16 Thread lichenglin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580857#comment-15580857 ] lichenglin commented on SPARK-17898: I have found a way to declare the username and password:

[jira] [Updated] (SPARK-17962) DataFrame/Dataset join not producing correct results in Spark 2.0/Yarn

2016-10-16 Thread Stephen Hankinson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Hankinson updated SPARK-17962: -- Description: Environment can be reproduced via this git repo using the Deploy to Azure

[jira] [Updated] (SPARK-17962) DataFrame/Dataset join not producing correct results in Spark 2.0/Yarn

2016-10-16 Thread Stephen Hankinson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Hankinson updated SPARK-17962: -- Description: Environment can be reproduced via this git repo using the Deploy to Azure

[jira] [Updated] (SPARK-17962) DataFrame/Dataset join not producing correct results in Spark 2.0/Yarn

2016-10-16 Thread Stephen Hankinson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Hankinson updated SPARK-17962: -- Description: Environment can be reproduced via this git repo using the Deploy to Azure

[jira] [Updated] (SPARK-17962) DataFrame/Dataset join not producing correct results in Spark 2.0/Yarn

2016-10-16 Thread Stephen Hankinson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Hankinson updated SPARK-17962: -- Description: Environment can be reproduced via this git repo using the Deploy to Azure

[jira] [Created] (SPARK-17962) DataFrame/Dataset join not producing correct results in Spark 2.0/Yarn

2016-10-16 Thread Stephen Hankinson (JIRA)
Stephen Hankinson created SPARK-17962: - Summary: DataFrame/Dataset join not producing correct results in Spark 2.0/Yarn Key: SPARK-17962 URL: https://issues.apache.org/jira/browse/SPARK-17962

[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2016-10-16 Thread Michael Schmeißer (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580502#comment-15580502 ] Michael Schmeißer commented on SPARK-650: - What if I have a Hadoop InputFormat? Then, certain

[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2016-10-16 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580135#comment-15580135 ] Sean Owen commented on SPARK-650: - But, why do you need to do it before you have an RDD? You can easily

[jira] [Updated] (SPARK-17961) Add storageLevel to Dataset for SparkR

2016-10-16 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-17961: --- Component/s: SQL SparkR > Add storageLevel to Dataset for SparkR >

[jira] [Updated] (SPARK-17961) Add storageLevel to Dataset for SparkR

2016-10-16 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-17961: --- Issue Type: Improvement (was: Bug) > Add storageLevel to Dataset for SparkR >

[jira] [Commented] (SPARK-17961) Add storageLevel to Dataset for SparkR

2016-10-16 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580131#comment-15580131 ] Weichen Xu commented on SPARK-17961: I am working on it and will create PR soon. > Add storageLevel

[jira] [Created] (SPARK-17961) Add storageLevel to Dataset for SparkR

2016-10-16 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-17961: -- Summary: Add storageLevel to Dataset for SparkR Key: SPARK-17961 URL: https://issues.apache.org/jira/browse/SPARK-17961 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2016-10-16 Thread Michael Schmeißer (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580069#comment-15580069 ] Michael Schmeißer commented on SPARK-650: - But I'll need to have an RDD to do this, I can't just do

[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2016-10-16 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580062#comment-15580062 ] Sean Owen commented on SPARK-650: - This is still easy to do with mapPartitions, which can call

[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2016-10-16 Thread Michael Schmeißer (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580055#comment-15580055 ] Michael Schmeißer commented on SPARK-650: - Ok, let me explain the specific problems that we have

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579949#comment-15579949 ] holdenk commented on SPARK-14141: - Ah sorry for the delay, so doing the cache + count together is done

[jira] [Commented] (SPARK-13534) Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579946#comment-15579946 ] holdenk commented on SPARK-13534: - And now they have a release :) I'm not certain it's at the stage where

[jira] [Commented] (SPARK-12753) Import error during unit test while calling a function from reduceByKey()

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579942#comment-15579942 ] holdenk commented on SPARK-12753: - (oh as a follow up it appears the user answered their own question on

[jira] [Closed] (SPARK-12753) Import error during unit test while calling a function from reduceByKey()

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-12753. --- Resolution: Not A Problem I don't believe this is a PySpark issue but rather it seems like a Python

[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2016-10-16 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579928#comment-15579928 ] Sean Owen commented on SPARK-650: - Yeah that's a decent use case, because latency is an issue (streaming)

[jira] [Resolved] (SPARK-11223) PySpark CrossValidatorModel does not output metrics for every param in paramGrid

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-11223. - Resolution: Fixed Fixed in SPARK-12810 by [~vectorijk] :) > PySpark CrossValidatorModel does not output

[jira] [Commented] (SPARK-11223) PySpark CrossValidatorModel does not output metrics for every param in paramGrid

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579923#comment-15579923 ] holdenk commented on SPARK-11223: - Oh wait it looks like we've already done this and I was looking at the

[jira] [Commented] (SPARK-11223) PySpark CrossValidatorModel does not output metrics for every param in paramGrid

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579921#comment-15579921 ] holdenk commented on SPARK-11223: - This could be a good starter issue for someone interested in ML or

[jira] [Updated] (SPARK-11223) PySpark CrossValidatorModel does not output metrics for every param in paramGrid

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-11223: Labels: starter (was: ) > PySpark CrossValidatorModel does not output metrics for every param in >

[jira] [Commented] (SPARK-10635) pyspark - running on a different host

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579914#comment-15579914 ] holdenk commented on SPARK-10635: - it would be a bit difficult, although as Py4J is speeding up the

[jira] [Commented] (SPARK-10628) Add support for arbitrary RandomRDD generation to PySparkAPI

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579910#comment-15579910 ] holdenk commented on SPARK-10628: - For someone who is interested in doing this, we might be able to do

[jira] [Closed] (SPARK-10525) Add Python example for VectorSlicer to user guide

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-10525. --- Resolution: Fixed Fix Version/s: 2.0.0 Fixed in SPARK-14514 by [~podongfeng] > Add Python example

[jira] [Commented] (SPARK-10525) Add Python example for VectorSlicer to user guide

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579906#comment-15579906 ] holdenk commented on SPARK-10525: - It looks like it does, I'm going to go ahead and resolve this. Thanks

[jira] [Commented] (SPARK-10319) ALS training using PySpark throws a StackOverflowError

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579901#comment-15579901 ] holdenk commented on SPARK-10319: - Is this issue still occurring for you? > ALS training using PySpark

[jira] [Commented] (SPARK-17960) Upgrade to Py4J 0.10.4

2016-10-16 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579903#comment-15579903 ] Sean Owen commented on SPARK-17960: --- It seems OK. If the changes aren't essential, maybe we don't have

[jira] [Closed] (SPARK-10223) Add takeOrderedByKey function to extract top N records within each group

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-10223. --- Resolution: Won't Fix I don't see this feature being particularly popular, especially since it's relatively

[jira] [Closed] (SPARK-9965) Scala, Python SQLContext input methods' deprecation statuses do not match

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-9965. -- Resolution: Resolved Fix Version/s: 2.0.0 These methods were removed in

[jira] [Commented] (SPARK-9931) Flaky test: mllib/tests.py StreamingLogisticRegressionWithSGDTests. test_training_and_prediction

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579889#comment-15579889 ] holdenk commented on SPARK-9931: Is this still a test people are finding flaky or did [~josephkb]'s fix

[jira] [Commented] (SPARK-7653) ML Pipeline and meta-algs should take random seed param

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579884#comment-15579884 ] holdenk commented on SPARK-7653: I think the simplest workaround would be exposing HasSeed to the public.

[jira] [Commented] (SPARK-7941) Cache Cleanup Failure when job is killed by Spark

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579878#comment-15579878 ] holdenk commented on SPARK-7941: So if it's OK - since I don't see other reports of this - unless this is

[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579874#comment-15579874 ] holdenk commented on SPARK-7721: [~joshrosen] is this something you're still looking at/interested in or

[jira] [Commented] (SPARK-3981) Consider a better approach to initialize SerDe on executors

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579868#comment-15579868 ] holdenk commented on SPARK-3981: That's a good question. It seems like much of the code that was copied in

[jira] [Commented] (SPARK-2868) Support named accumulators in Python

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579866#comment-15579866 ] holdenk commented on SPARK-2868: ping [~davies] - would you be available to review if I got this switched

[jira] [Commented] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579847#comment-15579847 ] holdenk commented on SPARK-9487: Great, thanks for taking this issue on :) > Use the same num. worker

[jira] [Created] (SPARK-17960) Upgrade to Py4J 0.10.4

2016-10-16 Thread holdenk (JIRA)
holdenk created SPARK-17960: --- Summary: Upgrade to Py4J 0.10.4 Key: SPARK-17960 URL: https://issues.apache.org/jira/browse/SPARK-17960 Project: Spark Issue Type: Improvement Components:

[jira] [Updated] (SPARK-17959) spark.sql.join.preferSortMergeJoin has no effect for simple join due to calculated size of LogicalRdd

2016-10-16 Thread Stavros Kontopoulos (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-17959: Summary: spark.sql.join.preferSortMergeJoin has no effect for simple join due to

[jira] [Updated] (SPARK-17959) spark.sql.join.preferSortMergeJoin has no effect for simple join

2016-10-16 Thread Stavros Kontopoulos (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-17959: Description: Example code: val df = spark.sparkContext.parallelize(List(("A",

[jira] [Updated] (SPARK-17959) spark.sql.join.preferSortMergeJoin has no effect for simple join

2016-10-16 Thread Stavros Kontopoulos (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-17959: Description: Example code: val df = spark.sparkContext.parallelize(List(("A",

[jira] [Updated] (SPARK-17959) spark.sql.join.preferSortMergeJoin has no effect for simple join

2016-10-16 Thread Stavros Kontopoulos (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-17959: Description: Example code: val df = spark.sparkContext.parallelize(List(("A",

[jira] [Created] (SPARK-17959) spark.sql.join.preferSortMergeJoin has no effect for simple join

2016-10-16 Thread Stavros Kontopoulos (JIRA)
Stavros Kontopoulos created SPARK-17959: --- Summary: spark.sql.join.preferSortMergeJoin has no effect for simple join Key: SPARK-17959 URL: https://issues.apache.org/jira/browse/SPARK-17959

[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2016-10-16 Thread Olivier Armand (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579737#comment-15579737 ] Olivier Armand commented on SPARK-650: -- Data doesn't necessarily arrive immediately, but we need to

[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2016-10-16 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579720#comment-15579720 ] Sean Owen commented on SPARK-650: - It would work in this case to immediately schedule initialization on the

[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2016-10-16 Thread Olivier Armand (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579710#comment-15579710 ] Olivier Armand commented on SPARK-650: -- > "just run a dummy mapPartitions at the outset on the same

[jira] [Resolved] (SPARK-17958) Why I ran into issue " accumulator, copyandreset must be zero error"

2016-10-16 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-17958. --- Resolution: Invalid Please read

[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2016-10-16 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579630#comment-15579630 ] Sean Owen commented on SPARK-650: - Reopening doesn't do anything by itself, or cause anyone to consider

[jira] [Commented] (SPARK-17931) taskScheduler has some unneeded serialization

2016-10-16 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579590#comment-15579590 ] Apache Spark commented on SPARK-17931: -- User 'witgo' has created a pull request for this issue:

[jira] [Assigned] (SPARK-17931) taskScheduler has some unneeded serialization

2016-10-16 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17931: Assignee: (was: Apache Spark) > taskScheduler has some unneeded serialization >

[jira] [Assigned] (SPARK-17931) taskScheduler has some unneeded serialization

2016-10-16 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17931: Assignee: Apache Spark > taskScheduler has some unneeded serialization >