[jira] [Resolved] (SPARK-2907) Use mutable.HashMap to represent Model in Word2Vec

2014-09-02 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-2907. -- Resolution: Fixed > Use mutable.HashMap to represent Model in Word2Vec > ---

[jira] [Created] (SPARK-3366) Compute best splits distributively in decision tree

2014-09-02 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-3366: Summary: Compute best splits distributively in decision tree Key: SPARK-3366 URL: https://issues.apache.org/jira/browse/SPARK-3366 Project: Spark Issue Type:

[jira] [Created] (SPARK-3365) Failure to save Lists to Parquet

2014-09-02 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-3365: --- Summary: Failure to save Lists to Parquet Key: SPARK-3365 URL: https://issues.apache.org/jira/browse/SPARK-3365 Project: Spark Issue Type: Bug Affe

[jira] [Commented] (SPARK-3219) K-Means clusterer should support Bregman distance functions

2014-09-02 Thread Derrick Burns (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119422#comment-14119422 ] Derrick Burns commented on SPARK-3219: -- The current implementation supports one concr

[jira] [Closed] (SPARK-3344) Reformat code: add blank lines

2014-09-02 Thread Prashant Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Sharma closed SPARK-3344. -- Resolution: Invalid > Reformat code: add blank lines > -- > >

[jira] [Updated] (SPARK-3195) Can you add some statistics to do logistic regression better in mllib?

2014-09-02 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-3195: - Affects Version/s: (was: 1.3.0) > Can you add some statistics to do logistic regression better

[jira] [Updated] (SPARK-3364) Zip equal-length but unequally-partition

2014-09-02 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-3364: --- Affects Version/s: 1.0.2 > Zip equal-length but unequally-partition >

[jira] [Commented] (SPARK-3364) Zip equal-length but unequally-partition

2014-09-02 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119414#comment-14119414 ] Guoqiang Li commented on SPARK-3364: This bug has been fixed in 1.1.0. > Zip equal-l

[jira] [Commented] (SPARK-3219) K-Means clusterer should support Bregman distance functions

2014-09-02 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119412#comment-14119412 ] Xiangrui Meng commented on SPARK-3219: -- Several distance measures were added to Breez

[jira] [Commented] (SPARK-3298) [SQL] registerAsTable / registerTempTable overwrites old tables

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119407#comment-14119407 ] Michael Armbrust commented on SPARK-3298: - Yeah I guess my thoughts when implement

[jira] [Commented] (SPARK-3336) [Spark SQL] In pyspark, cannot group by field on UDF

2014-09-02 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119404#comment-14119404 ] Davies Liu commented on SPARK-3336: --- Maybe it's a feature (see the group by syntax [1]),

[jira] [Commented] (SPARK-3336) [Spark SQL] In pyspark, cannot group by field on UDF

2014-09-02 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119400#comment-14119400 ] Davies Liu commented on SPARK-3336: --- [~marmbrus], If we reverse the order of count() and

[jira] [Commented] (SPARK-1476) 2GB limit in spark for blocks

2014-09-02 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119385#comment-14119385 ] Mridul Muralidharan commented on SPARK-1476: WIP version pushed to https://git

[jira] [Commented] (SPARK-3298) [SQL] registerAsTable / registerTempTable overwrites old tables

2014-09-02 Thread Evan Chan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119373#comment-14119373 ] Evan Chan commented on SPARK-3298: -- I can't really think of a good way to prevent people

[jira] [Commented] (SPARK-3019) Pluggable block transfer (data plane communication) interface

2014-09-02 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119367#comment-14119367 ] Reynold Xin commented on SPARK-3019: Sounds good to me. As I said in the design propos

[jira] [Commented] (SPARK-3019) Pluggable block transfer (data plane communication) interface

2014-09-02 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119365#comment-14119365 ] Mridul Muralidharan commented on SPARK-3019: I will try to push the version we

[jira] [Commented] (SPARK-3019) Pluggable block transfer (data plane communication) interface

2014-09-02 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119362#comment-14119362 ] Mridul Muralidharan commented on SPARK-3019: Just went over the proposal in so

[jira] [Commented] (SPARK-3363) [SQL] Type Coercion should support every type to have null value

2014-09-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119361#comment-14119361 ] Apache Spark commented on SPARK-3363: - User 'adrian-wang' has created a pull request f

[jira] [Updated] (SPARK-3364) Zip equal-length but unequally-partition

2014-09-02 Thread Kevin Jung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Jung updated SPARK-3364: -- Affects Version/s: (was: 1.0.2) > Zip equal-length but unequally-partition > ---

[jira] [Updated] (SPARK-3364) Zip equal-length but unequally-partition

2014-09-02 Thread Kevin Jung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Jung updated SPARK-3364: -- Component/s: Spark Core Affects Version/s: 1.0.2 > Zip equal-length but unequally-partition >

[jira] [Updated] (SPARK-3364) Zip equal-length but unequally-partition

2014-09-02 Thread Kevin Jung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Jung updated SPARK-3364: -- Summary: Zip equal-length but unequally-partition (was: zip equal-length but unequally-partition) > Zi

[jira] [Updated] (SPARK-3364) zip equal-length but unequally-partition

2014-09-02 Thread Kevin Jung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Jung updated SPARK-3364: -- Priority: Major (was: Critical) > zip equal-length but unequally-partition > --

[jira] [Updated] (SPARK-3364) zip equal-length but unequally-partition

2014-09-02 Thread Kevin Jung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Jung updated SPARK-3364: -- Description: ZippedRDD loses some elements after zipping RDDs with equal numbers of partitions but uneq

[jira] [Updated] (SPARK-3364) zip equal-length but unequally-partition

2014-09-02 Thread Kevin Jung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Jung updated SPARK-3364: -- Description: ZippedRDD loses some elements after zipping RDDs with equal numbers of partitions but uneq

[jira] [Created] (SPARK-3364) zip equal-length but unequally-partition

2014-09-02 Thread Kevin Jung (JIRA)
Kevin Jung created SPARK-3364: - Summary: zip equal-length but unequally-partition Key: SPARK-3364 URL: https://issues.apache.org/jira/browse/SPARK-3364 Project: Spark Issue Type: Bug

[jira] [Resolved] (SPARK-2484) Build should not run hive compatibility tests by default.

2014-09-02 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li resolved SPARK-2484. Resolution: Fixed Fix Version/s: 1.1.0 Target Version/s: 1.1.0 > Build should not r

[jira] [Created] (SPARK-3363) [SQL] Type Coercion should support every type to have null value

2014-09-02 Thread Adrian Wang (JIRA)
Adrian Wang created SPARK-3363: -- Summary: [SQL] Type Coercion should support every type to have null value Key: SPARK-3363 URL: https://issues.apache.org/jira/browse/SPARK-3363 Project: Spark I

[jira] [Updated] (SPARK-3362) [SQL] bug in CaseWhen resolve

2014-09-02 Thread Adrian Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-3362: --- Component/s: SQL > [SQL] bug in CaseWhen resolve > - > > K

[jira] [Commented] (SPARK-3362) [SQL] bug in CaseWhen resolve

2014-09-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119303#comment-14119303 ] Apache Spark commented on SPARK-3362: - User 'adrian-wang' has created a pull request f

[jira] [Resolved] (SPARK-3300) No need to call clear() in ensureFreeSpace and shorten build() in ColumnBuilder

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-3300. - Resolution: Fixed Fix Version/s: 1.2.0 > No need to call clear() in ensureFreeSpace

[jira] [Commented] (SPARK-3358) PySpark worker fork()ing performance regression in m3.* / PVM instances

2014-09-02 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119276#comment-14119276 ] Nicholas Chammas commented on SPARK-3358: - Nit: Isn't it [PV and not PVM|http://d

[jira] [Commented] (SPARK-3341) The dataType of Sqrt expression should be DoubleType.

2014-09-02 Thread Takuya Ueshin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119273#comment-14119273 ] Takuya Ueshin commented on SPARK-3341: -- Hi, this issue's Fix Version should be the sa

[jira] [Commented] (SPARK-3358) PySpark worker fork()ing performance regression in m3.* / PVM instances

2014-09-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119265#comment-14119265 ] Apache Spark commented on SPARK-3358: - User 'pwendell' has created a pull request for

[jira] [Commented] (SPARK-3358) PySpark worker fork()ing performance regression in m3.* / PVM instances

2014-09-02 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119253#comment-14119253 ] Patrick Wendell commented on SPARK-3358: There was actually a patch a couple month

[jira] [Resolved] (SPARK-3341) The dataType of Sqrt expression should be DoubleType.

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-3341. - Resolution: Fixed Fix Version/s: 1.2.0 Assignee: Takuya Ueshin > The dataT

[jira] [Commented] (SPARK-2627) Check for PEP 8 compliance on all Python code in the Jenkins CI cycle

2014-09-02 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119239#comment-14119239 ] Nicholas Chammas commented on SPARK-2627: - Okie doke: [SPARK-3361] > Check for PE

[jira] [Created] (SPARK-3362) [SQL] bug in CaseWhen resolve

2014-09-02 Thread Adrian Wang (JIRA)
Adrian Wang created SPARK-3362: -- Summary: [SQL] bug in CaseWhen resolve Key: SPARK-3362 URL: https://issues.apache.org/jira/browse/SPARK-3362 Project: Spark Issue Type: Bug Reporter:

[jira] [Created] (SPARK-3361) Expand PEP 8 checks to include EC2 script and Python examples

2014-09-02 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-3361: --- Summary: Expand PEP 8 checks to include EC2 script and Python examples Key: SPARK-3361 URL: https://issues.apache.org/jira/browse/SPARK-3361 Project: Spark

[jira] [Commented] (SPARK-2627) Check for PEP 8 compliance on all Python code in the Jenkins CI cycle

2014-09-02 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119227#comment-14119227 ] Reynold Xin commented on SPARK-2627: We could go either way, but maybe open a new tick

[jira] [Commented] (SPARK-2627) Check for PEP 8 compliance on all Python code in the Jenkins CI cycle

2014-09-02 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119226#comment-14119226 ] Nicholas Chammas commented on SPARK-2627: - Note: We should cover the EC2 script an

[jira] [Resolved] (SPARK-2823) GraphX jobs throw IllegalArgumentException

2014-09-02 Thread Ankur Dave (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Dave resolved SPARK-2823. --- Resolution: Fixed Fix Version/s: 1.0.3 1.1.1 1.2.0 Issue

[jira] [Created] (SPARK-3360) Add RowMatrix.multiply(Vector)

2014-09-02 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3360: - Summary: Add RowMatrix.multiply(Vector) Key: SPARK-3360 URL: https://issues.apache.org/jira/browse/SPARK-3360 Project: Spark Issue Type: Improvement Comp

[jira] [Commented] (SPARK-3343) Support for CREATE TABLE AS SELECT that specifies the format

2014-09-02 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119190#comment-14119190 ] Cheng Hao commented on SPARK-3343: -- And probably also depends on https://github.com/apach

[jira] [Commented] (SPARK-3343) Support for CREATE TABLE AS SELECT that specifies the format

2014-09-02 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119187#comment-14119187 ] Cheng Hao commented on SPARK-3343: -- Actually I was planning to do in after https://githu

[jira] [Commented] (SPARK-3335) [Spark SQL] In pyspark, cannot use broadcast variables in UDF

2014-09-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119186#comment-14119186 ] Apache Spark commented on SPARK-3335: - User 'davies' has created a pull request for th

[jira] [Closed] (SPARK-2981) PartitionStrategy: VertexID hash overflow

2014-09-02 Thread Ankur Dave (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Dave closed SPARK-2981. - Resolution: Fixed Fix Version/s: 1.0.3 1.2.0 1.1.1 > Partitio

[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-09-02 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119177#comment-14119177 ] Nicholas Chammas commented on SPARK-3333: - So I've repeated the tests with the exa

[jira] [Closed] (SPARK-3123) override the "setName" function to set EdgeRDD's name manually just as VertexRDD does.

2014-09-02 Thread Ankur Dave (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Dave closed SPARK-3123. - Resolution: Fixed Fix Version/s: 1.2.0 > override the "setName" function to set EdgeRDD's name manu

[jira] [Closed] (SPARK-1986) lib.Analytics should be in org.apache.spark.examples

2014-09-02 Thread Ankur Dave (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Dave closed SPARK-1986. - Resolution: Fixed Fix Version/s: 1.2.0 > lib.Analytics should be in org.apache.spark.examples > ---

[jira] [Commented] (SPARK-3358) PySpark worker fork()ing performance regression in m3.* / PVM instances

2014-09-02 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119145#comment-14119145 ] Josh Rosen commented on SPARK-3358: --- Yes, I meant to link the two issues. > PySpark wor

[jira] [Commented] (SPARK-3358) PySpark worker fork()ing performance regression in m3.* / PVM instances

2014-09-02 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119141#comment-14119141 ] Nicholas Chammas commented on SPARK-3358: - Josh, do you think this is related to t

[jira] [Commented] (SPARK-3146) Improve the flexibility of Spark Streaming Kafka API to offer user the ability to process message before storing into BM

2014-09-02 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119128#comment-14119128 ] Saisai Shao commented on SPARK-3146: Hi [~tdas], Sorry for late response, thanks a lo

[jira] [Updated] (SPARK-3333) Large number of partitions causes OOM

2014-09-02 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-3333: Attachment: nick-1.0.2.driver.log.zip nick-1.1.0-rc3.driver.log.zip Here are

[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-09-02 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119119#comment-14119119 ] Nicholas Chammas commented on SPARK-3333: - Just to double check my results, I re-r

[jira] [Commented] (SPARK-2219) AddJar doesn't work

2014-09-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119109#comment-14119109 ] Apache Spark commented on SPARK-2219: - User 'liancheng' has created a pull request for

[jira] [Created] (SPARK-3359) `sbt/sbt unidoc` doesn't work with Java 8

2014-09-02 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-3359: Summary: `sbt/sbt unidoc` doesn't work with Java 8 Key: SPARK-3359 URL: https://issues.apache.org/jira/browse/SPARK-3359 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-3358) PySpark worker fork()ing performance regression in m3.* / PVM instances

2014-09-02 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119093#comment-14119093 ] Josh Rosen commented on SPARK-3358: --- Update: that same microbenchmark that I posted abov

[jira] [Commented] (SPARK-2978) Provide an MR-style shuffle transformation

2014-09-02 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119091#comment-14119091 ] Sandy Ryza commented on SPARK-2978: --- Ah ok, sounds good. > Provide an MR-style shuffle

[jira] [Commented] (SPARK-3358) PySpark worker fork()ing performance regression in m3.* / PVM instances

2014-09-02 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119089#comment-14119089 ] Josh Rosen commented on SPARK-3358: --- Credit where it's due: Davies pointed out the poten

[jira] [Commented] (SPARK-2978) Provide an MR-style shuffle transformation

2014-09-02 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119084#comment-14119084 ] Reynold Xin commented on SPARK-2978: It was just asked multiple times by various users

[jira] [Commented] (SPARK-2978) Provide an MR-style shuffle transformation

2014-09-02 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119080#comment-14119080 ] Sandy Ryza commented on SPARK-2978: --- What's the thinking behind adding sortWithinPartiti

[jira] [Commented] (SPARK-2706) Enable Spark to support Hive 0.13

2014-09-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119079#comment-14119079 ] Apache Spark commented on SPARK-2706: - User 'zhzhan' has created a pull request for th

[jira] [Updated] (SPARK-2978) Provide an MR-style shuffle transformation

2014-09-02 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2978: --- Target Version/s: 1.2.0 > Provide an MR-style shuffle transformation > ---

[jira] [Updated] (SPARK-3328) ./make-distribution.sh --with-tachyon build is broken

2014-09-02 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3328: --- Assignee: prudhvi krishna > ./make-distribution.sh --with-tachyon build is broken > --

[jira] [Resolved] (SPARK-3328) ./make-distribution.sh --with-tachyon build is broken

2014-09-02 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-3328. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 2228 [https://g

[jira] [Created] (SPARK-3358) PySpark worker fork()ing performance regression in m3.* / PVM instances

2014-09-02 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-3358: - Summary: PySpark worker fork()ing performance regression in m3.* / PVM instances Key: SPARK-3358 URL: https://issues.apache.org/jira/browse/SPARK-3358 Project: Spark

[jira] [Updated] (SPARK-3322) ConnectionManager logs an error when the application ends

2014-09-02 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3322: --- Summary: ConnectionManager logs an error when the application ends (was: Log a ConnectionMana

[jira] [Resolved] (SPARK-3346) Error happened in using memcached

2014-09-02 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-3346. Resolution: Invalid Would you mind reporting this to the user list? We use JIRA only for iss

[jira] [Commented] (SPARK-3350) Strange anomaly trying to write a SchemaRDD into an Avro file

2014-09-02 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119057#comment-14119057 ] Patrick Wendell commented on SPARK-3350: Can you provide the stacktrace? > Strang

[jira] [Updated] (SPARK-3350) Strange anomaly trying to write a SchemaRDD into an Avro file

2014-09-02 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3350: --- Component/s: (was: Input/Output) SQL > Strange anomaly trying to write a

[jira] [Updated] (SPARK-3357) Internal log messages should be set at DEBUG level instead of INFO

2014-09-02 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-3357: - Description: spark-shell shows INFO by default, so we should carefully choose what to show at INF

[jira] [Created] (SPARK-3357) Internal log messages should be set at DEBUG level instead of INFO

2014-09-02 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-3357: Summary: Internal log messages should be set at DEBUG level instead of INFO Key: SPARK-3357 URL: https://issues.apache.org/jira/browse/SPARK-3357 Project: Spark

[jira] [Commented] (SPARK-2706) Enable Spark to support Hive 0.13

2014-09-02 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118979#comment-14118979 ] Zhan Zhang commented on SPARK-2706: --- send out pull request https://github.com/apache/spa

[jira] [Commented] (SPARK-3215) Add remote interface for SparkContext

2014-09-02 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118960#comment-14118960 ] Marcelo Vanzin commented on SPARK-3215: --- For those who'd prefer to see some code, he

[jira] [Resolved] (SPARK-3098) In some cases, operation zipWithIndex get a wrong results

2014-09-02 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-3098. -- Resolution: Won't Fix > In some cases, operation zipWithIndex get a wrong results > ---

[jira] [Commented] (SPARK-2883) Spark Support for ORCFile format

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118921#comment-14118921 ] Michael Armbrust commented on SPARK-2883: - Can you elaborate on what you mean here

[jira] [Created] (SPARK-3356) Document when RDD elements' ordering within partitions is nondeterministic

2014-09-02 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3356: Summary: Document when RDD elements' ordering within partitions is nondeterministic Key: SPARK-3356 URL: https://issues.apache.org/jira/browse/SPARK-3356 Project: Spa

[jira] [Commented] (SPARK-3098) In some cases, operation zipWithIndex get a wrong results

2014-09-02 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118918#comment-14118918 ] Matei Zaharia commented on SPARK-3098: -- Created SPARK-3356 to track this. > In some

[jira] [Commented] (SPARK-2978) Provide an MR-style shuffle transformation

2014-09-02 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118917#comment-14118917 ] Reynold Xin commented on SPARK-2978: I talked to [~pwendell] about this. How about thi

[jira] [Commented] (SPARK-3098) In some cases, operation zipWithIndex get a wrong results

2014-09-02 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118913#comment-14118913 ] Matei Zaharia commented on SPARK-3098: -- Yup, let's maybe document this for now. I'll

[jira] [Updated] (SPARK-2883) Spark Support for ORCFile format

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2883: Target Version/s: (was: 1.2.0) > Spark Support for ORCFile format > --

[jira] [Updated] (SPARK-2883) Spark Support for ORCFile format

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2883: Target Version/s: 1.2.0 > Spark Support for ORCFile format > ---

[jira] [Updated] (SPARK-2917) Avoid CTAS creates table in logical plan analyzing.

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2917: Target Version/s: 1.2.0 > Avoid CTAS creates table in logical plan analyzing. >

[jira] [Commented] (SPARK-3298) [SQL] registerAsTable / registerTempTable overwrites old tables

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118886#comment-14118886 ] Michael Armbrust commented on SPARK-3298: - Throwing an error here would be a break

[jira] [Resolved] (SPARK-3109) Sql query with OR condition should be handled above PhysicalOperation layer

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-3109. - Resolution: Won't Fix Hey Alex, I'm going to close this as I'm not sure it's actually possi

[jira] [Updated] (SPARK-3329) HiveQuerySuite SET tests depend on map orderings

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3329: Target Version/s: 1.2.0 > HiveQuerySuite SET tests depend on map orderings > ---

[jira] [Updated] (SPARK-3335) [Spark SQL] In pyspark, cannot use broadcast variables in UDF

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3335: Assignee: Davies Liu > [Spark SQL] In pyspark, cannot use broadcast variables in UDF >

[jira] [Updated] (SPARK-3335) [Spark SQL] In pyspark, cannot use broadcast variables in UDF

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3335: Target Version/s: 1.2.0 > [Spark SQL] In pyspark, cannot use broadcast variables in UDF > -

[jira] [Updated] (SPARK-3336) [Spark SQL] In pyspark, cannot group by field on UDF

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3336: Assignee: Davies Liu > [Spark SQL] In pyspark, cannot group by field on UDF > --

[jira] [Updated] (SPARK-3336) [Spark SQL] In pyspark, cannot group by field on UDF

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3336: Target Version/s: 1.2.0 > [Spark SQL] In pyspark, cannot group by field on UDF > ---

[jira] [Commented] (SPARK-3019) Pluggable block transfer (data plane communication) interface

2014-09-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118849#comment-14118849 ] Apache Spark commented on SPARK-3019: - User 'rxin' has created a pull request for this

[jira] [Updated] (SPARK-3343) Support for CREATE TABLE AS SELECT that specifies the format

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3343: Target Version/s: 1.2.0 Affects Version/s: (was: 1.0.2) Issue Type: New F

[jira] [Updated] (SPARK-3343) Support for CREATE TABLE AS SELECT that specifies the format

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3343: Summary: Support for CREATE TABLE AS SELECT that specifies the format (was: Unsupported lan

[jira] [Resolved] (SPARK-3354) Add LENGTH and DATALENGTH functions to Spark SQL

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-3354. - Resolution: Duplicate Closing this as a duplicate of [SPARK-2686]. Can you make your comm

[jira] [Updated] (SPARK-2686) Add Length support to Spark SQL and HQL and Strlen support to SQL

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2686: Affects Version/s: (was: 1.1.1) (was: 0.9.2)

[jira] [Updated] (SPARK-2686) Add Length support to Spark SQL and HQL and Strlen support to SQL

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2686: Fix Version/s: (was: 1.1.1) > Add Length support to Spark SQL and HQL and Strlen support

[jira] [Updated] (SPARK-2686) Add Length support to Spark SQL and HQL and Strlen support to SQL

2014-09-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2686: Target Version/s: 1.2.0 (was: 1.1.1) > Add Length support to Spark SQL and HQL and Strlen s

[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-09-02 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118826#comment-14118826 ] Josh Rosen commented on SPARK-3333: --- 360/220 is approximately 1.6. Using Splunk, I comp

[jira] [Resolved] (SPARK-3330) Successive test runs with different profiles fail SparkSubmitSuite

2014-09-02 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-3330. -- Resolution: Won't Fix It will be more suitable to change jenkins to run "mvn clean && mvn ... package"

[jira] [Commented] (SPARK-3176) Implement 'POWER', 'ABS and 'LAST' for sql

2014-09-02 Thread Xinyun Huang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118822#comment-14118822 ] Xinyun Huang commented on SPARK-3176: - For now, the POWER's function will implicitly c
