[jira] [Commented] (SPARK-2228) onStageSubmitted does not properly called so NoSuchElement will be thrown in onStageCompleted

2014-06-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045626#comment-14045626 ] Patrick Wendell commented on SPARK-2228: [~rxin] unfortunately I think it's more c

[jira] [Commented] (SPARK-2228) onStageSubmitted does not properly called so NoSuchElement will be thrown in onStageCompleted

2014-06-26 Thread Pei-Lun Lee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045621#comment-14045621 ] Pei-Lun Lee commented on SPARK-2228: We have the same problem in a long running applic

[jira] [Commented] (SPARK-2292) NullPointerException in JavaPairRDD.reduceByKey

2014-06-26 Thread Andrew Ash (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045576#comment-14045576 ] Andrew Ash commented on SPARK-2292: --- [~mkim] reported an issue with a very similar stack

[jira] [Commented] (SPARK-2228) onStageSubmitted does not properly called so NoSuchElement will be thrown in onStageCompleted

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045572#comment-14045572 ] Reynold Xin commented on SPARK-2228: What if we just create a default stage entry whe

[jira] [Commented] (SPARK-2228) onStageSubmitted does not properly called so NoSuchElement will be thrown in onStageCompleted

2014-06-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045564#comment-14045564 ] Patrick Wendell commented on SPARK-2228: I ran your reproduction locally. What I f

[jira] [Comment Edited] (SPARK-2138) The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger

2014-06-26 Thread Piotr Szul (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045558#comment-14045558 ] Piotr Szul edited comment on SPARK-2138 at 6/27/14 5:42 AM: I

[jira] [Comment Edited] (SPARK-2085) Apply user-specific regularization instead of uniform regularization in Alternating Least Squares (ALS)

2014-06-26 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045554#comment-14045554 ] Guoqiang Li edited comment on SPARK-2085 at 6/27/14 5:40 AM: -

[jira] [Commented] (SPARK-2138) The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger

2014-06-26 Thread Piotr Szul (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045558#comment-14045558 ] Piotr Szul commented on SPARK-2138: --- I ran into a similar problem when running KMeans with

[jira] [Commented] (SPARK-2085) Apply user-specific regularization instead of uniform regularization in Alternating Least Squares (ALS)

2014-06-26 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045554#comment-14045554 ] Guoqiang Li commented on SPARK-2085: [~mengxr] This PR should be merged into version

[jira] [Commented] (SPARK-2104) RangePartitioner should use user specified serializer to serialize range bounds

2014-06-26 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045524#comment-14045524 ] Saisai Shao commented on SPARK-2104: OK, got it. Thanks a lot > RangePartitioner shou

[jira] [Created] (SPARK-2304) Add example application to run tera sort

2014-06-26 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-2304: -- Summary: Add example application to run tera sort Key: SPARK-2304 URL: https://issues.apache.org/jira/browse/SPARK-2304 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-2104) RangePartitioner should use user specified serializer to serialize range bounds

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2104: --- Assignee: Saisai Shao > RangePartitioner should use user specified serializer to serialize range > b

[jira] [Resolved] (SPARK-2181) The keys for sorting the columns of Executor page in SparkUI are incorrect

2014-06-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2181. Resolution: Fixed Fix Version/s: 1.0.2 1.1.0 > The keys for sorti

[jira] [Commented] (SPARK-2104) RangePartitioner should use user specified serializer to serialize range bounds

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045520#comment-14045520 ] Reynold Xin commented on SPARK-2104: I don't really remember ... :) It's been a while

[jira] [Commented] (SPARK-2104) RangePartitioner should use user specified serializer to serialize range bounds

2014-06-26 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045519#comment-14045519 ] Saisai Shao commented on SPARK-2104: Hi Reynold, thanks a lot for your code. At first

[jira] [Commented] (SPARK-2279) JavaSparkContext should allow creation of EmptyRDD

2014-06-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045517#comment-14045517 ] Patrick Wendell commented on SPARK-2279: Ah I see - I thought you meant the EmptyR

[jira] [Resolved] (SPARK-2251) MLLib Naive Bayes Example SparkException: Can only zip RDDs with same number of elements in each partition

2014-06-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2251. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1229 [https://

[jira] [Commented] (SPARK-2126) Move MapOutputTracker behind ShuffleManager interface

2014-06-26 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045514#comment-14045514 ] Nan Zhu commented on SPARK-2126: PR: https://github.com/apache/spark/pull/1240 > Move Map

[jira] [Commented] (SPARK-2292) NullPointerException in JavaPairRDD.reduceByKey

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045509#comment-14045509 ] Reynold Xin commented on SPARK-2292: BTW based on the stack trace, it looks like a Pai

[jira] [Commented] (SPARK-2292) NullPointerException in JavaPairRDD.reduceByKey

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045508#comment-14045508 ] Reynold Xin commented on SPARK-2292: I can't reproduce this one with the following co

[jira] [Updated] (SPARK-2292) NullPointerException in JavaPairRDD.reduceByKey

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2292: --- Description: Invoking JavaPairRDD.reduceByKey results in an NPE: {code} 14/06/26 21:05:35 WARN sched

[jira] [Commented] (SPARK-2104) RangePartitioner should use user specified serializer to serialize range bounds

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045504#comment-14045504 ] Reynold Xin commented on SPARK-2104: BTW I have some old code I wrote -- you can do yo

[jira] [Commented] (SPARK-2233) make-distribution script should list the git hash in the RELEASE file

2014-06-26 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045499#comment-14045499 ] Matthew Farrellee commented on SPARK-2233: -- patch lgtm > make-distribution scrip

[jira] [Updated] (SPARK-2288) Hide ShuffleBlockManager behind ShuffleManager

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2288: --- Assignee: Raymond Liu > Hide ShuffleBlockManager behind ShuffleManager >

[jira] [Updated] (SPARK-2303) Poisson regression model for count data

2014-06-26 Thread Gang Bai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Bai updated SPARK-2303: Description: Modeling count data is of great importance in solving real-world statistical problems. Currentl

[jira] [Created] (SPARK-2303) Poisson regression model for count data

2014-06-26 Thread Gang Bai (JIRA)
Gang Bai created SPARK-2303: --- Summary: Poisson regression model for count data Key: SPARK-2303 URL: https://issues.apache.org/jira/browse/SPARK-2303 Project: Spark Issue Type: Bug Compone

[jira] [Commented] (SPARK-2251) MLLib Naive Bayes Example SparkException: Can only zip RDDs with same number of elements in each partition

2014-06-26 Thread Jun Xie (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045462#comment-14045462 ] Jun Xie commented on SPARK-2251: I use the following command: git log The first entry is

[jira] [Created] (SPARK-2302) master should discard exceeded completedDrivers

2014-06-26 Thread Lianhui Wang (JIRA)
Lianhui Wang created SPARK-2302: --- Summary: master should discard exceeded completedDrivers Key: SPARK-2302 URL: https://issues.apache.org/jira/browse/SPARK-2302 Project: Spark Issue Type: Impr

[jira] [Created] (SPARK-2301) add ability to submit multiple jars for Driver

2014-06-26 Thread Lianhui Wang (JIRA)
Lianhui Wang created SPARK-2301: --- Summary: add ability to submit multiple jars for Driver Key: SPARK-2301 URL: https://issues.apache.org/jira/browse/SPARK-2301 Project: Spark Issue Type: Improv

[jira] [Commented] (SPARK-2044) Pluggable interface for shuffles

2014-06-26 Thread Raymond Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045447#comment-14045447 ] Raymond Liu commented on SPARK-2044: Hi [~matei], also the pull request for above jira

[jira] [Commented] (SPARK-2288) Hide ShuffleBlockManager behind ShuffleManager

2014-06-26 Thread Raymond Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045446#comment-14045446 ] Raymond Liu commented on SPARK-2288: Hi pull request at https://github.com/apache/sp

[jira] [Updated] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2244: --- Fix Version/s: (was: 1.0.1) > pyspark - RDD action hangs (after previously succeeding) >

[jira] [Updated] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2244: --- Fix Version/s: 1.1.0 1.0.1 > pyspark - RDD action hangs (after previously succeedi

[jira] [Resolved] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-2244. Resolution: Duplicate Assignee: Andrew Or > pyspark - RDD action hangs (after previously succ

[jira] [Commented] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)

2014-06-26 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045437#comment-14045437 ] Matthew Farrellee commented on SPARK-2244: -- this is a duplicate of and is resolve

[jira] [Commented] (SPARK-2294) TaskSchedulerImpl and TaskSetManager do not properly prioritize which tasks get assigned to an executor

2014-06-26 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045433#comment-14045433 ] Mridul Muralidharan commented on SPARK-2294: I agree; We should bump no locali

[jira] [Commented] (SPARK-2104) RangePartitioner should use user specified serializer to serialize range bounds

2014-06-26 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045430#comment-14045430 ] Saisai Shao commented on SPARK-2104: Ok, got it. I will try to fix this issue :) > Ra

[jira] [Resolved] (SPARK-697) RDD should be covariant in T

2014-06-26 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-697. -- Resolution: Duplicate > RDD should be covariant in T > > >

[jira] [Commented] (SPARK-2104) RangePartitioner should use user specified serializer to serialize range bounds

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045419#comment-14045419 ] Reynold Xin commented on SPARK-2104: That sounds good! > RangePartitioner should use

[jira] [Commented] (SPARK-2104) RangePartitioner should use user specified serializer to serialize range bounds

2014-06-26 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045413#comment-14045413 ] Saisai Shao commented on SPARK-2104: Hi Reynold, is that annoying message you mentione

[jira] [Updated] (SPARK-2300) PySpark shell hides stderr output

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2300: --- Assignee: Andrew Or > PySpark shell hides stderr output > - > >

[jira] [Resolved] (SPARK-2300) PySpark shell hides stderr output

2014-06-26 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-2300. -- Resolution: Fixed > PySpark shell hides stderr output > - > >

[jira] [Created] (SPARK-2300) PySpark shell hides stderr output

2014-06-26 Thread Andrew Or (JIRA)
Andrew Or created SPARK-2300: Summary: PySpark shell hides stderr output Key: SPARK-2300 URL: https://issues.apache.org/jira/browse/SPARK-2300 Project: Spark Issue Type: Bug Components:

[jira] [Commented] (SPARK-2300) PySpark shell hides stderr output

2014-06-26 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045342#comment-14045342 ] Andrew Or commented on SPARK-2300: -- https://github.com/apache/spark/pull/1178 > PySpark

[jira] [Resolved] (SPARK-2276) user should be able to provide schema for table creation

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2276. - Resolution: Duplicate Thanks for reporting this. This is definitely on our roadmap for t

[jira] [Updated] (SPARK-2212) Hash Outer Joins

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2212: Target Version/s: 1.1.0 > Hash Outer Joins > > > Key: SPAR

[jira] [Updated] (SPARK-2063) Creating a SchemaRDD via sql() does not correctly resolve nested types

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2063: Target Version/s: 1.1.0 > Creating a SchemaRDD via sql() does not correctly resolve nested

[jira] [Updated] (SPARK-2066) Better error message for non-aggregated attributes with aggregates

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2066: Summary: Better error message for non-aggregated attributes with aggregates (was: Better E

[jira] [Updated] (SPARK-2059) Unresolved Attributes should cause a failure before execution time

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2059: Target Version/s: 1.1.0 > Unresolved Attributes should cause a failure before execution tim

[jira] [Resolved] (SPARK-1982) saveToParquetFile doesn't support ByteType

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-1982. - Resolution: Fixed > saveToParquetFile doesn't support ByteType >

[jira] [Resolved] (SPARK-1513) Specialized ColumnType for Timestamp

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-1513. - Resolution: Duplicate > Specialized ColumnType for Timestamp > --

[jira] [Resolved] (SPARK-2112) ParquetTypesConverter should not create its own conf

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2112. - Resolution: Fixed Assignee: Andre Schumacher > ParquetTypesConverter should not cre

[jira] [Updated] (SPARK-2119) Reading Parquet InputSplits dominates query execution time when reading off S3

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2119: Target Version/s: 1.1.0 > Reading Parquet InputSplits dominates query execution time when r

[jira] [Resolved] (SPARK-2286) Report exception/errors for failed tasks that are not ExceptionFailure

2014-06-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2286. Resolution: Fixed Fix Version/s: 1.1.0 1.0.1 https://github.com/a

[jira] [Updated] (SPARK-2227) Support "dfs" command

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2227: Target Version/s: 1.1.0 > Support "dfs" command > - > >

[jira] [Updated] (SPARK-2220) Fix remaining Hive Commands

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2220: Target Version/s: 1.1.0 > Fix remaining Hive Commands > --- > >

[jira] [Updated] (SPARK-2186) Spark SQL DSL support for simple aggregations such as SUM and AVG

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2186: Target Version/s: 1.1.0 > Spark SQL DSL support for simple aggregations such as SUM and AVG

[jira] [Commented] (SPARK-2298) Show stage attempt in UI

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045173#comment-14045173 ] Reynold Xin commented on SPARK-2298: Probably do the two at the same time > Show stag

[jira] [Created] (SPARK-2299) Consolidate various stageIdTo* hash maps

2014-06-26 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-2299: -- Summary: Consolidate various stageIdTo* hash maps Key: SPARK-2299 URL: https://issues.apache.org/jira/browse/SPARK-2299 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-2190) Specialized ColumnType for Timestamp

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2190: Target Version/s: 1.1.0 > Specialized ColumnType for Timestamp > --

[jira] [Updated] (SPARK-2298) Show stage attempt in UI

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2298: --- Attachment: Screen Shot 2014-06-25 at 4.54.46 PM.png two attempts of the same stage > Show stage att

[jira] [Created] (SPARK-2298) Show stage attempt in UI

2014-06-26 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-2298: -- Summary: Show stage attempt in UI Key: SPARK-2298 URL: https://issues.apache.org/jira/browse/SPARK-2298 Project: Spark Issue Type: Improvement Componen

[jira] [Resolved] (SPARK-1800) Add broadcast hash join operator

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-1800. - Resolution: Fixed https://github.com/apache/spark/pull/1163 > Add broadcast hash join op

[jira] [Updated] (SPARK-1800) Add broadcast hash join operator

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-1800: Assignee: Zongheng Yang (was: Michael Armbrust) > Add broadcast hash join operator > -

[jira] [Updated] (SPARK-2147) Master UI forgets about Executors when application exits cleanly

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2147: --- Fix Version/s: 1.1.0 > Master UI forgets about Executors when application exits cleanly > ---

[jira] [Updated] (SPARK-2147) Master UI forgets about Executors when application exits cleanly

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2147: --- Fix Version/s: 1.0.1 > Master UI forgets about Executors when application exits cleanly > ---

[jira] [Updated] (SPARK-2297) Make task attempt and speculation more explicit in UI

2014-06-26 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2297: --- Attachment: Screen Shot 2014-06-26 at 1.43.52 PM.png screenshot of the fix > Make task attempt and s

[jira] [Created] (SPARK-2297) Make task attempt and speculation more explicit in UI

2014-06-26 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-2297: -- Summary: Make task attempt and speculation more explicit in UI Key: SPARK-2297 URL: https://issues.apache.org/jira/browse/SPARK-2297 Project: Spark Issue Type: I

[jira] [Updated] (SPARK-2066) Better Error message for unresolved attributes

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2066: Fix Version/s: (was: 1.0.1) (was: 1.1.0) > Better Error message

[jira] [Updated] (SPARK-2066) Better Error message for unresolved attributes

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2066: Summary: Better Error message for unresolved attributes (was: org.apache.spark.sql.catalys

[jira] [Updated] (SPARK-2066) Better Error message for unresolved attributes

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2066: Target Version/s: 1.1.0 > Better Error message for unresolved attributes >

[jira] [Resolved] (SPARK-2283) PruningSuite fails if run right after HiveCompatibilitySuite

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2283. - Resolution: Fixed Fix Version/s: 1.1.0 1.0.1 Assignee:

[jira] [Resolved] (SPARK-2295) Make JavaBeans nullability stricter.

2014-06-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2295. - Resolution: Fixed Fix Version/s: 1.1.0 1.0.1 Assignee:

[jira] [Commented] (SPARK-2251) MLLib Naive Bayes Example SparkException: Can only zip RDDs with same number of elements in each partition

2014-06-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045120#comment-14045120 ] Patrick Wendell commented on SPARK-2251: This is fixed in 1.0.1 via: https://githu

[jira] [Updated] (SPARK-2251) MLLib Naive Bayes Example SparkException: Can only zip RDDs with same number of elements in each partition

2014-06-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2251: --- Fix Version/s: 1.0.1 > MLLib Naive Bayes Example SparkException: Can only zip RDDs with same

[jira] [Commented] (SPARK-2292) NullPointerException in JavaPairRDD.reduceByKey

2014-06-26 Thread Bharath Ravi Kumar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045006#comment-14045006 ] Bharath Ravi Kumar commented on SPARK-2292: --- Raised the priority to critical sin

[jira] [Updated] (SPARK-2292) NullPointerException in JavaPairRDD.reduceByKey

2014-06-26 Thread Bharath Ravi Kumar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Ravi Kumar updated SPARK-2292: -- Priority: Blocker (was: Major) > NullPointerException in JavaPairRDD.reduceByKey > ---

[jira] [Updated] (SPARK-2292) NullPointerException in JavaPairRDD.reduceByKey

2014-06-26 Thread Bharath Ravi Kumar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Ravi Kumar updated SPARK-2292: -- Priority: Critical (was: Blocker) > NullPointerException in JavaPairRDD.reduceByKey >

[jira] [Created] (SPARK-2296) Refactor util.JsonProtocol for evolvability

2014-06-26 Thread Andrew Or (JIRA)
Andrew Or created SPARK-2296: Summary: Refactor util.JsonProtocol for evolvability Key: SPARK-2296 URL: https://issues.apache.org/jira/browse/SPARK-2296 Project: Spark Issue Type: Bug C

[jira] [Commented] (SPARK-1740) Pyspark cancellation kills unrelated pyspark workers

2014-06-26 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044981#comment-14044981 ] Josh Rosen commented on SPARK-1740: --- The "Python daemon -> multiple workers" architectur

[jira] [Commented] (SPARK-2295) Make JavaBeans nullability stricter.

2014-06-26 Thread Takuya Ueshin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044950#comment-14044950 ] Takuya Ueshin commented on SPARK-2295: -- PRed: https://github.com/apache/spark/pull/12

[jira] [Created] (SPARK-2295) Make JavaBeans nullability stricter.

2014-06-26 Thread Takuya Ueshin (JIRA)
Takuya Ueshin created SPARK-2295: Summary: Make JavaBeans nullability stricter. Key: SPARK-2295 URL: https://issues.apache.org/jira/browse/SPARK-2295 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-2294) TaskSchedulerImpl and TaskSetManager do not properly prioritize which tasks get assigned to an executor

2014-06-26 Thread Kay Ousterhout (JIRA)
Kay Ousterhout created SPARK-2294: - Summary: TaskSchedulerImpl and TaskSetManager do not properly prioritize which tasks get assigned to an executor Key: SPARK-2294 URL: https://issues.apache.org/jira/browse/SPARK

[jira] [Created] (SPARK-2293) Replace RDD.zip usage by map with predict inside.

2014-06-26 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-2293: Summary: Replace RDD.zip usage by map with predict inside. Key: SPARK-2293 URL: https://issues.apache.org/jira/browse/SPARK-2293 Project: Spark Issue Type: I

[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib

2014-06-26 Thread Lisa Hua (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044836#comment-14044836 ] Lisa Hua commented on SPARK-1406: - Hi, any progress on this issue now? > PMML model eval

[jira] [Commented] (SPARK-2280) Java & Scala reference docs should describe function reference behavior.

2014-06-26 Thread Hans Uhlig (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044780#comment-14044780 ] Hans Uhlig commented on SPARK-2280: --- This looks to be partially done for the Python docs

[jira] [Created] (SPARK-2292) NullPointerException in JavaPairRDD.reduceByKey

2014-06-26 Thread Bharath Ravi Kumar (JIRA)
Bharath Ravi Kumar created SPARK-2292: - Summary: NullPointerException in JavaPairRDD.reduceByKey Key: SPARK-2292 URL: https://issues.apache.org/jira/browse/SPARK-2292 Project: Spark Issue

[jira] [Commented] (SPARK-2279) JavaSparkContext should allow creation of EmptyRDD

2014-06-26 Thread Hans Uhlig (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044734#comment-14044734 ] Hans Uhlig commented on SPARK-2279: --- You can, you can also read in an empty file. Howeve

[jira] [Updated] (SPARK-2278) groupBy & groupByKey should support custom comparator

2014-06-26 Thread Hans Uhlig (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hans Uhlig updated SPARK-2278: -- Description: To maintain parity with MapReduce you should be able to specify a custom key equality func

[jira] [Updated] (SPARK-2278) groupBy & groupByKey should support custom comparator

2014-06-26 Thread Hans Uhlig (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hans Uhlig updated SPARK-2278: -- Summary: groupBy & groupByKey should support custom comparator (was: groupBy should support custom com

[jira] [Commented] (SPARK-2291) Update EC2 scripts to use instance storage on m3 instance types

2014-06-26 Thread Alessandro Andrioni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044667#comment-14044667 ] Alessandro Andrioni commented on SPARK-2291: I've just sent a pull request on

[jira] [Created] (SPARK-2291) Update EC2 scripts to use instance storage on m3 instance types

2014-06-26 Thread Alessandro Andrioni (JIRA)
Alessandro Andrioni created SPARK-2291: -- Summary: Update EC2 scripts to use instance storage on m3 instance types Key: SPARK-2291 URL: https://issues.apache.org/jira/browse/SPARK-2291 Project: Sp

[jira] [Created] (SPARK-2290) Worker should directly use its own sparkHome instead of appDesc.sparkHome when LaunchExecutor

2014-06-26 Thread YanTang Zhai (JIRA)
YanTang Zhai created SPARK-2290: --- Summary: Worker should directly use its own sparkHome instead of appDesc.sparkHome when LaunchExecutor Key: SPARK-2290 URL: https://issues.apache.org/jira/browse/SPARK-2290

[jira] [Created] (SPARK-2289) Remove use of spark.worker.instances

2014-06-26 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-2289: Summary: Remove use of spark.worker.instances Key: SPARK-2289 URL: https://issues.apache.org/jira/browse/SPARK-2289 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-2289) Remove use of spark.worker.instances

2014-06-26 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044643#comment-14044643 ] Thomas Graves commented on SPARK-2289: -- https://github.com/apache/spark/pull/1214 >

[jira] [Resolved] (SPARK-2289) Remove use of spark.worker.instances

2014-06-26 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-2289. -- Resolution: Fixed > Remove use of spark.worker.instances >

[jira] [Commented] (SPARK-2159) Spark shell exit() does not stop SparkContext

2014-06-26 Thread Adamos Loizou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044589#comment-14044589 ] Adamos Loizou commented on SPARK-2159: -- Fix patch in https://github.com/apache/spark/

[jira] [Commented] (SPARK-1861) ArrayIndexOutOfBoundsException when reading bzip2 files

2014-06-26 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044536#comment-14044536 ] sam commented on SPARK-1861: [~mengxr] Any idea when that will be? > ArrayIndexOutOfBoundsExc

[jira] [Commented] (SPARK-2251) MLLib Naive Bayes Example SparkException: Can only zip RDDs with same number of elements in each partition

2014-06-26 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044490#comment-14044490 ] Xiangrui Meng commented on SPARK-2251: -- Found a bug introduced by me in random sample

[jira] [Commented] (SPARK-2044) Pluggable interface for shuffles

2014-06-26 Thread Raymond Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1402#comment-1402 ] Raymond Liu commented on SPARK-2044: Hi [~matei] I am wondering maybe we should hide
