[jira] [Commented] (SPARK-2982) Glitch of spark streaming

2014-08-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093829#comment-14093829 ] Sean Owen commented on SPARK-2982: -- This does not illustrate anything about Spark. I beli

[jira] [Updated] (SPARK-2967) Several SQL unit test failed when sort-based shuffle is enabled

2014-08-12 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2967: Target Version/s: 1.1.0 > Several SQL unit test failed when sort-based shuffle is enabled >

[jira] [Updated] (SPARK-2967) Several SQL unit test failed when sort-based shuffle is enabled

2014-08-12 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2967: Priority: Critical (was: Major) > Several SQL unit test failed when sort-based shuffle is

[jira] [Commented] (SPARK-2967) Several SQL unit test failed when sort-based shuffle is enabled

2014-08-12 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093845#comment-14093845 ] Matei Zaharia commented on SPARK-2967: -- Good catch, this is a difference in behavior

[jira] [Commented] (SPARK-2967) Several SQL unit test failed when sort-based shuffle is enabled

2014-08-12 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093857#comment-14093857 ] Michael Armbrust commented on SPARK-2967: - Yes, its possible and seems to fix the

[jira] [Assigned] (SPARK-2967) Several SQL unit test failed when sort-based shuffle is enabled

2014-08-12 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust reassigned SPARK-2967: --- Assignee: Michael Armbrust > Several SQL unit test failed when sort-based shuffle is

[jira] [Updated] (SPARK-2967) Several SQL unit test failed when sort-based shuffle is enabled

2014-08-12 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2967: Component/s: SQL > Several SQL unit test failed when sort-based shuffle is enabled > --

[jira] [Updated] (SPARK-2969) Make ScalaReflection be able to handle MapType.containsNull and MapType.valueContainsNull.

2014-08-12 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2969: Target Version/s: 1.1.0 > Make ScalaReflection be able to handle MapType.containsNull and

[jira] [Updated] (SPARK-2970) spark-sql script ends with IOException when EventLogging is enabled

2014-08-12 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2970: Target Version/s: 1.1.0 > spark-sql script ends with IOException when EventLogging is enabl

[jira] [Updated] (SPARK-2969) Make ScalaReflection be able to handle MapType.containsNull and MapType.valueContainsNull.

2014-08-12 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2969: Assignee: Takuya Ueshin > Make ScalaReflection be able to handle MapType.containsNull and

[jira] [Updated] (SPARK-2925) bin/spark-sql shell throw unrecognized option error when set --driver-java-options

2014-08-12 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2925: Target Version/s: 1.1.0 Fix Version/s: (was: 1.1.0) > bin/spark-sql shell throw

[jira] [Commented] (SPARK-2978) Provide an MR-style shuffle transformation

2014-08-12 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093866#comment-14093866 ] Saisai Shao commented on SPARK-2978: Hi Sandy, A simple question: do you mean to add

[jira] [Commented] (SPARK-2967) Several SQL unit test failed when sort-based shuffle is enabled

2014-08-12 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093874#comment-14093874 ] Saisai Shao commented on SPARK-2967: Hi Matei and Michael, thanks a lot for looking in

[jira] [Commented] (SPARK-2062) VertexRDD.apply does not use the mergeFunc

2014-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093897#comment-14093897 ] Apache Spark commented on SPARK-2062: - User 'larryxiao' has created a pull request for

[jira] [Created] (SPARK-2986) setting properties seems not effective

2014-08-12 Thread guowei (JIRA)
guowei created SPARK-2986: - Summary: setting properties seems not effective Key: SPARK-2986 URL: https://issues.apache.org/jira/browse/SPARK-2986 Project: Spark Issue Type: Bug Components:

[jira] [Commented] (SPARK-2986) setting properties seems not effective

2014-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093920#comment-14093920 ] Apache Spark commented on SPARK-2986: - User 'guowei2' has created a pull request for t

[jira] [Assigned] (SPARK-2988) Port repl to scala 2.11.

2014-08-12 Thread Prashant Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Sharma reassigned SPARK-2988: -- Assignee: Prashant Sharma > Port repl to scala 2.11. > > >

[jira] [Assigned] (SPARK-2987) Adjust build system to support building with scala 2.11 and fix tests.

2014-08-12 Thread Prashant Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Sharma reassigned SPARK-2987: -- Assignee: Prashant Sharma > Adjust build system to support building with scala 2.11 and

[jira] [Created] (SPARK-2987) Adjust build system to support building with scala 2.11 and fix tests.

2014-08-12 Thread Prashant Sharma (JIRA)
Prashant Sharma created SPARK-2987: -- Summary: Adjust build system to support building with scala 2.11 and fix tests. Key: SPARK-2987 URL: https://issues.apache.org/jira/browse/SPARK-2987 Project: Spa

[jira] [Created] (SPARK-2988) Port repl to scala 2.11.

2014-08-12 Thread Prashant Sharma (JIRA)
Prashant Sharma created SPARK-2988: -- Summary: Port repl to scala 2.11. Key: SPARK-2988 URL: https://issues.apache.org/jira/browse/SPARK-2988 Project: Spark Issue Type: Sub-task R

[jira] [Commented] (SPARK-2987) Adjust build system to support building with scala 2.11 and fix tests.

2014-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093958#comment-14093958 ] Apache Spark commented on SPARK-2987: - User 'ScrapCodes' has created a pull request fo

[jira] [Created] (SPARK-2989) Error sending message to BlockManagerMaster

2014-08-12 Thread pengyanhong (JIRA)
pengyanhong created SPARK-2989: -- Summary: Error sending message to BlockManagerMaster Key: SPARK-2989 URL: https://issues.apache.org/jira/browse/SPARK-2989 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-2989) Error sending message to BlockManagerMaster

2014-08-12 Thread pengyanhong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengyanhong updated SPARK-2989: --- Description: run a simple hive sql Spark App via yarn-cluster, got 3 segments log content via yarn

[jira] [Updated] (SPARK-2989) Error sending message to BlockManagerMaster

2014-08-12 Thread pengyanhong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengyanhong updated SPARK-2989: --- Affects Version/s: 1.0.2 > Error sending message to BlockManagerMaster >

[jira] [Commented] (SPARK-1766) Move reduceByKey definitions next to each other in PairRDDFunctions

2014-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094184#comment-14094184 ] Apache Spark commented on SPARK-1766: - User 'copester' has created a pull request for

[jira] [Commented] (SPARK-2140) yarn stable client doesn't properly handle MEMORY_OVERHEAD for AM

2014-08-12 Thread Chris Cope (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094191#comment-14094191 ] Chris Cope commented on SPARK-2140: --- From Client and ClientBase, it looks like the comm

[jira] [Commented] (SPARK-2140) yarn stable client doesn't properly handle MEMORY_OVERHEAD for AM

2014-08-12 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094211#comment-14094211 ] Thomas Graves commented on SPARK-2140: -- The commented out code was sufficient at the

[jira] [Commented] (SPARK-2140) yarn stable client doesn't properly handle MEMORY_OVERHEAD for AM

2014-08-12 Thread Chris Cope (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094244#comment-14094244 ] Chris Cope commented on SPARK-2140: --- But what about Client.scala L87: {code}memoryResour

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094333#comment-14094333 ] Vlad Frolov commented on SPARK-1065: I have finished my experiment of using HDFS as a

[jira] [Resolved] (SPARK-2700) Hidden files (such as .impala_insert_staging) should be filtered out by sqlContext.parquetFile

2014-08-12 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2700. - Resolution: Fixed > Hidden files (such as .impala_insert_staging) should be filtered out

[jira] [Updated] (SPARK-2468) Netty-based block server / client module

2014-08-12 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2468: --- Summary: Netty-based block server / client module (was: Netty-based shuffle network module) > Netty

[jira] [Created] (SPARK-2990) Recycle ByteBufs by using PooledByteBufAllocator

2014-08-12 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-2990: -- Summary: Recycle ByteBufs by using PooledByteBufAllocator Key: SPARK-2990 URL: https://issues.apache.org/jira/browse/SPARK-2990 Project: Spark Issue Type: Sub-ta

[jira] [Commented] (SPARK-2468) Netty-based block server / client module

2014-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094449#comment-14094449 ] Apache Spark commented on SPARK-2468: - User 'rxin' has created a pull request for this

[jira] [Commented] (SPARK-2830) MLlib v1.1 documentation

2014-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094509#comment-14094509 ] Apache Spark commented on SPARK-2830: - User 'atalwalkar' has created a pull request fo

[jira] [Created] (SPARK-2991) RDD transforms for scan and scanLeft

2014-08-12 Thread Erik Erlandson (JIRA)
Erik Erlandson created SPARK-2991: - Summary: RDD transforms for scan and scanLeft Key: SPARK-2991 URL: https://issues.apache.org/jira/browse/SPARK-2991 Project: Spark Issue Type: New Feature

[jira] [Commented] (SPARK-2991) RDD transforms for scan and scanLeft

2014-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094548#comment-14094548 ] Apache Spark commented on SPARK-2991: - User 'erikerlandson' has created a pull request

[jira] [Updated] (SPARK-2991) RDD transforms for scan and scanLeft

2014-08-12 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2991: --- Assignee: Erik Erlandson > RDD transforms for scan and scanLeft > --

[jira] [Created] (SPARK-2992) The transforms formerly known as non-lazy

2014-08-12 Thread Erik Erlandson (JIRA)
Erik Erlandson created SPARK-2992: - Summary: The transforms formerly known as non-lazy Key: SPARK-2992 URL: https://issues.apache.org/jira/browse/SPARK-2992 Project: Spark Issue Type: Umbrell

[jira] [Updated] (SPARK-2991) RDD transforms for scan and scanLeft

2014-08-12 Thread Erik Erlandson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Erlandson updated SPARK-2991: -- Issue Type: Sub-task (was: New Feature) Parent: SPARK-2992 > RDD transforms for scan a

[jira] [Updated] (SPARK-1021) sortByKey() launches a cluster job when it shouldn't

2014-08-12 Thread Erik Erlandson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Erlandson updated SPARK-1021: -- Issue Type: Sub-task (was: Bug) Parent: SPARK-2992 > sortByKey() launches a cluster jo

[jira] [Updated] (SPARK-2315) drop, dropRight and dropWhile which take RDD input and return RDD

2014-08-12 Thread Erik Erlandson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Erlandson updated SPARK-2315: -- Issue Type: Sub-task (was: New Feature) Parent: SPARK-2992 > drop, dropRight and dropW

[jira] [Commented] (SPARK-1486) Support multi-model training in MLlib

2014-08-12 Thread Kyle Ellrott (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094673#comment-14094673 ] Kyle Ellrott commented on SPARK-1486: - It would be helpful to get some feedback if the

[jira] [Commented] (SPARK-2372) Grouped Optimization/Learning

2014-08-12 Thread Kyle Ellrott (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094668#comment-14094668 ] Kyle Ellrott commented on SPARK-2372: - GroupedBinaryClassificationMetrics has been add

[jira] [Updated] (SPARK-2992) The transforms formerly known as non-lazy

2014-08-12 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2992: --- Assignee: Erik Erlandson > The transforms formerly known as non-lazy > --

[jira] [Commented] (SPARK-2938) Support SASL authentication in Netty network module

2014-08-12 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094741#comment-14094741 ] Reynold Xin commented on SPARK-2938: cc [~tgraves] > Support SASL authentication in N

[jira] [Updated] (SPARK-2959) Use a single FileClient and Netty client thread pool

2014-08-12 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2959: --- Assignee: Reynold Xin > Use a single FileClient and Netty client thread pool > --

[jira] [Updated] (SPARK-2941) Add config option to support NIO vs OIO in Netty network module

2014-08-12 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2941: --- Assignee: Reynold Xin > Add config option to support NIO vs OIO in Netty network module > ---

[jira] [Updated] (SPARK-2940) Support fetching multiple blocks in a single request in Netty network module

2014-08-12 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2940: --- Assignee: Reynold Xin > Support fetching multiple blocks in a single request in Netty network module

[jira] [Updated] (SPARK-2942) Report error messages back from server to client

2014-08-12 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2942: --- Assignee: Reynold Xin > Report error messages back from server to client > --

[jira] [Updated] (SPARK-2943) Create config options for Netty sendBufferSize and receiveBufferSize

2014-08-12 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2943: --- Assignee: Reynold Xin > Create config options for Netty sendBufferSize and receiveBufferSize > --

[jira] [Resolved] (SPARK-2952) Enable logging actor messages at DEBUG level

2014-08-12 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-2952. Resolution: Fixed Fix Version/s: 1.1.0 > Enable logging actor messages at DEBUG level >

[jira] [Commented] (SPARK-2706) Enable Spark to support Hive 0.13

2014-08-12 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094810#comment-14094810 ] Zhan Zhang commented on SPARK-2706: --- It may caused by hive side change, in the TestHive

[jira] [Resolved] (SPARK-2833) performance tests for linear regression

2014-08-12 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz resolved SPARK-2833. Resolution: Fixed > performance tests for linear regression > -

[jira] [Updated] (SPARK-2835) performance tests for statistical functions

2014-08-12 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-2835: - Assignee: Burak Yavuz (was: Doris Xin) > performance tests for statistical functions > -

[jira] [Resolved] (SPARK-2837) performance tests for ALS

2014-08-12 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz resolved SPARK-2837. Resolution: Done > performance tests for ALS > - > > Key: S

[jira] [Closed] (SPARK-2836) performance tests for k-means

2014-08-12 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz closed SPARK-2836. -- Resolution: Fixed > performance tests for k-means > - > > K

[jira] [Closed] (SPARK-2835) performance tests for statistical functions

2014-08-12 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng closed SPARK-2835. Resolution: Fixed > performance tests for statistical functions > -

[jira] [Resolved] (SPARK-2834) performance tests for linear algebra functions

2014-08-12 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz resolved SPARK-2834. Resolution: Fixed > performance tests for linear algebra functions > --

[jira] [Created] (SPARK-2993) colStats in Statistics as wrapper around MultivariateStatisticalSummary in Scala and Python

2014-08-12 Thread Doris Xin (JIRA)
Doris Xin created SPARK-2993: Summary: colStats in Statistics as wrapper around MultivariateStatisticalSummary in Scala and Python Key: SPARK-2993 URL: https://issues.apache.org/jira/browse/SPARK-2993 Pro

[jira] [Resolved] (SPARK-2832) performance tests for decision tree

2014-08-12 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-2832. -- Resolution: Fixed > performance tests for decision tree > --- >

[jira] [Resolved] (SPARK-2829) Implement MLlib performance tests in spark-perf

2014-08-12 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz resolved SPARK-2829. Resolution: Fixed > Implement MLlib performance tests in spark-perf > -

[jira] [Resolved] (SPARK-2831) performance tests for linear classification methods

2014-08-12 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz resolved SPARK-2831. Resolution: Fixed > performance tests for linear classification methods > -

[jira] [Commented] (SPARK-2993) colStats in Statistics as wrapper around MultivariateStatisticalSummary in Scala and Python

2014-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094861#comment-14094861 ] Apache Spark commented on SPARK-2993: - User 'dorx' has created a pull request for this

[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

2014-08-12 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094872#comment-14094872 ] Sandy Ryza commented on SPARK-2089: --- My opinion is that we should have a narrower API fo

[jira] [Updated] (SPARK-2977) Fix handling of short shuffle manager names in ShuffleBlockManager

2014-08-12 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-2977: -- Assignee: (was: Josh Rosen) > Fix handling of short shuffle manager names in ShuffleBlockManager >

[jira] [Assigned] (SPARK-2977) Fix handling of short shuffle manager names in ShuffleBlockManager

2014-08-12 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-2977: - Assignee: Josh Rosen > Fix handling of short shuffle manager names in ShuffleBlockManager > -

[jira] [Updated] (SPARK-2994) Support for Hive UDFs that return arrays of structs

2014-08-12 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2994: Priority: Critical (was: Major) > Support for Hive UDFs that return arrays of structs > --

[jira] [Created] (SPARK-2994) Support for Hive UDFs that return arrays of structs

2014-08-12 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-2994: --- Summary: Support for Hive UDFs that return arrays of structs Key: SPARK-2994 URL: https://issues.apache.org/jira/browse/SPARK-2994 Project: Spark Issue

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094947#comment-14094947 ] Davies Liu commented on SPARK-1065: --- The broadcast was not used correctly in the above c

[jira] [Comment Edited] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094947#comment-14094947 ] Davies Liu edited comment on SPARK-1065 at 8/13/14 12:33 AM: -

[jira] [Comment Edited] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094947#comment-14094947 ] Davies Liu edited comment on SPARK-1065 at 8/13/14 12:32 AM: -

[jira] [Comment Edited] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094947#comment-14094947 ] Davies Liu edited comment on SPARK-1065 at 8/13/14 12:34 AM: -

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094952#comment-14094952 ] Apache Spark commented on SPARK-1065: - User 'davies' has created a pull request for th

[jira] [Comment Edited] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094947#comment-14094947 ] Davies Liu edited comment on SPARK-1065 at 8/13/14 12:33 AM: -

[jira] [Comment Edited] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094947#comment-14094947 ] Davies Liu edited comment on SPARK-1065 at 8/13/14 12:34 AM: -

[jira] [Comment Edited] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094947#comment-14094947 ] Davies Liu edited comment on SPARK-1065 at 8/13/14 12:34 AM: -

[jira] [Created] (SPARK-2995) Allow to set storage level for intermediate RDDs in ALS

2014-08-12 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-2995: Summary: Allow to set storage level for intermediate RDDs in ALS Key: SPARK-2995 URL: https://issues.apache.org/jira/browse/SPARK-2995 Project: Spark Issue T

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094965#comment-14094965 ] Vlad Frolov commented on SPARK-1065: [~davies] Will your PR take into account this fix

[jira] [Commented] (SPARK-2995) Allow to set storage level for intermediate RDDs in ALS

2014-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094973#comment-14094973 ] Apache Spark commented on SPARK-2995: - User 'mengxr' has created a pull request for th

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094976#comment-14094976 ] Vlad Frolov commented on SPARK-1065: [~davies] I have not noticed that there was that

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094981#comment-14094981 ] Davies Liu commented on SPARK-1065: --- [~frol], I think broadcast the RDD object is alread

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094986#comment-14094986 ] Davies Liu commented on SPARK-1065: --- After this patch, the above test can run successful

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094987#comment-14094987 ] Vlad Frolov commented on SPARK-1065: [~davies] I understand that if you use broadcast

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094989#comment-14094989 ] Vlad Frolov commented on SPARK-1065: [~devies] I use YARN setup so I will see how it g

[jira] [Comment Edited] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094989#comment-14094989 ] Vlad Frolov edited comment on SPARK-1065 at 8/13/14 1:05 AM: -

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095021#comment-14095021 ] Vlad Frolov commented on SPARK-1065: [~davies] I have compiled and run your broadcast

[jira] [Created] (SPARK-2996) Standalone and Yarn have different settings for adding the user classpath first

2014-08-12 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-2996: - Summary: Standalone and Yarn have different settings for adding the user classpath first Key: SPARK-2996 URL: https://issues.apache.org/jira/browse/SPARK-2996 Proje

[jira] [Updated] (SPARK-2994) Support for Hive UDFs that take arrays of structs as arguments

2014-08-12 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2994: Summary: Support for Hive UDFs that take arrays of structs as arguments (was: Support for

[jira] [Commented] (SPARK-2994) Support for Hive UDFs that take arrays of structs as arguments

2014-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095059#comment-14095059 ] Apache Spark commented on SPARK-2994: - User 'marmbrus' has created a pull request for

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095063#comment-14095063 ] Vlad Frolov commented on SPARK-1065: Heavy tasks completed in 18 minutes each instead

[jira] [Created] (SPARK-2997) Improve documentation for dimensionality reduction

2014-08-12 Thread Ameet Talwalkar (JIRA)
Ameet Talwalkar created SPARK-2997: -- Summary: Improve documentation for dimensionality reduction Key: SPARK-2997 URL: https://issues.apache.org/jira/browse/SPARK-2997 Project: Spark Issue Ty

[jira] [Created] (SPARK-2998) scala.collection.mutable.HashSet cannot be cast to scala.collection.mutable.BitSet

2014-08-12 Thread pengyanhong (JIRA)
pengyanhong created SPARK-2998: -- Summary: scala.collection.mutable.HashSet cannot be cast to scala.collection.mutable.BitSet Key: SPARK-2998 URL: https://issues.apache.org/jira/browse/SPARK-2998 Project:

[jira] [Updated] (SPARK-2998) scala.collection.mutable.HashSet cannot be cast to scala.collection.mutable.BitSet

2014-08-12 Thread pengyanhong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengyanhong updated SPARK-2998: --- Description: run a Hive SQL via yarn-cluster, got error as below: {quote} 14/08/13 11:10:01 INFO org

[jira] [Commented] (SPARK-2736) Create PySpark RDD from Apache Avro File

2014-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095120#comment-14095120 ] Apache Spark commented on SPARK-2736: - User 'kanzhang' has created a pull request for

[jira] [Updated] (SPARK-2998) scala.collection.mutable.HashSet cannot be cast to scala.collection.mutable.BitSet

2014-08-12 Thread pengyanhong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengyanhong updated SPARK-2998: --- Description: run a HiveQL via yarn-cluster, got error as below: {quote} 14/08/13 11:10:01 INFO org.a

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095150#comment-14095150 ] Davies Liu commented on SPARK-1065: --- Cool, thanks for the tests. If we can compress the

[jira] [Created] (SPARK-2999) Compress all the serialized data

2014-08-12 Thread Davies Liu (JIRA)
Davies Liu created SPARK-2999: - Summary: Compress all the serialized data Key: SPARK-2999 URL: https://issues.apache.org/jira/browse/SPARK-2999 Project: Spark Issue Type: Improvement Co

[jira] [Resolved] (SPARK-2953) Allow using short names for io compression codecs

2014-08-12 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-2953. Resolution: Fixed Fix Version/s: 1.1.0 > Allow using short names for io compression codecs >

[jira] [Created] (SPARK-3000) drop old blocks to disk in parallel when memory is not large enough for caching new blocks

2014-08-12 Thread Zhang, Liye (JIRA)
Zhang, Liye created SPARK-3000: -- Summary: drop old blocks to disk in parallel when memory is not large enough for caching new blocks Key: SPARK-3000 URL: https://issues.apache.org/jira/browse/SPARK-3000

[jira] [Updated] (SPARK-3000) drop old blocks to disk in parallel when memory is not large enough for caching new blocks

2014-08-12 Thread Zhang, Liye (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhang, Liye updated SPARK-3000: --- Description: In spark, rdd can be cached in memory for later use, and the cached memory size
