[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094333#comment-14094333 ] Vlad Frolov commented on SPARK-1065: I have finished my experiment of using HDFS as a

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094947#comment-14094947 ] Davies Liu commented on SPARK-1065: The broadcast was not used correctly in the above

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094952#comment-14094952 ] Apache Spark commented on SPARK-1065: User 'davies' has created a pull request for

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094965#comment-14094965 ] Vlad Frolov commented on SPARK-1065: [~davies] Will your PR take into account this

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094976#comment-14094976 ] Vlad Frolov commented on SPARK-1065: [~davies] I have not noticed that there was that

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094981#comment-14094981 ] Davies Liu commented on SPARK-1065: [~frol], I think broadcasting the RDD object is

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094986#comment-14094986 ] Davies Liu commented on SPARK-1065: After this patch, the above test can run

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094987#comment-14094987 ] Vlad Frolov commented on SPARK-1065: [~davies] I understand that if you use broadcast

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094989#comment-14094989 ] Vlad Frolov commented on SPARK-1065: [~davies] I use a YARN setup so I will see how it

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095021#comment-14095021 ] Vlad Frolov commented on SPARK-1065: [~davies] I have compiled and run your broadcast

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095063#comment-14095063 ] Vlad Frolov commented on SPARK-1065: Heavy tasks completed in 18 minutes each instead

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095150#comment-14095150 ] Davies Liu commented on SPARK-1065: Cool, thanks for the tests. If we can compress the

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-11 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093413#comment-14093413 ] Vlad Frolov commented on SPARK-1065: I am facing the same issue in my project, where I