[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095150#comment-14095150 ] Davies Liu commented on SPARK-1065: Cool, thanks for the tests. If we can compress the

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095063#comment-14095063 ] Vlad Frolov commented on SPARK-1065: Heavy tasks completed in 18 minutes each instead

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095021#comment-14095021 ] Vlad Frolov commented on SPARK-1065: [~davies] I have compiled and run your broadcast

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094989#comment-14094989 ] Vlad Frolov commented on SPARK-1065: [~davies] I use a YARN setup so I will see how it g

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094987#comment-14094987 ] Vlad Frolov commented on SPARK-1065: [~davies] I understand that if you use broadcast

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094986#comment-14094986 ] Davies Liu commented on SPARK-1065: After this patch, the above test can run successful

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094981#comment-14094981 ] Davies Liu commented on SPARK-1065: [~frol], I think broadcast the RDD object is alread

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094976#comment-14094976 ] Vlad Frolov commented on SPARK-1065: [~davies] I have not noticed that there was that

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094965#comment-14094965 ] Vlad Frolov commented on SPARK-1065: [~davies] Will your PR take into account this fix

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094952#comment-14094952 ] Apache Spark commented on SPARK-1065: User 'davies' has created a pull request for th

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094947#comment-14094947 ] Davies Liu commented on SPARK-1065: The broadcast was not used correctly in the above c

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-12 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094333#comment-14094333 ] Vlad Frolov commented on SPARK-1065: I have finished my experiment of using HDFS as a

[jira] [Commented] (SPARK-1065) PySpark runs out of memory with large broadcast variables

2014-08-11 Thread Vlad Frolov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093413#comment-14093413 ] Vlad Frolov commented on SPARK-1065: I am facing the same issue in my project, where I