[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-19 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-63693612 merged. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-19 Thread davies
Github user davies closed the pull request at: https://github.com/apache/spark/pull/2659

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-18 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-63570860 Going to merge this into `branch-1.2` and `master`. Thanks! (@davies is running large-scale `spark-perf` tests, so this is going to get a lot of QA before we

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-13 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62976015 @JoshRosen I added one more test for broadcast and will do more tests at scale in spark-perf.

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62976538 [Test build #23331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23331/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62989889 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62989883 [Test build #23331 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23331/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-12 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62777901 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62778764 [Test build #23275 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23275/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-12 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62780140 I guess this seems fine to me, since I think I reviewed it previously and it doesn't look like much has changed. It would be nice if there were more explanatory

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-12 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62785045 Actually, one question: could you check some tests into `spark-perf` that both check that large broadcasts don't crash _and_ ensure that the data that I get back is

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62795934 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62795915 [Test build #23275 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23275/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-12 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62811873 @JoshRosen I will do that, we can verify it by checksum.
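
A minimal sketch of what "verify it by checksum" could look like for a chunked round trip. This is not the code from the pull request or from spark-perf: the helper names `dump_in_chunks` and `load_from_chunks`, the chunk size, and the choice of SHA-256 are all assumptions made for illustration, and only the Python standard library is used.

```python
import hashlib
import pickle


def dump_in_chunks(obj, chunk_size):
    """Pickle obj and split the bytes into pieces of at most chunk_size bytes.

    Hypothetical helper: returns the list of chunks and a SHA-256 hex digest
    of the full serialized payload.
    """
    data = pickle.dumps(obj, protocol=2)
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return chunks, hashlib.sha256(data).hexdigest()


def load_from_chunks(chunks, expected_digest):
    """Reassemble the chunks, check the digest, then unpickle.

    Raises ValueError if the reassembled bytes do not match expected_digest.
    """
    data = b"".join(chunks)
    if hashlib.sha256(data).hexdigest() != expected_digest:
        raise ValueError("checksum mismatch after reassembly")
    return pickle.loads(data)


# Small payload for illustration; a real test would use a much larger object.
payload = list(range(1000))
chunks, digest = dump_in_chunks(payload, chunk_size=64)
restored = load_from_chunks(chunks, digest)
```

Comparing digests rather than the objects themselves keeps the check cheap when the payload is large, which is presumably why a checksum was suggested here.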

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-11 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62668854 @JoshRosen Several people have hit the problem with large broadcasts in Python; could we get this into the 1.2 release?

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62111305 [Test build #23042 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23042/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62063434 [Test build #23020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23020/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62074005 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62073998 [Test build #23020 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23020/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62104033 [Test build #23042 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23042/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-04 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2659#discussion_r19813742 --- Diff: python/pyspark/serializers.py --- @@ -452,20 +454,182 @@ def loads(self, obj): raise ValueError(invalid sevialization type: %s %

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-04 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-61666402 @JoshRosen I have addressed your comments. Sorry for the delay, I forgot about it.

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-04 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-6168 [Test build #22879 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22879/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-04 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-61688716 [Test build #22879 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22879/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-61688725 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-31 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-61337831 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-31 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-61337874 [Test build #22649 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22649/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-61344030 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-31 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-61344024 [Test build #22649 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22649/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-31 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-61345977 [Test build #501 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/501/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-31 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-61352193 [Test build #501 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/501/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-60473864 [Test build #430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/430/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-60474436 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-60474435 [Test build #22198 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22198/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-60472520 [Test build #22198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22198/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-60472504 [Test build #430 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/430/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-59470191 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-59481093 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/392/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-59493977 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/392/consoleFull)** for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-59533144 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/395/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-59547048 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/395/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-08 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-58430931 Do you have a script that I can run to test this? We should have a test that creates a huge broadcast variable, serializes it, then checks that the deserialized

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2659#discussion_r18613782 --- Diff: python/pyspark/serializers.py --- @@ -452,20 +454,182 @@ def loads(self, obj): raise ValueError(invalid sevialization type: %s %

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-08 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-58445706 The code in the JIRA could be used to test this.

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2659#discussion_r18548609 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -357,16 +357,23 @@ private[spark] object PythonRDD extends Logging {

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-58263504 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21406/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-58275153 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-58275148 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21406/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-07 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2659#discussion_r18560452 --- Diff: python/pyspark/serializers.py --- @@ -452,20 +454,182 @@ def loads(self, obj): raise ValueError(invalid sevialization type: %s %

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-07 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2659#discussion_r18560488 --- Diff: python/pyspark/serializers.py --- @@ -452,20 +454,182 @@ def loads(self, obj): raise ValueError(invalid sevialization type: %s %

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-07 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-58293540 These changes look pretty good to me. Give me some time to try it out locally with a huge broadcast variable and to double-check that the index arithmetic is right.

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-58108755 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21346/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-58117059 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-58117049 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21346/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-05 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/2659 [SPARK-3721] [PySpark] broadcast objects larger than 2G This patch will bring support for broadcasting objects larger than 2G. pickle, zlib, FrameSerializer and Array[Byte] all can not

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-57928044 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21310/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-57928061 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21310/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-57928062 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-57928340 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21311/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-57929753 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21311/consoleFull) for PR 2659 at commit

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-57929754 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...

2014-10-05 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-57958523 SQL changes LGTM.