Github user davies commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-63693612
merged.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled
Github user davies closed the pull request at:
https://github.com/apache/spark/pull/2659
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-63570860
Going to merge this into `branch-1.2` and `master`. Thanks!
(@davies is running large-scale `spark-perf` tests, so this is going to get
a lot of QA before we
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-62976015
@JoshRosen I added one more test for broadcast, and will do more tests at scale
in spark-perf.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-62976538
[Test build #23331 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23331/consoleFull)
for PR 2659 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-62989889
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-62989883
[Test build #23331 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23331/consoleFull)
for PR 2659 at commit
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-62777901
Jenkins, retest this please.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-62778764
[Test build #23275 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23275/consoleFull)
for PR 2659 at commit
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-62780140
I guess this seems fine to me, since I think I reviewed it previously and
it doesn't look like much has changed. It would be nice if there were more
explanatory
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-62785045
Actually, one question: could you check some tests into `spark-perf` that
both check that large broadcasts don't crash _and_ ensure that the data that I
get back is
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-62795934
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-62795915
[Test build #23275 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23275/consoleFull)
for PR 2659 at commit
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-62811873
@JoshRosen I will do that, we can verify it by checksum.
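Davies proposes verifying large broadcasts by checksum. A minimal sketch of that idea, assuming the broadcast payload is a `bytes` object and using MD5 purely as an illustrative checksum (neither choice comes from the thread): compute the digest before serialization, round-trip the value through `pickle`, and compare digests on the receiving side. The helper names `digest` and `round_trip_matches` are hypothetical.

```python
import hashlib
import os
import pickle


def digest(data: bytes) -> str:
    """Checksum used to compare sent and received payloads (MD5 is an arbitrary choice)."""
    return hashlib.md5(data).hexdigest()


def round_trip_matches(payload: bytes) -> bool:
    """Pickle and unpickle `payload`, standing in for broadcast shipping,
    and report whether the received bytes hash to the same digest."""
    expected = digest(payload)                      # computed on the "driver"
    blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
    received = pickle.loads(blob)                   # what a "worker" would see
    return digest(received) == expected


if __name__ == "__main__":
    # 1 MiB keeps the demo fast; a real scale test would use more than 2 GiB.
    print(round_trip_matches(os.urandom(1 << 20)))  # True
```

In an actual PySpark job the receiving-side check would presumably run inside tasks against `b.value` for `b = sc.broadcast(payload)` with a payload above 2 GB; the local round trip here only exercises the comparison logic.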
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-62668854
@JoshRosen Several people have hit the problem with large broadcasts in
Python. Could we get this into the 1.2 release?
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-62111305
[Test build #23042 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23042/consoleFull)
for PR 2659 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-62063434
[Test build #23020 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23020/consoleFull)
for PR 2659 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-62074005
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-62073998
[Test build #23020 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23020/consoleFull)
for PR 2659 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-62104033
[Test build #23042 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23042/consoleFull)
for PR 2659 at commit
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/2659#discussion_r19813742
--- Diff: python/pyspark/serializers.py ---
@@ -452,20 +454,182 @@ def loads(self, obj):
raise ValueError("invalid sevialization type: %s" %
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-61666402
@JoshRosen I have addressed your comments. Sorry for the delay, I forgot about it.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-6168
[Test build #22879 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22879/consoleFull)
for PR 2659 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-61688716
[Test build #22879 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22879/consoleFull)
for PR 2659 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-61688725
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-61337831
Jenkins, retest this please.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-61337874
[Test build #22649 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22649/consoleFull)
for PR 2659 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-61344030
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-61344024
[Test build #22649 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22649/consoleFull)
for PR 2659 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-61345977
[Test build #501 has
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/501/consoleFull)
for PR 2659 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-61352193
[Test build #501 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/501/consoleFull)
for PR 2659 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-60473864
[Test build #430 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/430/consoleFull)
for PR 2659 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-60474436
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-60474435
[Test build #22198 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22198/consoleFull)
for PR 2659 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-60472520
[Test build #22198 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22198/consoleFull)
for PR 2659 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-60472504
[Test build #430 has
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/430/consoleFull)
for PR 2659 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-59470191
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-59481093
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/392/consoleFull)
for PR 2659 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-59493977
**[Tests timed
out](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/392/consoleFull)**
for PR 2659 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-59533144
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/395/consoleFull)
for PR 2659 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-59547048
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/395/consoleFull)
for PR 2659 at commit
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-58430931
Do you have a script that I can run to test this? We should have a test
that creates a huge broadcast variable, serializes it, then checks that the
deserialized
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2659#discussion_r18613782
--- Diff: python/pyspark/serializers.py ---
@@ -452,20 +454,182 @@ def loads(self, obj):
raise ValueError("invalid sevialization type: %s" %
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-58445706
The code in the JIRA could be used to test this.
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/2659#discussion_r18548609
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
@@ -357,16 +357,23 @@ private[spark] object PythonRDD extends Logging {
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-58263504
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21406/consoleFull)
for PR 2659 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-58275153
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-58275148
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21406/consoleFull)
for PR 2659 at commit
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2659#discussion_r18560452
--- Diff: python/pyspark/serializers.py ---
@@ -452,20 +454,182 @@ def loads(self, obj):
raise ValueError("invalid sevialization type: %s" %
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2659#discussion_r18560488
--- Diff: python/pyspark/serializers.py ---
@@ -452,20 +454,182 @@ def loads(self, obj):
raise ValueError("invalid sevialization type: %s" %
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-58293540
These changes look pretty good to me. Give me some time to try it out
locally with a huge broadcast variable and to double-check that the index
arithmetic is right.
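JoshRosen mentions double-checking the index arithmetic. The sketch below illustrates the kind of offset math involved in splitting a serialized value into chunks no larger than a signed 32-bit length allows. `split_chunks`, `join_chunks`, `expected_chunk_count` and `MAX_CHUNK` are hypothetical names for illustration, not code from the PR.

```python
MAX_CHUNK = (1 << 31) - 1  # 2147483647, the largest length a signed 32-bit int holds


def split_chunks(data: bytes, chunk_size: int = MAX_CHUNK) -> list:
    """Split `data` into consecutive slices of at most `chunk_size` bytes."""
    if not 0 < chunk_size <= MAX_CHUNK:
        raise ValueError("chunk_size must be in (0, MAX_CHUNK]")
    # Chunk k covers [k * chunk_size, min((k + 1) * chunk_size, len(data))).
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def expected_chunk_count(n: int, chunk_size: int) -> int:
    """Ceiling division: how many chunks split_chunks produces for n bytes."""
    return (n + chunk_size - 1) // chunk_size


def join_chunks(chunks) -> bytes:
    """Inverse of split_chunks."""
    return b"".join(chunks)


if __name__ == "__main__":
    parts = split_chunks(b"abcdefg", 3)
    print(parts)  # [b'abc', b'def', b'g']
```

Slicing `bytes` copies each chunk; for multi-gigabyte values a `memoryview` over the buffer would avoid those copies, at the cost of keeping the original buffer alive.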
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-58108755
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21346/consoleFull)
for PR 2659 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-58117059
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-58117049
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21346/consoleFull)
for PR 2659 at commit
GitHub user davies opened a pull request:
https://github.com/apache/spark/pull/2659
[SPARK-3721] [PySpark] broadcast objects larger than 2G
This patch will bring support for broadcasting objects larger than 2G.
pickle, zlib, FrameSerializer and Array[Byte] all cannot
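The PR description says that pickle, zlib, FrameSerializer and Array[Byte] run into a 2G limit. One general way around a 4-byte signed length prefix, sketched here as an illustration rather than as the approach this PR takes, is to write the payload as a sequence of length-prefixed frames that each fit under the limit, ending with a zero-length frame. `write_chunked`, `read_chunked` and `FRAME_LIMIT` are hypothetical names.

```python
import io
import struct

FRAME_LIMIT = (1 << 31) - 1  # a 4-byte signed ("!i") length cannot describe more


def write_chunked(stream, data: bytes, limit: int = FRAME_LIMIT) -> None:
    """Write `data` as frames of at most `limit` bytes, then a zero-length end marker."""
    if not 0 < limit <= FRAME_LIMIT:
        raise ValueError("limit must be in (0, FRAME_LIMIT]")
    for start in range(0, len(data), limit):
        chunk = data[start:start + limit]
        stream.write(struct.pack("!i", len(chunk)))  # big-endian signed 32-bit length
        stream.write(chunk)
    stream.write(struct.pack("!i", 0))  # end marker


def read_chunked(stream) -> bytes:
    """Read frames written by write_chunked until the zero-length end marker."""
    parts = []
    while True:
        header = stream.read(4)
        if len(header) < 4:
            raise EOFError("stream ended before end marker")
        (length,) = struct.unpack("!i", header)
        if length == 0:
            break
        chunk = stream.read(length)
        if len(chunk) < length:
            raise EOFError("truncated frame")
        parts.append(chunk)
    return b"".join(parts)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_chunked(buf, b"x" * 20, limit=7)  # frames of 7, 7 and 6 bytes
    buf.seek(0)
    print(read_chunked(buf) == b"x" * 20)   # True
```

A zero-length frame works as an end marker here because `write_chunked` never emits an empty data frame; a format that allowed empty chunks would need an explicit frame count instead.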
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-57928044
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21310/consoleFull)
for PR 2659 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-57928061
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21310/consoleFull)
for PR 2659 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-57928062
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-57928340
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21311/consoleFull)
for PR 2659 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-57929753
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21311/consoleFull)
for PR 2659 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-57929754
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/2659#issuecomment-57958523
SQL changes LGTM.