Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/16648
This PR was addressed in #18075.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands,
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Can one of the admins verify this patch?
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/16648
@kiszk please see #19518 for part 2 of this original PR, and thanks!
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/16648
@bdrillard Thank you very much
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/16648
I'm blocking out time to prepare the part 2 PR for this issue starting
today over this week, regarding compaction of excess primitive state.
cc: @kiszk
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/16648
@bdrillard gentle ping
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/16648
kindly ping @bdrillard
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/16648
kindly ping @bdrillard
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/16648
Thanks @kiszk, I'll work on preparing a PR for the second half of this
issue.
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/16648
ping @bdrillard for the 2nd part of this PR
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16648
Thanks, this makes a lot of sense! I'll review #18075
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/16648
@cloud-fan Good question, and I think we can resolve it by using different
values for `N` in the
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16648
So this PR introduces 2 approaches to work around the Constant Pool Limit:
1. put member variables into an inner class
2. compact primitive declarations into arrays.
It looks to me that
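As a rough illustration of the two approaches listed in the comment above, the hand-written Java below contrasts per-field state with array-compacted state, and shows state moved into a nested class. It is a sketch, not code from the PR or from Spark's generator: the names `CompactionSketch`, `Uncompacted`, `Compacted`, `mutableIntState`, `Outer`, and `NestedClass1` are all hypothetical. The premise is that a class file stores its constant pool count in a 16-bit field, and that a nested class compiles to its own class file with its own constant pool.

```java
// Hand-written stand-in for generated code; every name here is hypothetical.
final class CompactionSketch {

    // Without compaction: one field per piece of mutable state.
    // Every distinct field adds entries to this class's constant pool.
    static final class Uncompacted {
        private int value_0;
        private int value_1;
        private int value_2;

        void init() {
            value_0 = 10;
            value_1 = 20;
            value_2 = 30;
        }

        int sum() {
            return value_0 + value_1 + value_2;
        }
    }

    // Approach 2: primitive state of one type packed into a single array field,
    // so adding another slot does not add another field.
    static final class Compacted {
        private final int[] mutableIntState = new int[3];

        void init() {
            for (int i = 0; i < mutableIntState.length; i++) {
                mutableIntState[i] = (i + 1) * 10;
            }
        }

        int sum() {
            int total = 0;
            for (int v : mutableIntState) {
                total += v;
            }
            return total;
        }
    }

    // Approach 1: member variables moved into a nested class, which the
    // compiler emits as a separate class file with a separate constant pool.
    static final class Outer {
        private final NestedClass1 nested1 = new NestedClass1();

        final class NestedClass1 {
            private int value_3 = 40;

            int read() {
                return value_3;
            }
        }

        int readNested() {
            return nested1.read();
        }
    }

    public static void main(String[] args) {
        Uncompacted u = new Uncompacted();
        u.init();
        Compacted c = new Compacted();
        c.init();
        System.out.println(u.sum() + " " + c.sum() + " " + new Outer().readNested());
    }
}
```

The array form trades named fields for index lookups, so a generator using it would presumably need to record which index each piece of state was assigned; that bookkeeping is left out of this sketch.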
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/16648
I've created the first part of a pair of PRs to help make this review
easier. Please see #18075 for a PR of the first feature (class splitting of
excess code into nested sub-classes). If that PR
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/16648
@kiszk Sure, I'm glad to help make this change easier to review. I'll first
make a PR that focuses on code splitting into nested classes. There should be a
test case with a number of columns that
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/16648
I see. I understand two facts.
1. We can split this into two changes from the implementation view.
2. We cannot fix the [test
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/16648
@kiszk We could do that, definitely. Changes in Feature 1 (splitting excess
code among classes) are limited to the `CodeGeneration` class, and the few
`Generate...` classes included with
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/16648
@bdrillard Can we split this PR into two smaller PRs?
1. split excess code among classes
2. compact excess mutable state into arrays
IIUC, `addMutableState()` does return a new
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77102/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #77102 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77102/testReport)**
for PR 16648 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #77102 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77102/testReport)**
for PR 16648 at commit
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/16648
@kiszk, I've updated the pull-request description to include example code
generation for mutable state compaction as well (which comes from inspecting
the [generated
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/16648
As written in the comment, this PR enables the following two features.
The generated code currently in the description seems to show only feature 1. Would it
be possible to update the code to include features 1
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #76978 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76978/testReport)**
for PR 16648 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76973/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #76973 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76973/testReport)**
for PR 16648 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #76973 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76973/testReport)**
for PR 16648 at commit
Github user maor121 commented on the issue:
https://github.com/apache/spark/pull/16648
This is a very important fix for a very common use case. In our company we
had to create a workaround in many different cases. Are there any plans to
merge this into 2.0.3 as well?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76822/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #76822 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76822/testReport)**
for PR 16648 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #76822 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76822/testReport)**
for PR 16648 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76818/
Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #76818 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76818/testReport)**
for PR 16648 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #76818 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76818/testReport)**
for PR 16648 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #76816 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76816/testReport)**
for PR 16648 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76816/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #76816 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76816/testReport)**
for PR 16648 at commit
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/16648
Yeah, I just wanted to make sure this is making progress in some way.
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/16648
@HyukjinKwon @robert3005 I'll have some time soon to update this PR for the
latest master. Thanks for the interest. It is a non-trivial change and would
require a comprehensive code review.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/16648
gentle ping @bdrillard
Github user ethanyxu commented on the issue:
https://github.com/apache/spark/pull/16648
Just wanted to mention this is a blocker for using most of the pipeline
transformers for wide data frames, which is sad since 3000 columns (my use
case) is not really very large.
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/16648
@bdrillard if you don't have time to finish this up I am happy to update
this to latest. I would really like to see this fixed since it's silly that you
can't have more than 3k columns
Github user ethanyxu commented on the issue:
https://github.com/apache/spark/pull/16648
I encountered this Exception when handling a data frame with 3000+ columns.
I hope this patch gets resolved soon.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75585/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #75585 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75585/testReport)**
for PR 16648 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #75585 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75585/testReport)**
for PR 16648 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75399/
Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #75399 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75399/testReport)**
for PR 16648 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #75399 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75399/testReport)**
for PR 16648 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74348/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #74348 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74348/testReport)**
for PR 16648 at commit
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/16648
I've made some changes to this PR to address @mkiedys's comments, and I'm
using his test case, as it sets a higher bar for both class splitting and
management of mutable state. Mutable state and
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #74348 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74348/testReport)**
for PR 16648 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74346/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #74346 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74346/testReport)**
for PR 16648 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #74346 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74346/testReport)**
for PR 16648 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74345/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #74345 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74345/testReport)**
for PR 16648 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #74345 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74345/testReport)**
for PR 16648 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74341/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #74341 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74341/testReport)**
for PR 16648 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #74341 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74341/testReport)**
for PR 16648 at commit
Github user vitillo commented on the issue:
https://github.com/apache/spark/pull/16648
@bdrillard Is there a particular reason why this patch hasn't been looked
at yet? I think you should CC some of the authors of the code you have changed
to speed things up.
Github user bdrillard commented on the issue:
https://github.com/apache/spark/pull/16648
Thanks for that other test case. The one you provide I would say falls in
the same class of error, however, this patch is still capable of addressing
some others that still exist. While
Github user mkiedys commented on the issue:
https://github.com/apache/spark/pull/16648
Steps to replicate:
```Scala
val schema = StructType(
  (0 to 8000).map(n => StructField(s"column_$n", StringType))
)
val values = schema.map(_ => null)
val rows =
Github user mkiedys commented on the issue:
https://github.com/apache/spark/pull/16648
This patch doesn't seem to be working:
```
org.codehaus.janino.JaninoRuntimeException: Constant pool for class
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71730/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #71730 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71730/testReport)**
for PR 16648 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16648
**[Test build #71730 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71730/testReport)**
for PR 16648 at commit
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/16648
ok to test
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16648
Can one of the admins verify this patch?