[Catalyst] Code Generation and the Constant Pool Limit

Aleksander Eskilson Fri, 12 May 2017 12:40:06 -0700

Hi all,

I want to take a moment to highlight an issue and invite developers to
review a pull request
<https://github.com/apache/spark/pull/16648> [1] for SPARK-18016
<https://issues.apache.org/jira/browse/SPARK-18016> [2]. Code generated by
Catalyst currently places all split methods and variables into a single
class. When the data schema is sufficiently complex (wide or deeply nested),
the number of generated constants, whether declared in methods or as global
variables, exceeds the JVM's per-class constant pool limit (65,535 entries),
causing an exception.
Without a fix to this issue, there is an effective limit on the complexity
of data that can be marshaled to a DataFrame/Dataset. A method for
addressing this issue is discussed in the pull request. The change is
non-trivial, so I'm hoping to get a few sets of eyes on it, especially ones
that might be more familiar with the preferred direction of the Catalyst
project.
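
To make the limit concrete, here is a rough back-of-the-envelope sketch in
Java. It is not the approach taken in the pull request; it only illustrates
why a single class runs out of room and how partitioning generated members
across several classes keeps each one under the limit. The class name
ConstantPoolSketch, the partition helper, and the figure of three constant
pool entries per generated member are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ConstantPoolSketch {
    // The class file format stores constant_pool_count as an unsigned
    // 16-bit value, so one class cannot index more than 0xFFFF entries.
    static final int JVM_CONSTANT_POOL_LIMIT = 0xFFFF;

    // ASSUMPTION: a rough, illustrative cost per generated member
    // (e.g. name, field reference, name-and-type). Real costs vary.
    static final int ASSUMED_ENTRIES_PER_MEMBER = 3;

    // Splits the member names into groups small enough that each group's
    // estimated constant pool usage stays within the given budget.
    static List<List<String>> partition(List<String> members, int budget) {
        int perClass = Math.max(1, budget / ASSUMED_ENTRIES_PER_MEMBER);
        List<List<String>> classes = new ArrayList<>();
        for (int i = 0; i < members.size(); i += perClass) {
            int end = Math.min(i + perClass, members.size());
            classes.add(new ArrayList<>(members.subList(i, end)));
        }
        return classes;
    }

    public static void main(String[] args) {
        // Hypothetical wide schema: 50,000 generated variables.
        List<String> members = new ArrayList<>();
        for (int i = 0; i < 50000; i++) {
            members.add("value_" + i);
        }
        List<List<String>> classes =
            partition(members, JVM_CONSTANT_POOL_LIMIT);
        System.out.println("classes needed: " + classes.size());
    }
}
```

Under these assumptions, 50,000 generated members would not fit in one
class, but they would fit once spread over a handful of classes.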


--
Alek Eskilson

[1] - https://github.com/apache/spark/pull/16648
[2] - https://issues.apache.org/jira/browse/SPARK-18016
