Hi all,

I'd like to highlight an issue and invite developers to review a pull request [1] for SPARK-18016 [2].

Code generated by Catalyst currently places all split methods and variables into a single class. When the data schema is sufficiently complex (wide or deeply nested), the number of generated constants, whether declared in methods or as global variables, exceeds the constant pool limit of a Java class, which causes an exception. Until this is fixed, there is an effective limit on the complexity of data that can be marshaled into a DataFrame/Dataset.

One approach to addressing the issue is discussed in the pull request. The change is non-trivial, so I'm hoping to get a few sets of eyes on it, especially from people who are more familiar with the preferred direction of the Catalyst project.
-- Alek Eskilson

[1] - https://github.com/apache/spark/pull/16648
[2] - https://issues.apache.org/jira/browse/SPARK-18016