Martin Andersson created SEDONA-205: ---------------------------------------
Summary: Use BinaryType in GeometryUDT in Sedona Spark Key: SEDONA-205 URL: https://issues.apache.org/jira/browse/SEDONA-205 Project: Apache Sedona Issue Type: Improvement Reporter: Martin Andersson GeometryUDT currently uses ArrayType(ByteType()) as the serialized data type for geometries. The array type in Spark is an array of objects and not primitive types. Every byte is boxed into a Byte object and the object reference is stored in the array. This adds a significant overhead. The more specialized BinaryType is an array of primitive bytes. I did a quick benchmark chaining a bunch of st-functions, no joins. With BinaryType the performance increased by roughly 30%. The old Apache commons-codec bundled with sernetcdf needs to be fixed first. Otherwise Spark fails when calling encodeHexString() as seen in https://github.com/apache/incubator-sedona/pull/704 -- This message was sent by Atlassian Jira (v8.20.10#820010)