Martin Andersson created SEDONA-205:
---------------------------------------

             Summary: Use BinaryType in GeometryUDT in Sedona Spark
                 Key: SEDONA-205
                 URL: https://issues.apache.org/jira/browse/SEDONA-205
             Project: Apache Sedona
          Issue Type: Improvement
            Reporter: Martin Andersson


GeometryUDT currently uses ArrayType(ByteType()) as the serialized data type 
for geometries. In Spark, ArrayType is an array of objects rather than of 
primitive values: every byte is boxed into a Byte object, and the array 
stores a reference to that object. This adds significant overhead. The more 
specialized BinaryType is backed by an array of primitive bytes (byte[]).
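As a rough analogy only (plain CPython, not Spark or the JVM), the sketch below contrasts the two layouts for the same payload. A list holding one reference per signed byte stands in for ArrayType(ByteType()), and a single bytes buffer stands in for BinaryType. The WKB-encoded POINT(1 2) payload and all variable names are assumptions made for illustration; the issue does not say which serialization format GeometryUDT writes.

```python
import struct
import sys

# Hypothetical stand-in payload: POINT(1 2) as little-endian WKB
# (1-byte byte-order flag, uint32 geometry type, two float64 coordinates).
# This is only an example, not necessarily the format Sedona serializes.
payload = struct.pack("<BIdd", 1, 1, 1.0, 2.0)

# Analogue of ArrayType(ByteType()): one list element per byte, each element
# a reference to a separate integer object holding a signed value (-128..127).
boxed = [b - 256 if b > 127 else b for b in payload]

# Analogue of BinaryType: one contiguous buffer of primitive bytes.
primitive = bytes(b & 0xFF for b in boxed)

# Converting between the two layouts is lossless.
assert primitive == payload

# getsizeof reports the container only: for the list, the header plus the
# reference array, not the integer objects those references point to.
print(len(payload), sys.getsizeof(boxed), sys.getsizeof(primitive))
```

Because sys.getsizeof ignores the integer objects the list points to, the real footprint of the boxed layout is, if anything, larger than the number it reports.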


I ran a quick benchmark that chained several ST_ functions, with no joins. 
With BinaryType, performance improved by roughly 30%.


The old Apache commons-codec version bundled with sernetcdf needs to be fixed 
first. Otherwise, Spark fails when calling encodeHexString(), as seen in 
https://github.com/apache/incubator-sedona/pull/704



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
