Hi there,

I'm currently working on a custom Encoder for a kind of schema-based Java object. In the object's schema, field positions and types are isomorphic to SQL column ordinals and types. The implementation should be quite similar to the JavaBean Encoder, but since we have a schema, class-based reflection should be unnecessary.

As the JavaBean deserializer does, I'm placing the column values of the serialized row into a newly created object with analogous fields. I collect a list of setter arguments, then use an expression similar to InitializeJavaBean to call the setters one by one on the new struct-like object. I've tried two methods for extracting column values as arguments to NewInstance expressions, which are in turn the arguments to the setters:

First, since I will always know the ordinal and type, I've tried using GetColumnByOrdinal(ordinal, type) as the input expression argument. Calling something like

    val objectFromRow = expressionEncoder.resolveAndBind(attrs).fromRow(row)

then yields

    org.apache.spark.sql.AnalysisException: unresolved operator 'DeserializeToObject ...

Second, I've tried referencing the value through a bound symbol: the attrs DslSymbol sequence names the symbol and its type, and UnresolvedAttribute(fieldName) is used so that the analyzer can replace the symbol with the correct accessor, as is done for JavaBeans. For a row with a single integer value and an object with a field named val of type java.lang.Integer, I instead receive

    org.apache.spark.sql.AnalysisException: resolved attribute(s) 'val missing from val#1 in operator 'DeserializeToObject ...

with the quoted symbol appearing as the argument to my object's setter method.

What is the current proper way to extract a value from a row column in the Expressions API for use as an argument expression in a deserializer?

Thanks,
Alek
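
P.S. To make the intended behaviour concrete, here is a minimal plain-Java sketch of what the deserializer should compute: read each column by its ordinal, check it against the schema type, and pass it to the matching setter on a freshly created object. It deliberately uses no Spark or Catalyst API. The names OrdinalDeserializerSketch, IntStruct, FieldBinding and fromRow are hypothetical and exist only for illustration, and a plain Object[] stands in for the serialized row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class OrdinalDeserializerSketch {

    /** Hypothetical struct-like target with a single java.lang.Integer field named val. */
    public static class IntStruct {
        private Integer val;
        public void setVal(Integer val) { this.val = val; }
        public Integer getVal() { return val; }
    }

    /** One schema entry: column ordinal, expected Java type, and the setter to invoke. */
    public static final class FieldBinding<T> {
        final int ordinal;
        final Class<?> type;
        final BiConsumer<T, Object> setter;

        public FieldBinding(int ordinal, Class<?> type, BiConsumer<T, Object> setter) {
            this.ordinal = ordinal;
            this.type = type;
            this.setter = setter;
        }
    }

    /**
     * Copies column values into the target by ordinal, calling the setters one by one.
     * The row is modelled as Object[] here; this is an assumption, not Spark's row type.
     */
    public static <T> T fromRow(Object[] row, T target, List<FieldBinding<T>> schema) {
        for (FieldBinding<T> field : schema) {
            Object value = row[field.ordinal];
            // Nulls pass through; any other value must match the schema type.
            if (value != null && !field.type.isInstance(value)) {
                throw new IllegalArgumentException(
                    "Column " + field.ordinal + " is not a " + field.type.getName());
            }
            field.setter.accept(target, value);
        }
        return target;
    }

    public static void main(String[] args) {
        List<FieldBinding<IntStruct>> schema = new ArrayList<>();
        schema.add(new FieldBinding<IntStruct>(
            0, Integer.class, (obj, v) -> obj.setVal((Integer) v)));

        // A row with a single integer column, as in the second attempt above.
        IntStruct result = fromRow(new Object[] {42}, new IntStruct(), schema);
        System.out.println(result.getVal()); // prints 42
    }
}
```

The open question is how to express the row[field.ordinal] lookup as a Catalyst expression that survives analysis; everything else in the sketch corresponds to the NewInstance and setter-invocation steps described above.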