Andre,

I don't believe that QueryRecord supports RecordPath expressions as
"column names" (but someone should correct me if I'm mistaken :) I
think it takes a tabular view of the record, meaning each top-level
field is a "column" for the purposes of querying.

For JSON, the JsonPathReader lets you "hoist" fields from a nested
record into what looks like top-level fields from the record API
perspective. For Avro, there is no current AvroPathReader, which IIRC
is partly due to the fact that there is no AvroPath implementation in
the Avro project [1]. However the Jira case mentions an
Apache-licensed library for doing such things, and certainly
RecordPathCompiler handles it from the Record API side. I think the
kicker is how to expose fields for projections (i.e. SELECTs); for
arbitrarily nested records, that's a lot of "columns" and/or
non-standard SQL. It can be done (Apache Drill does); but from what I
remember, Drill has some crafted Calcite voodoo, where QueryRecord
leverages Calcite as a library.

A workaround could use UpdateRecord to manually hoist the field to the
top-level by adding a new field to the outgoing schema at the
top-level, and in UpdateRecord setting its value to the evaluation of
your RecordPath "/header/deviceEventClassId". Then you could use
QueryRecord on that "column".

Regards,
Matt

[1] https://issues.apache.org/jira/browse/AVRO-992

On Mon, Jul 3, 2017 at 10:07 PM, Andre <andre-li...@fucs.org> wrote:
> All,
>
> What is the expected to refer to nested avro element from within QueryRecord
> SQL queries?
>
> When using LookupRecord I may refer to a record child element by the
> RecordPath "/header/deviceEventClassId", however, it is not clear what
> syntax should I use with QueryRecord?
>
> SELECT * FROM FLOWFILE WHERE "/header/deviceEventClassId" <> "1"
>
> Fails with column not found.

Reply via email to