Hello. I recently asked the community a question about processing CSV
files. I received some helpful advice about using processors such as
ConvertRecord and QueryRecord, and was encouraged to employ Readers and
RecordSetWriters. I've done that, and thank all who replied.

My incoming CSV files arrive with different headers because they represent
widely different data sets. The header structure is not known in advance.
As such, I configure a QueryRecord processor with a CSVReader whose Schema
Access Strategy is *Use String Fields From Header*, and a
CSVRecordSetWriter that sets *Infer Record Schema* as its Schema Access
Strategy.

Now I want to use that QueryRecord processor to characterize the various
fields using SQL: record counts, min and max values, things of that
nature. But in all the examples I find on YouTube and in open source
projects, the authors presume knowledge of the fields in advance. For
example, a dynamic property named year is given the value
select "year" from FLOWFILE.
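That pattern is easy enough to extend to the kind of profiling I have in
mind, but only when the column name is known up front. For instance, a
hypothetical dynamic property, hard-wired here to a "year" column, might
carry something like:

    -- all fields are read as strings, so MIN/MAX compare lexicographically
    SELECT COUNT("year") AS record_count,
           MIN("year")   AS min_year,
           MAX("year")   AS max_year
    FROM FLOWFILE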

We simply don't have that luxury, that awareness in advance. After all,
that's the very reason we inferred the schema in the reader and writer
configuration. The fields are more often than not going to be very
different from one file to the next. Hard-wiring field names into
QueryRecord does not give us a flexible enough flow. We need to pull them
from the inferred schema that the Reader and Writer services identified.

What syntax or notation can we use in the QueryRecord SQL to say "for each
field found in the header, execute this SQL against that field"? I guess
what I'm looking for is iteration through all of the inferred schema
fields, and dynamic assignment of the field name in the SQL.
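In pseudo-SQL, what I'm imagining is a template like the one below,
applied once per field from the inferred schema. The <field> placeholder
is purely illustrative; as far as I know QueryRecord has no such
substitution syntax, which is exactly what I'm asking about:

    -- run once for every column the reader inferred from the header
    SELECT '<field>'        AS field_name,
           COUNT("<field>") AS non_null_count,
           MIN("<field>")   AS min_value,
           MAX("<field>")   AS max_value
    FROM FLOWFILE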

Has anyone faced this same challenge? How did you solve it?
Is there another way to approach this problem?

Thank you in advance,
Jim
