Jim, QueryRecord uses Apache Calcite under the hood and is thus at the mercy of the SQL standard (and any additional rules/dialect from Apache Calcite) so in general you can't select "all except X" or "all except change X to Y". Does it need to be SQL executed against the individual fields? If not, take a look at ScriptedTransformRecord doc (and its Additional Details page). IIRC you're a Groovy guy now ;) so you should be able to alter the fields as you see fit using Groovy rather than SQL (alternatively Jython as you've done a bunch of that as well).
Regards, Matt On Tue, Apr 18, 2023 at 6:04 PM James McMahon <jsmcmah...@gmail.com> wrote: > > Hello. I recently asked the community a question about processing CSV files. > I received some helpful advice about using processors such as ConvertRecord > and QueryRecord, and was encouraged to employ Readers and RecordSetWriters. > I've done that, and thank all who replied. > > My incoming CSV files come in with different headers because they are widely > different data sets. The header structure is not known in advance. As such, I > configure a QueryRecord processor with a CSVReader that employs a Schema > Access Strategy that is Use String Fields From Header. I configure a > CSVRecordSetWriter that sets Infer Record Schema as its Schema Access > Strategy. > > Now I want to use that QueryRecord processor to characterize the various > fields using SQL. Record counts, min and max values - things of that nature. > But in all the examples I find in YouTube and in the open source, the authors > presume a knowledge of the fields in advance. For example Property year is > set by Value select "year" from FLOWFILE. > > We simply don't have that luxury, that awareness in advance. After all, > that's the very reason we inferred the schema in the reader and writer > configuration. The fields are more often than not going to be very different. > Hard wiring them into QueryRecord is not a flow solution that is flexible > enough. We need to grab them from the inferred schema the Reader and Writer > services identified. > > What syntax or notation can we use in the QueryRecord sql to say "for each > field found in the header, execute this sql against that field"? I guess what > I'm looking for is iteration through all the inferred schema fields, and > dynamic assignment of the field name in the SQL. > > Has anyone faced this same challenge? How did you solve it? > Is there another way to approach this problem? > > Thank you in advance, > Jim