I should mention for aggregate values like COUNT(), check out the CalculateRecordStats processor, not sure if it takes a `/` value (or whatever means 'select all fields') for a RecordPath or not, if not we should probably support if prudent. It might also be a nice improvement to add MAX/MIN support as well, feel free to file a Jira for that improvement if it will help.
- Matt On Tue, Apr 18, 2023 at 6:16 PM Matt Burgess <mattyb...@apache.org> wrote: > > Jim, > > QueryRecord uses Apache Calcite under the hood and is thus at the > mercy of the SQL standard (and any additional rules/dialect from > Apache Calcite) so in general you can't select "all except X" or "all > except change X to Y". Does it need to be SQL executed against the > individual fields? If not, take a look at ScriptedTransformRecord doc > (and its Additional Details page). IIRC you're a Groovy guy now ;) so > you should be able to alter the fields as you see fit using Groovy > rather than SQL (alternatively Jython as you've done a bunch of that > as well). > > Regards, > Matt > > On Tue, Apr 18, 2023 at 6:04 PM James McMahon <jsmcmah...@gmail.com> wrote: > > > > Hello. I recently asked the community a question about processing CSV > > files. I received some helpful advice about using processors such as > > ConvertRecord and QueryRecord, and was encouraged to employ Readers and > > RecordSetWriters. I've done that, and thank all who replied. > > > > My incoming CSV files come in with different headers because they are > > widely different data sets. The header structure is not known in advance. > > As such, I configure a QueryRecord processor with a CSVReader that employs > > a Schema Access Strategy that is Use String Fields From Header. I configure > > a CSVRecordSetWriter that sets Infer Record Schema as its Schema Access > > Strategy. > > > > Now I want to use that QueryRecord processor to characterize the various > > fields using SQL. Record counts, min and max values - things of that > > nature. But in all the examples I find in YouTube and in the open source, > > the authors presume a knowledge of the fields in advance. For example > > Property year is set by Value select "year" from FLOWFILE. > > > > We simply don't have that luxury, that awareness in advance. After all, > > that's the very reason we inferred the schema in the reader and writer > > configuration. The fields are more often than not going to be very > > different. Hard wiring them into QueryRecord is not a flow solution that is > > flexible enough. We need to grab them from the inferred schema the Reader > > and Writer services identified. > > > > What syntax or notation can we use in the QueryRecord sql to say "for each > > field found in the header, execute this sql against that field"? I guess > > what I'm looking for is iteration through all the inferred schema fields, > > and dynamic assignment of the field name in the SQL. > > > > Has anyone faced this same challenge? How did you solve it? > > Is there another way to approach this problem? > > > > Thank you in advance, > > Jim