Yeah, that's definitely doable; most of the logic for writing a ResultSet to a FlowFile is localized (currently in JdbcCommon, but also in ResultSetRecordSet), so I wouldn't think it would be too much of a refactor. What are folks' thoughts on whether to add a Record Writer property to the existing ExecuteSQL, or to subclass it into a new processor called ExecuteSQLRecord? The former is more consistent with how the SiteToSite reporting tasks work, but this is a processor. The latter is more consistent with the way we've done other record processors, and the benefit there is that we don't have to add a bunch of documentation for fields that will be ignored (such as the Use Avro Logical Types property, which we wouldn't need in an ExecuteSQLRecord). Having said that, we will want to offer the same options in the Avro Reader/Writer, but Peter is working on that under NIFI-5405 [1].
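To make the property option concrete, here's a rough sketch of the descriptor we might add (the name and description are placeholders, not a committed design):

    // Sketch only -- an optional Record Writer property on ExecuteSQL.
    // Needs org.apache.nifi.components.PropertyDescriptor and
    // org.apache.nifi.serialization.RecordSetWriterFactory.
    public static final PropertyDescriptor RECORD_WRITER = new PropertyDescriptor.Builder()
            .name("esql-record-writer")
            .displayName("Record Writer")
            .description("If set, query results are written with this writer instead of "
                    + "the default Avro-with-embedded-schema output.")
            .identifiesControllerService(RecordSetWriterFactory.class)
            .required(false) // optional, so existing flows keep their Avro output
            .build();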
Thanks,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-5405

On Tue, Aug 7, 2018 at 2:06 PM Andy LoPresto <alopre...@apache.org> wrote:
>
> Matt,
>
> Would extending the core ExecuteSQL processor with an ExecuteSQLRecord
> processor also work? I wonder about discoverability if only one processor is
> present, and in other places we explicitly name the processors which handle
> records as such. If the ExecuteSQL processor handled all the SQL logic, and
> the ExecuteSQLRecord processor just delegated most of the processing in its
> #onTrigger() method to super, do you foresee any substantial difficulties? It
> might require some refactoring of the parent #onTrigger() into service methods.
>
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>
> On Aug 7, 2018, at 10:25 AM, Andrew Grande <apere...@gmail.com> wrote:
>
> As a side note, one has to have a serious justification _not_ to use
> record-based processors. The benefits, including performance, are too
> numerous to call out here.
>
> Andrew
>
> On Tue, Aug 7, 2018, 1:15 PM Mark Payne <marka...@hotmail.com> wrote:
>>
>> Boris,
>>
>> Using a Record-based processor does not mean that you need to define a
>> schema upfront. That is only necessary if the source itself cannot
>> provide a schema. However, since ExecuteSQL is pulling structured data
>> and the schema can be inferred from the database, you wouldn't need to.
>> As Matt was saying, your Record Writer can simply be configured to
>> Inherit Record Schema. It can then write the schema to the "avro.schema"
>> attribute, or you can choose "Do Not Write Schema". This would still
>> allow the data to be written as JSON, CSV, etc.
>>
>> You could also have the Record Writer write the schema to the
>> "avro.schema" attribute, as mentioned above, and then have any
>> downstream processors read the schema from this attribute. This would
>> allow you to use any record-oriented processors you'd like without
>> having to define the schema yourself, if you don't want to.
>>
>> Thanks
>> -Mark
>>
>>
>> On Aug 7, 2018, at 12:37 PM, Boris Tyukin <bo...@boristyukin.com> wrote:
>>
>> thanks for all the responses! it means I am not the only one interested
>> in this topic.
>>
>> A record-aware version would be really nice, but a lot of times I do not
>> want to use record-based processors, since I'd need to define a schema
>> for input/output upfront when I just want to run a SQL query and get
>> whatever results back. It just adds an extra step that is one more thing
>> to break and support.
>>
>> Similar to the Kafka processors, it is nice to have the option of a
>> record-based processor vs. a message-oriented processor. But if one
>> processor can do it all, that's even better :)
>>
>>
>> On Tue, Aug 7, 2018 at 9:28 AM Matt Burgess <mattyb...@apache.org> wrote:
>>>
>>> I'm definitely interested in supporting a record-aware version as well
>>> (I wrote the Jira up last year [1] but haven't gotten around to
>>> implementing it); however, I agree with Peter's comment on the Jira.
>>> Since ExecuteSQL is an oft-touched processor, if we had two processors
>>> that differed only in how the output is formatted, it could be harder
>>> to maintain (e.g., bugs would have to be fixed in two places). I think
>>> we should add an optional RecordWriter property to ExecuteSQL, and the
>>> documentation would reflect that if it is not set, the output will be
>>> Avro with an embedded schema, as it has always been.
>>> If the RecordWriter is set, either the schema can be hardcoded, or they
>>> can use "Inherit Record Schema" even though there's no reader, and that
>>> would mimic the current behavior, where the schema is inferred from the
>>> database columns and used for the writer. There is precedent for this
>>> pattern in the SiteToSite reporting tasks.
>>>
>>> To Bryan's point about history, Avro at the time was the most
>>> descriptive of the solutions because it keeps the schema and data types
>>> with the data, unlike JSON, CSV, etc. Also, before the record
>>> readers/writers existed, as Bryan said, you pretty much had to split,
>>> transform, merge. We just need to make that processor (and others with
>>> specific input/output formats) "record-aware" for better performance.
>>>
>>> Regards,
>>> Matt
>>>
>>> [1] https://issues.apache.org/jira/browse/NIFI-4517
>>>
>>> On Tue, Aug 7, 2018 at 9:20 AM Bryan Bende <bbe...@gmail.com> wrote:
>>> >
>>> > I would also add that the pattern of splitting to one record per flow
>>> > file was common before the record processors existed; generally this
>>> > can/should be avoided now in favor of processing/manipulating records
>>> > in place and keeping them together in large batches.
>>> >
>>> >
>>> > On Tue, Aug 7, 2018 at 9:10 AM, Andrew Grande <apere...@gmail.com> wrote:
>>> > > Careful, that makes too much sense, Joe ;)
>>> > >
>>> > >
>>> > > On Tue, Aug 7, 2018, 8:45 AM Joe Witt <joe.w...@gmail.com> wrote:
>>> > >>
>>> > >> I think we just need to make an ExecuteSqlRecord processor.
>>> > >>
>>> > >> thanks
>>> > >>
>>> > >> On Tue, Aug 7, 2018, 8:41 AM Mike Thomsen <mikerthom...@gmail.com>
>>> > >> wrote:
>>> > >>>
>>> > >>> My guess is that it is due to the fact that Avro is the only
>>> > >>> record type that can match SQL pretty closely, feature for
>>> > >>> feature, on data types.
>>> > >>>
>>> > >>> On Tue, Aug 7, 2018 at 8:33 AM Boris Tyukin <bo...@boristyukin.com>
>>> > >>> wrote:
>>> > >>>>
>>> > >>>> I've been wondering since I started learning NiFi why the
>>> > >>>> ExecuteSQL processor only returns Avro-formatted data. All the
>>> > >>>> community examples I've seen then convert the Avro to JSON, and
>>> > >>>> pretty much all of them then split the JSON into multiple flows.
>>> > >>>>
>>> > >>>> I found myself doing the same thing over and over and over again.
>>> > >>>>
>>> > >>>> Since everyone is doing it, is there a strong reason why Avro is
>>> > >>>> liked so much? And why does everyone continue with this
>>> > >>>> three-step pattern, rather than giving users an option to output
>>> > >>>> JSON instead, and another option to output one flowfile or
>>> > >>>> multiple (one per record)?
>>> > >>>>
>>> > >>>> thanks
>>> > >>>> Boris
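P.S. To make Andy's subclass-delegation idea concrete, here's a bare-bones sketch (it assumes the parent's #onTrigger() gets refactored into overridable service methods; the commented-out hook below is made up, not an existing method):

    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.exception.ProcessException;

    public class ExecuteSQLRecord extends ExecuteSQL {

        @Override
        public void onTrigger(final ProcessContext context, final ProcessSession session)
                throws ProcessException {
            // All SQL handling (connection, query execution, ResultSet iteration)
            // stays in the parent; only the serialization step would differ.
            super.onTrigger(context, session);
        }

        // Hypothetical service method the parent would call to serialize the
        // ResultSet; here it would hand rows to the configured RecordSetWriter
        // instead of JdbcCommon's Avro path.
        // @Override
        // protected long writeResultSet(final ResultSet rs, final OutputStream out) { ... }
    }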