Boris,

Yeah, you can fork either his branch or his entire repo and try it out. Also, usual caveat: user beware until it passes code review...

Mike

On Mon, Aug 13, 2018 at 8:36 AM Boris Tyukin <bo...@boristyukin.com> wrote:
> Matt, you are awesome! 15 files changed and 3k lines of code - man, do not
> tell me you did that in just a few days :)
>
> since it has not been merged yet with the master, can I just use your
> personal branch to compile entire nifi? or is it better to cherry pick your
> commit into master? I would like to try it out
>
> Boris
>
> On Fri, Aug 10, 2018 at 4:55 PM Matt Burgess <mattyb...@apache.org> wrote:
>
>> Boris et al,
>>
>> I put up a PR [1] to add ExecuteSQLRecord and QueryDatabaseTableRecord
>> under NIFI-4517, in case anyone wants to play around with it :)
>>
>> Regards,
>> Matt
>>
>> [1] https://github.com/apache/nifi/pull/2945
>>
>> On Tue, Aug 7, 2018 at 8:30 PM Boris Tyukin <bo...@boristyukin.com> wrote:
>> >
>> > Matt, you rock!! thank you!!
>> >
>> > On Tue, Aug 7, 2018 at 5:16 PM Matt Burgess <mattyb...@gmail.com> wrote:
>> >>
>> >> Sounds good, it makes the underlying code a bit more complicated but I see from y’all’s points that a “separate” processor is a better user experience. I’m knee deep in it as we speak, hope to have a PR up in a few days.
>> >>
>> >> Thanks,
>> >> Matt
>> >>
>> >>
>> >> On Aug 7, 2018, at 5:07 PM, Andrew Grande <apere...@gmail.com> wrote:
>> >>
>> >> I'd really like to see the Record suffix on the processor for discoverability, as already mentioned.
>> >>
>> >> Andrew
>> >>
>> >> On Tue, Aug 7, 2018, 2:16 PM Matt Burgess <mattyb...@apache.org> wrote:
>> >>>
>> >>> Yeah that's definitely doable, most of the logic for writing a
>> >>> ResultSet to a Flow File is localized (currently to JdbcCommon but
>> >>> also in ResultSetRecordSet), so I wouldn't think it would be too much
>> >>> refactor. What are folks' thoughts on whether to add a Record Writer
>> >>> property to the existing ExecuteSQL or subclass it to a new processor
>> >>> called ExecuteSQLRecord?
>> >>> The former is more consistent with how the
>> >>> SiteToSite reporting tasks work, but this is a processor. The latter
>> >>> is more consistent with the way we've done other record processors,
>> >>> and the benefit there is that we don't have to add a bunch of
>> >>> documentation to fields that will be ignored (such as the Use Avro
>> >>> Logical Types property which we wouldn't need in an ExecuteSQLRecord).
>> >>> Having said that, we will want to offer the same options in the Avro
>> >>> Reader/Writer, but Peter is working on that under NIFI-5405 [1].
>> >>>
>> >>> Thanks,
>> >>> Matt
>> >>>
>> >>> [1] https://issues.apache.org/jira/browse/NIFI-5405
>> >>>
>> >>> On Tue, Aug 7, 2018 at 2:06 PM Andy LoPresto <alopre...@apache.org> wrote:
>> >>> >
>> >>> > Matt,
>> >>> >
>> >>> > Would extending the core ExecuteSQL processor with an ExecuteSQLRecord processor also work? I wonder about discoverability if only one processor is present and in other places we explicitly name the processors which handle records as such. If the ExecuteSQL processor handled all the SQL logic, and the ExecuteSQLRecord processor just delegated most of the processing in its #onTrigger() method to super, do you foresee any substantial difficulties? It might require some refactoring of the parent #onTrigger() to service methods.
>> >>> >
>> >>> >
>> >>> > Andy LoPresto
>> >>> > alopre...@apache.org
>> >>> > alopresto.apa...@gmail.com
>> >>> > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>> >>> >
>> >>> > On Aug 7, 2018, at 10:25 AM, Andrew Grande <apere...@gmail.com> wrote:
>> >>> >
>> >>> > As a side note, one has to have a serious justification _not_ to use record-based processors. The benefits, including performance, are too numerous to call out here.
>> >>> >
>> >>> > Andrew
>> >>> >
>> >>> > On Tue, Aug 7, 2018, 1:15 PM Mark Payne <marka...@hotmail.com> wrote:
>> >>> >>
>> >>> >> Boris,
>> >>> >>
>> >>> >> Using a Record-based processor does not mean that you need to define a schema upfront. This is necessary if the source itself cannot provide a schema. However, since it is pulling structured data and the schema can be inferred from the database, you wouldn't need to. As Matt was saying, your Record Writer can simply be configured to Inherit Record Schema. It can then write the schema to the "avro.schema" attribute or you can choose "Do Not Write Schema". This would still allow the data to be written in JSON, CSV, etc.
>> >>> >>
>> >>> >> You could also have the Record Writer choose to write the schema using the "avro.schema" attribute, as mentioned above, and then have any down-stream processors read the schema from this attribute. This would allow you to use any record-oriented processors you'd like without having to define the schema yourself, if you don't want to.
>> >>> >>
>> >>> >> Thanks
>> >>> >> -Mark
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> On Aug 7, 2018, at 12:37 PM, Boris Tyukin <bo...@boristyukin.com> wrote:
>> >>> >>
>> >>> >> thanks for all the responses! it means I am not the only one interested in this topic.
>> >>> >>
>> >>> >> Record-aware version would be really nice, but a lot of times I do not want to use record-based processors since I need to define a schema for input/output upfront and just want to run SQL query and get whatever results back. It just adds an extra step that will be subject to break/support.
>> >>> >>
>> >>> >> Similar to Kafka processors, it is nice to have an option of record-based processor vs. message oriented processor.
>> >>> >> But if one processor can do it all, it is even better :)
>> >>> >>
>> >>> >>
>> >>> >> On Tue, Aug 7, 2018 at 9:28 AM Matt Burgess <mattyb...@apache.org> wrote:
>> >>> >>>
>> >>> >>> I'm definitely interested in supporting a record-aware version as well
>> >>> >>> (I wrote the Jira up last year [1] but haven't gotten around to
>> >>> >>> implementing it), however I agree with Peter's comment on the Jira.
>> >>> >>> Since ExecuteSQL is an oft-touched processor, if we had two processors
>> >>> >>> that only differed in how the output is formatted, it could be harder
>> >>> >>> to maintain (bugs to be fixed in two places, e.g.). I think we should
>> >>> >>> add an optional RecordWriter property to ExecuteSQL, and the
>> >>> >>> documentation would reflect that if it is not set, the output will be
>> >>> >>> Avro with embedded schema as it has always been. If the RecordWriter
>> >>> >>> is set, either the schema can be hardcoded, or they can use "Inherit
>> >>> >>> Record Schema" even though there's no reader, and that would mimic the
>> >>> >>> current behavior where the schema is inferred from the database
>> >>> >>> columns and used for the writer. There is precedent for this pattern
>> >>> >>> in the SiteToSite reporting tasks.
>> >>> >>>
>> >>> >>> To Bryan's point about history, Avro at the time was the most
>> >>> >>> descriptive of the solutions because it maintains the schema and
>> >>> >>> datatypes with the data, unlike JSON, CSV, etc. Also before the record
>> >>> >>> readers/writers, as Bryan said, you pretty much had to split,
>> >>> >>> transform, merge. We just need to make that processor (and others with
>> >>> >>> specific input/output formats) "record-aware" for better performance.
>> >>> >>>
>> >>> >>> Regards,
>> >>> >>> Matt
>> >>> >>>
>> >>> >>> [1] https://issues.apache.org/jira/browse/NIFI-4517
>> >>> >>>
>> >>> >>> On Tue, Aug 7, 2018 at 9:20 AM Bryan Bende <bbe...@gmail.com> wrote:
>> >>> >>> >
>> >>> >>> > I would also add that the pattern of splitting to 1 record per flow
>> >>> >>> > file was common before the record processors existed, and generally
>> >>> >>> > this can/should be avoided now in favor of processing/manipulating
>> >>> >>> > records in place, and keeping them together in large batches.
>> >>> >>> >
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > On Tue, Aug 7, 2018 at 9:10 AM, Andrew Grande <apere...@gmail.com> wrote:
>> >>> >>> > > Careful, that makes too much sense, Joe ;)
>> >>> >>> > >
>> >>> >>> > >
>> >>> >>> > > On Tue, Aug 7, 2018, 8:45 AM Joe Witt <joe.w...@gmail.com> wrote:
>> >>> >>> > >>
>> >>> >>> > >> i think we just need to make an ExecuteSqlRecord processor.
>> >>> >>> > >>
>> >>> >>> > >> thanks
>> >>> >>> > >>
>> >>> >>> > >> On Tue, Aug 7, 2018, 8:41 AM Mike Thomsen <mikerthom...@gmail.com> wrote:
>> >>> >>> > >>>
>> >>> >>> > >>> My guess is that it is due to the fact that Avro is the only record type
>> >>> >>> > >>> that can match sql pretty closely feature to feature on data types.
>> >>> >>> > >>>
>> >>> >>> > >>> On Tue, Aug 7, 2018 at 8:33 AM Boris Tyukin <bo...@boristyukin.com> wrote:
>> >>> >>> > >>>>
>> >>> >>> > >>>> I've been wondering since I started learning NiFi why ExecuteSQL
>> >>> >>> > >>>> processor only returns AVRO formatted data. All community examples I've seen
>> >>> >>> > >>>> then convert AVRO to json and pretty much all of them then split json to
>> >>> >>> > >>>> multiple flows.
>> >>> >>> > >>>>
>> >>> >>> > >>>> I found myself doing the same thing over and over and over again.
>> >>> >>> > >>>>
>> >>> >>> > >>>> Since everyone is doing it, is there a strong reason why AVRO is liked
>> >>> >>> > >>>> so much?
>> >>> >>> > >>>> And why everyone continues doing this 3 step pattern rather than
>> >>> >>> > >>>> providing users with an option to output json instead and another option to
>> >>> >>> > >>>> output one flowfile or multiple (one per record).
>> >>> >>> > >>>>
>> >>> >>> > >>>> thanks
>> >>> >>> > >>>> Boris
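
The subclassing approach discussed in this thread (ExecuteSQL keeps all the SQL logic, and ExecuteSQLRecord delegates to it while changing only how results are written) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not NiFi code and not what the PR does: the class names SketchExecuteSQL, SketchExecuteSQLRecord and SketchDemo, the fetchRows/writeRows "service methods", the placeholder Avro output, and the toy CSV writer are all hypothetical, and rows are plain in-memory maps rather than a JDBC ResultSet.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for ExecuteSQL: owns the shared query logic.
class SketchExecuteSQL {
    // Stand-in for onTrigger(): the shared steps stay in the parent,
    // only the output step is left open for subclasses to override.
    public final String onTrigger(List<Map<String, Object>> resultSet) {
        List<Map<String, Object>> rows = fetchRows(resultSet);
        return writeRows(rows);
    }

    // Shared "SQL logic"; a real processor would execute the query here.
    protected List<Map<String, Object>> fetchRows(List<Map<String, Object>> resultSet) {
        return new ArrayList<>(resultSet);
    }

    // Default output: a placeholder for Avro with embedded schema.
    protected String writeRows(List<Map<String, Object>> rows) {
        return "AVRO_PLACEHOLDER[" + rows.size() + " rows]";
    }
}

// Hypothetical stand-in for ExecuteSQLRecord: overrides only the output step.
class SketchExecuteSQLRecord extends SketchExecuteSQL {
    // Stand-in for a configured Record Writer (one formatted line per row).
    private final Function<Map<String, Object>, String> recordWriter;

    SketchExecuteSQLRecord(Function<Map<String, Object>, String> recordWriter) {
        this.recordWriter = recordWriter;
    }

    @Override
    protected String writeRows(List<Map<String, Object>> rows) {
        StringBuilder out = new StringBuilder();
        for (Map<String, Object> row : rows) {
            out.append(recordWriter.apply(row)).append('\n');
        }
        return out.toString();
    }
}

class SketchDemo {
    // Toy CSV "writer": joins column values with commas, no quoting or escaping.
    static String csvLine(Map<String, Object> row) {
        List<String> cells = new ArrayList<>();
        for (Object value : row.values()) {
            cells.add(String.valueOf(value));
        }
        return String.join(",", cells);
    }

    // Two sample rows with a stable column order (id, name).
    static List<Map<String, Object>> sampleRows() {
        List<Map<String, Object>> rows = new ArrayList<>();
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("id", 1);
        first.put("name", "alice");
        rows.add(first);
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("id", 2);
        second.put("name", "bob");
        rows.add(second);
        return rows;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = sampleRows();
        // Parent: default (placeholder) output.
        System.out.println(new SketchExecuteSQL().onTrigger(rows));
        // Subclass: same shared logic, output produced by the injected writer.
        System.out.print(new SketchExecuteSQLRecord(SketchDemo::csvLine).onTrigger(rows));
    }
}
```

The point of the split is the maintenance concern raised above: a bug in the shared query step is fixed once in the parent, while the record variant only carries the code that differs.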