I wonder if this bug is related to the SO question [1] as well?

[1] https://stackoverflow.com/questions/58482448/nifi-validate-record-of-nested-json-set-valid-for-missing-array-field
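Mark's explanation in the quoted thread is that a missing array gets an empty array as its default instead of null, so it looks present to the validator. That behaviour can be modelled outside NiFi with a toy reader and validator. This is a hypothetical sketch under that assumption, not NiFi's AvroTypeUtil or ValidateRecord code. `read_record`, `missing_fields` and the `array_default` parameter are illustrative names only.

```python
import json
from typing import Any, List, Optional

# Hypothetical toy model of the schema in this thread (not NiFi code):
# the root record needs a "Records" array whose items need "eventVersion".
ARRAY_FIELD = "Records"
ITEM_FIELD = "eventVersion"


def read_record(payload: str, array_default: Optional[list]) -> dict:
    """Parse JSON and fill an absent "Records" field with `array_default`,
    mimicking a reader that applies a schema default to missing fields."""
    record = json.loads(payload)
    if ARRAY_FIELD not in record:
        # Copy a list default so records never share one mutable object.
        record[ARRAY_FIELD] = list(array_default) if array_default is not None else None
    return record


def missing_fields(record: dict) -> List[str]:
    """Return paths of required fields that are null or absent in the record."""
    items: Any = record.get(ARRAY_FIELD)
    if items is None:
        return [ARRAY_FIELD]
    return [
        f"{ARRAY_FIELD}[{i}].{ITEM_FIELD}"
        for i, item in enumerate(items)
        if item.get(ITEM_FIELD) is None
    ]


# Payload "BAD 1" from the thread, trimmed: the array is misnamed "RecordsXXX".
bad_1 = '{"Service": "sssssss", "RecordsXXX": [{"eventVersion": "aaa"}]}'

# Empty-array default (the behaviour Mark describes): nothing is reported.
print(missing_fields(read_record(bad_1, array_default=[])))    # -> []
# Null default (the change Mark suggests): the absent array is reported.
print(missing_fields(read_record(bad_1, array_default=None)))  # -> ['Records']
```

The point the sketch isolates is that once `[]` is substituted, an absent array and an empty array look identical to the validator; only a null default keeps them distinguishable.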
On Wed, Dec 11, 2019 at 11:18 AM Juan Pablo Gardella <gardellajuanpa...@gmail.com> wrote:

> The bug https://issues.apache.org/jira/browse/NIFI-4893 was detected by
> myself. Do you have a reproducible flow to validate it?
>
> On Wed, 11 Dec 2019 at 12:54, Oliveira, Emanuel <emanuel.olive...@fmr.com>
> wrote:
>
>> Oh I see, your analysis makes sense. But sorry, I last did Java 20 years
>> ago; nowadays I'm mostly a data engineer (Oracle DB, ETL tools, custom
>> migrations, Snowflake and lately NiFi). So count on me to spot
>> opportunities to improve things, but I'm not able to change the base
>> code/tests.
>>
>> Thanks so much for your time and analysis. Let's wait for the community
>> to step up, do the fix and update/run the unit tests 😊
>>
>> Thanks//Regards,
>>
>> *Emanuel Oliveira*
>>
>> Senior Oracle/Data Engineer | CTG | Galway
>> TEL ext: 353 – (0)91-74 4971 | int: 8-737 4971 *|* who's who
>> <http://fidelitycentral.fmr.com/ww/a639704>
>>
>> *From:* Mark Payne <marka...@hotmail.com>
>> *Sent:* Wednesday 11 December 2019 15:25
>> *To:* users@nifi.apache.org
>> *Subject:* Re: NiFi ValidateRecord - unable to handle missing mandatory
>> ARRAY ?
>>
>> Emanuel,
>>
>> I looked into this a week or so ago, but haven't had a chance to resolve
>> the issue yet. It does appear to be a bug. Specifically, I believe the bug
>> is here [1]. When we create a RecordSchema from the Avro Schema, we set
>> the default value for the array to an empty array, instead of null. Because
>> of this, when the JSON is parsed, we end up creating a Record with an empty
>> array for the "Records" field instead of a null. As a result, the Record is
>> considered valid because it does have an array (it's just empty). I think
>> it *should* be a null value instead.
>>
>> It looks like this was introduced in NIFI-4893 [2].
>> We can easily change it to just return a null value for the default, but
>> that does result in two of the unit tests added in NIFI-4893 failing. It
>> may be that those unit tests need to be fixed, or it may be that such a
>> change does break something. I just haven't had a chance yet to dig that
>> far into it.
>>
>> If you're someone who is comfortable digging into the code and making the
>> updates, then please do and I'm happy to review a PR as soon as I'm able.
>>
>> Thanks
>> -Mark
>>
>> [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631
>> [2] https://issues.apache.org/jira/browse/NIFI-4893
>>
>> On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <emanuel.olive...@fmr.com>
>> wrote:
>>
>> Can anyone knowledgeable about Avro schemas please confirm/suggest whether
>> this inability to invalidate a JSON payload that is missing the array at
>> the root, when Allow Extra Fields=true, is normal?
>>
>> There are 2 options:
>>
>> - ValidateRecord.Allow Extra Fields=false -> need to supply the full
>>   schema
>> - ValidateRecord.Allow Extra Fields=true -> this is what I've been
>>   testing/want: a way to supply a schema with only the mandatory fields.
>>
>> I want 2 mandatory fields: an array with at least 1 element having
>> eventVersion, so the minimal JSON should be:
>>
>> { (..)
>>   "Records": [{
>>     "eventVersion": "aaa"
>>     (..)
>>   }]
>>   (..)
>> }
>>
>> The problem is that ValidateRecord considers the FF valid if the “Records”
>> array is missing from the root!
>>
>> {
>>   "Service": "sssssss",
>>   "Event": "eeeee",
>>   "Time": "2019-11-25T16:21:53.280Z",
>>   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>>   "RequestId": "RRRRRRRRRRRRRRRRRR",
>>   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh"
>> }
>>
>> If I supply the “Records” array, then the schema correctly validates that
>> I need at least eventVersion in the array element record.
>>
>> So… maybe my question can be refined to: is it possible in Avro schema
>> syntax to specify cardinalities like in a DB E/R diagram, where a relation
>> can be one of the following:
>>
>>   0..n
>>   1..0
>>   1 and only 1?
>>
>> *From:* Oliveira, Emanuel <emanuel.olive...@fmr.com>
>> *Sent:* Friday 6 December 2019 10:15
>> *To:* users@nifi.apache.org
>> *Subject:* RE: NiFi ValidateRecord - unable to handle missing mandatory
>> ARRAY ?
>>
>> Hi Mark, I forgot to share the NiFi version we are using:
>>
>> 1.8.0
>> 10/22/2018 23:48:30 EDT
>> Tagged nifi-1.8.0-RC3
>>
>> *From:* Emanuel Oliveira <emanu...@gmail.com>
>> *Sent:* Thursday 5 December 2019 22:42
>> *To:* users@nifi.apache.org
>> *Subject:* Re: NiFi ValidateRecord - unable to handle missing mandatory
>> ARRAY ?
>>
>> Hi Mark, be sure you copy/paste "NOK - payload BAD 1 - " into
>> GenerateFlowFile, as this is the problem case.
>>
>> Cheers,
>> Emanuel
>>
>> On Thu 5 Dec 2019, 22:03 Mark Payne, <marka...@hotmail.com> wrote:
>>
>> Emanuel,
>>
>> What version of NiFi are you using?
>>
>> I just tested the attached template against the latest, and the FlowFile
>> was routed to 'invalid' with the explanation:
>>
>> Records in this FlowFile were invalid for the following reasons: The
>> following 1 fields were missing: [[0]/Records/eventVersion]
>>
>> Thanks
>> -Mark
>>
>> On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <emanuel.olive...@fmr.com>
>> wrote:
>>
>> Hi all,
>>
>> I've been struggling to find a way for ValidateRecord, using an Avro
>> schema, to make the presence of an array in the JSON payload mandatory.
>> The problem is that if the “Records” array is missing, ValidateRecord
>> considers the FF valid ☹.
>>
>> --objective
>> - Mandatory to have the "Records" array with at least "eventVersion"
>> - using ValidateRecord > Allow Extra Fields
>> - the problem I'm facing is that NiFi doesn't route payload BAD 1 to
>>   invalid!!
>>
>> How can I make the Records array mandatory? Is it possible?
>>
>> I know I could use SplitJson with JsonPath Expression=$.Records to get rid
>> of the ARRAY, and also to fail if the "Records" array is not present. But
>> I would like a clean solution using just an Avro schema. Is this possible?
>>
>> --OK - payload GOOD
>> {
>>   "Service": "sssssss",
>>   "Event": "eeeee",
>>   "Time": "2019-11-25T16:21:53.280Z",
>>   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>>   "RequestId": "RRRRRRRRRRRRRRRRRR",
>>   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>>   "Records": [{
>>     "eventVersion": "aaa"
>>   }]
>> }
>>
>> --NOK - payload BAD 1 - missing "Records" array -> BUT
>> VALIDATERECORD/AVROSCHEMA SENDS FF TO “valid”!!
>> I want it to be sent to “invalid”, since it is not compliant with my Avro
>> schema, which needs the “Records” array with element “eventVersion” as 2
>> mandatory things.
>> {
>>   "Service": "sssssss",
>>   "Event": "eeeee",
>>   "Time": "2019-11-25T16:21:53.280Z",
>>   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>>   "RequestId": "RRRRRRRRRRRRRRRRRR",
>>   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>>   "RecordsXXX": [{
>>     "eventVersion": "aaa"
>>   }]
>> }
>>
>> --OK - payload BAD 2 - "Records" array present but missing "eventVersion"
>> {
>>   "Service": "sssssss",
>>   "Event": "eeeee",
>>   "Time": "2019-11-25T16:21:53.280Z",
>>   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>>   "RequestId": "RRRRRRRRRRRRRRRRRR",
>>   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>>   "Records": [{
>>     "eventVersionXX": "aaa"
>>   }]
>> }
>>
>> It's a very simple test flow (attached the XML template
>> ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml), just using
>> ValidateRecord with a JSON reader/writer.
>>
>> Here are the ValidateRecord processor + reader/writer controller services:
>>
>> - Avro schema with just the “Records” array and “eventVersion” as the
>>   minimum tag on the array element.
>> - Using Allow Extra Fields=true:
>>   - So I'm OK having other fields at the root, side by side with the
>>     “Records” array, and also OK to have extra elements inside each array.
>>   - FYI: the real use case is validating an AWS SQS message (S3 trigger)
>>     where I will be interested in several fields, but I crafted this
>>     simpler example just to ask whether it's possible to force the array
>>     to be mandatory and to have at least 1 element.
>>
>> ==========================================================
>>
>> --ValidateRecord 1.8.0
>> Record Reader                       JsonTreeReader
>> Record Writer                       JsonRecordSetWriter
>> Record Writer for Invalid Records
>> Schema Access Strategy              Use Reader's Schema
>> Schema Registry                     No value set
>> Schema Name                         ${schema.name}
>> Schema Text                         ${avro.schema}
>> Allow Extra Fields                  true
>> Strict Type Checking                true
>>
>> --JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY +
>> "eventVersion" on each ARRAY element
>> Schema Access Strategy              Use 'Schema Text' Property
>> Schema Registry
>> Schema Name                         ${schema.name}
>> Schema Version
>> Schema Branch
>> Schema Text
>> {
>>   "name": "MyName",
>>   "type": "record",
>>   "namespace": "aa.bb.cc",
>>   "fields": [{
>>     "name": "Records",
>>     "type": {
>>       "type": "array",
>>       "items": {
>>         "name": "Records_record",
>>         "type": "record",
>>         "fields": [{
>>           "name": "eventVersion",
>>           "type": "string"
>>         }]
>>       }
>>     }
>>   }]
>> }
>> Date Format
>> Time Format
>> Timestamp Format
>>
>> --JsonRecordSetWriter 1.8.0
>> Schema Write Strategy               Do Not Write Schema
>> Schema Access Strategy              Inherit Record Schema
>> Schema Registry
>> Schema Name                         ${schema.name}
>> Schema Version
>> Schema Branch
>> Schema Text                         { "name": "eventVersion", "type": "string" }
>> Date Format
>> Time Format
>> Timestamp Format
>> Pretty Print JSON                   true
>> Suppress Null Values                Never Suppress
>> Output Grouping                     Array
>>
>> Thanks in advance,
>> Emanuel Oliveira
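On the cardinality question in the quoted thread: as far as I know, Avro itself can express "exactly one" (a plain, non-union field), "zero or one" (a union with "null", typically with "default": null) and "zero or more" (an array). The array type has no attribute for a minimum or maximum number of items, though, so "one or more" needs a separate check. Even with the null-default fix Mark describes, an explicitly empty "Records": [] would still satisfy an Avro array type. Below is a minimal sketch of such a check, assuming it runs on the raw JSON outside ValidateRecord. `array_size_ok` and its parameters are hypothetical names, not a NiFi API.

```python
import json
from typing import Optional


def array_size_ok(payload: str, field: str, min_items: int = 1,
                  max_items: Optional[int] = None) -> bool:
    """Check a cardinality rule on a top-level JSON array field (hypothetical helper).

    A missing field, a null value or a non-array value fails the check;
    otherwise the array length must lie in [min_items, max_items]
    (max_items=None means no upper bound).
    """
    value = json.loads(payload).get(field)
    if not isinstance(value, list):
        return False
    if len(value) < min_items:
        return False
    return max_items is None or len(value) <= max_items


# "1..n" for the thread's "Records" array:
print(array_size_ok('{"Records": [{"eventVersion": "aaa"}]}', "Records"))     # True
print(array_size_ok('{"RecordsXXX": [{"eventVersion": "aaa"}]}', "Records"))  # False
print(array_size_ok('{"Records": []}', "Records"))                            # False
```

Keeping the element-count rule outside the schema lets the Avro schema stay minimal (as Emanuel wanted with Allow Extra Fields=true). The SplitJson approach Emanuel mentions is another way to enforce the "array must be present" part.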