Oh I see, your analysis makes sense. Sorry, but I last did Java 20 years ago; nowadays I'm mostly a data engineer (Oracle DB, ETL tools, custom migrations, Snowflake and lately NiFi), so count on me to spot opportunities to improve things, but I'm not able to change the base code/tests.
Thanks so much for your time and analysis. Let's wait for the community to step up to do the fix and update/run the unit tests 😊

Thanks//Regards,
Emanuel Oliveira
Senior Oracle/Data Engineer | CTG | Galway
TEL ext: 353 – (0)91-74 4971 | int: 8-737 4971 | who's who<http://fidelitycentral.fmr.com/ww/a639704>

From: Mark Payne <marka...@hotmail.com>
Sent: Wednesday 11 December 2019 15:25
To: users@nifi.apache.org
Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Emanuel,

I looked into this a week or so ago, but haven't had a chance to resolve the issue yet. It does appear to be a bug. Specifically, I believe the bug is here [1]. When we create a RecordSchema from the Avro Schema, we set the default value for the array to an empty array, instead of null. Because of this, when the JSON is parsed, we end up creating a Record with an empty array for the "Records" field instead of a null. As a result, the Record is considered valid because it does have an array (it's just empty). I think it *should* be a null value instead.

It looks like this was introduced in NIFI-4893 [2]. We can easily change it to just return a null value for the default, but that does result in two of the unit tests added in NIFI-4893 failing. It may be that those unit tests need to be fixed, or it may be that such a change does break something. I just haven't had a chance yet to dig that far into it.

If you're someone who is comfortable digging into the code and making the updates, then please do and I'm happy to review a PR as soon as I'm able.
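The effect Mark describes can be illustrated with a small toy model. This is only a sketch in plain Python, not the NiFi code: `build_record` and `is_valid` are hypothetical helpers invented for the illustration, and the real logic lives in the AvroTypeUtil class referenced as [1].

```python
# Toy model of the behaviour described above. NOT NiFi code:
# build_record and is_valid are hypothetical helpers for illustration only.

def build_record(payload, field_defaults):
    """Copy each schema field from the parsed JSON, falling back to the field's default."""
    return {name: payload.get(name, default) for name, default in field_defaults.items()}


def is_valid(record, required_fields):
    """Treat a record as valid when no required field is None."""
    return all(record.get(name) is not None for name in required_fields)


# Payload without a "Records" key, like "payload BAD 1" in this thread.
payload_bad_1 = {"Service": "sssssss", "Event": "eeeee"}

# Default of [] (the behaviour Mark describes): the missing array is masked.
masked = build_record(payload_bad_1, {"Records": []})

# Default of None (the change Mark suggests): the missing array is detected.
detected = build_record(payload_bad_1, {"Records": None})

print(is_valid(masked, ["Records"]))    # True
print(is_valid(detected, ["Records"]))  # False
```

With an empty-array default the record always carries a "Records" value, so a null check on the mandatory field can never fail; with a null default the same check catches the missing array.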
Thanks
-Mark

[1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631
[2] https://issues.apache.org/jira/browse/NIFI-4893

On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <emanuel.olive...@fmr.com> wrote:

Can anyone knowledgeable on Avro schemas please confirm/suggest whether this inability to invalidate a JSON payload missing an array in the root, when Allow Extra Fields is true, is normal?

There are 2 options:
· ValidateRecord.Allow Extra Fields=false --> need to supply the full schema
· ValidateRecord.Allow Extra Fields=true --> this is what I have been testing/want, a way to supply a schema with only the mandatory fields.

I want 2 mandatory things: an array with at least 1 element, and each element having eventVersion, so the minimal JSON should be:

{
  (..)
  "Records": [{
      "eventVersion": "aaa"
      (..)
    }
  ]
  (..)
}

The problem is that ValidateRecord considers the FF valid if the “Records” array is missing in the root!

{
  "Service": "sssssss",
  "Event": "eeeee",
  "Time": "2019-11-25T16:21:53.280Z",
  "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
  "RequestId": "RRRRRRRRRRRRRRRRRR",
  "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh"
}

If I supply the array “Records”, then the schema correctly validates that I need at least eventVersion on the array element record.

So… maybe my question can be tuned to “is it possible in Avro schema syntax to specify cardinalities like in a DB E/R diagram, where a relation can be one of the following: 0..n, 1..0, 1 and only 1?”

Thanks//Regards,
Emanuel Oliveira

From: Oliveira, Emanuel <emanuel.olive...@fmr.com>
Sent: Friday 6 December 2019 10:15
To: users@nifi.apache.org
Subject: RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
Hi Mark, I forgot to share the NiFi version we are using:

1.8.0
10/22/2018 23:48:30 EDT
Tagged nifi-1.8.0-RC3

Thanks//Regards,
Emanuel Oliveira

From: Emanuel Oliveira <emanu...@gmail.com>
Sent: Thursday 5 December 2019 22:42
To: users@nifi.apache.org
Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Hi Mark, be sure you copy and paste "NOK - payload BAD 1 - " into GenerateFlowFile, as this is the problem.

Cheers,
Emanuel

On Thu 5 Dec 2019, 22:03 Mark Payne, <marka...@hotmail.com> wrote:

Emanuel,

What version of NiFi are you using? I just tested the attached template against the latest, and the FlowFile was routed to 'invalid' with the explanation:

Records in this FlowFile were invalid for the following reasons: The following 1 fields were missing: [[0]/Records/eventVersion]

Thanks
-Mark

On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <emanuel.olive...@fmr.com> wrote:

Hi all,

I have been struggling to find a way for ValidateRecord, using an Avro schema, to make the presence of an array in the JSON payload mandatory. The problem is that if the array “Records” is missing, ValidateRecord considers the FF valid ☹.

--objective
- Mandatory to have the "Records" array with at least "eventVersion"
- using ValidateRecord > Allow Extra Fields
- the problem I'm facing is that NiFi doesn't flag payload BAD 1 as invalid!!

How can I make the Records array mandatory? Is it possible? I know I can eventually use SplitJson with JsonPath Expression=$.Records to get rid of the ARRAY, and also to fail if the array "Records" is not present. But I would like to have a clean solution using just the Avro schema. Is this possible?
--OK - payload GOOD
{
  "Service": "sssssss",
  "Event": "eeeee",
  "Time": "2019-11-25T16:21:53.280Z",
  "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
  "RequestId": "RRRRRRRRRRRRRRRRRR",
  "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
  "Records": [{
      "eventVersion": "aaa"
    }
  ]
}

--NOK - payload BAD 1 - missing "Records" array --> BUT VALIDATERECORD/AVROSCHEMA SENDS FF TO “valid”!!
I want it to be sent to “invalid”, since it is not compliant with my Avro schema, which needs the array “Records” with element “eventVersion” as 2 mandatory things.
{
  "Service": "sssssss",
  "Event": "eeeee",
  "Time": "2019-11-25T16:21:53.280Z",
  "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
  "RequestId": "RRRRRRRRRRRRRRRRRR",
  "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
  "RecordsXXX": [{
      "eventVersion": "aaa"
    }
  ]
}

--OK - payload BAD 2 - "Records" array present but missing "eventVersion"
{
  "Service": "sssssss",
  "Event": "eeeee",
  "Time": "2019-11-25T16:21:53.280Z",
  "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
  "RequestId": "RRRRRRRRRRRRRRRRRR",
  "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
  "Records": [{
      "eventVersionXX": "aaa"
    }
  ]
}

It's a very simple test flow (I attached the XML template ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml), just using ValidateRecord with JsonReader/JsonWriter:

<image001.png>

Here's the ValidateRecord processor + reader/writer controllers:

* Avro schema with just the array “Records” and “eventVersion” as the minimum tag on each array element.
* Using Allow Extra Fields true:
  * So I'm OK having other fields in the root side by side with the array “Records”, and also OK having extra elements inside each array.
* FYI: in the real use case I'm trying to validate an AWS SQS message (S3 trigger) where I will be interested in several fields, but I crafted this simpler example just to ask whether it's possible to force the array to be mandatory and with at least 1 element.
==========================================================
--ValidateRecord 1.8.0
Record Reader: JsonTreeReader
Record Writer: JsonRecordSetWriter
Record Writer for Invalid Records:
Schema Access Strategy: Use Reader's Schema
Schema Registry: No value set
Schema Name: ${schema.name}
Schema Text: ${avro.schema}
Allow Extra Fields: true
Strict Type Checking: true

--JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY + "eventVersion" on each ARRAY element
Schema Access Strategy: Use 'Schema Text' Property
Schema Registry:
Schema Name: ${schema.name}
Schema Version:
Schema Branch:
Schema Text:
{
  "name": "MyName",
  "type": "record",
  "namespace": "aa.bb.cc",
  "fields": [{
      "name": "Records",
      "type": {
        "type": "array",
        "items": {
          "name": "Records_record",
          "type": "record",
          "fields": [{
              "name": "eventVersion",
              "type": "string"
            }
          ]
        }
      }
    }
  ]
}
Date Format:
Time Format:
Timestamp Format:

--JsonRecordSetWriter 1.8.0
Schema Write Strategy: Do Not Write Schema
Schema Access Strategy: Inherit Record Schema
Schema Registry:
Schema Name: ${schema.name}
Schema Version:
Schema Branch:
Schema Text:
{
  "name": "eventVersion",
  "type": "string"
}
Date Format:
Time Format:
Timestamp Format:
Pretty Print JSON: true
Suppress Null Values: Never Suppress
Output Grouping: Array

Thanks in advance,
Emanuel Oliveira

<ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>
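For reference, the rule the thread is trying to enforce ("Records" must be present, hold at least one element, and every element must carry a string "eventVersion") can be written down as a stand-alone check. The Avro array type itself only declares its item type, so the "at least one element" part is expressed here in code rather than in the schema. This is a minimal sketch in plain Python, independent of NiFi; `records_problems` is a hypothetical helper name, and the payloads are shortened versions of the three test payloads above.

```python
import json


def records_problems(payload_text):
    """Return the reasons a payload breaks the intended rule (empty list means OK)."""
    payload = json.loads(payload_text)
    records = payload.get("Records")
    if not isinstance(records, list):
        return ['missing "Records" array']
    if not records:
        return ['"Records" array is empty']
    # Every array element must be an object with a string "eventVersion".
    return [
        f"/Records[{i}]/eventVersion missing or not a string"
        for i, item in enumerate(records)
        if not (isinstance(item, dict) and isinstance(item.get("eventVersion"), str))
    ]


# Shortened versions of payload GOOD, BAD 1 and BAD 2 from this thread.
good = '{"Service": "sssssss", "Records": [{"eventVersion": "aaa"}]}'
bad_1 = '{"Service": "sssssss", "RecordsXXX": [{"eventVersion": "aaa"}]}'
bad_2 = '{"Service": "sssssss", "Records": [{"eventVersionXX": "aaa"}]}'

for name, text in [("GOOD", good), ("BAD 1", bad_1), ("BAD 2", bad_2)]:
    print(name, records_problems(text) or "valid")
```

Extra fields in the root or inside each array element are ignored, which mirrors the Allow Extra Fields=true setup used in the test flow.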