Emanuel,

Unfortunately, I don't believe this is something that Avro schema supports. Avro schema is kept reasonably simple and doesn't provide much in the way of validation. It's really intended more to instruct serializers/deserializers how to work with the bytes.
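Since the schema itself cannot express cardinality, one option is to check it in a separate step outside the schema. The following is only a minimal Python sketch of that idea, not anything NiFi or Avro provides: the function names `check_array_cardinality` and `check_records` are hypothetical, and the sample payloads are shortened versions of the ones quoted later in this thread.

```python
import json


def check_array_cardinality(payload, field, min_items=0, max_items=None):
    """Return a list of problems; an empty list means the check passed."""
    if field not in payload or payload[field] is None:
        # A minimum of 0 tolerates a missing array; anything higher does not.
        return [] if min_items == 0 else [f"missing array '{field}'"]
    value = payload[field]
    if not isinstance(value, list):
        return [f"'{field}' is not an array"]
    problems = []
    if len(value) < min_items:
        problems.append(f"'{field}' has {len(value)} element(s), expected at least {min_items}")
    if max_items is not None and len(value) > max_items:
        problems.append(f"'{field}' has {len(value)} element(s), expected at most {max_items}")
    return problems


def check_records(payload):
    """Hypothetical rule from this thread: 'Records' is 1..n and each element has 'eventVersion'."""
    problems = check_array_cardinality(payload, "Records", min_items=1)
    records = payload.get("Records")
    if isinstance(records, list):
        for index, element in enumerate(records):
            if not isinstance(element, dict) or "eventVersion" not in element:
                problems.append(f"Records[{index}] is missing 'eventVersion'")
    return problems


good = json.loads('{"Service": "sssssss", "Records": [{"eventVersion": "aaa"}]}')
bad_1 = json.loads('{"Service": "sssssss", "RecordsXXX": [{"eventVersion": "aaa"}]}')
bad_2 = json.loads('{"Service": "sssssss", "Records": [{"eventVersionXX": "aaa"}]}')

for name, payload in (("good", good), ("bad_1", bad_1), ("bad_2", bad_2)):
    print(name, check_records(payload))
```

Here `min_items=1` corresponds to the "1..n" case, `min_items=0` to "0..n", and `min_items=1, max_items=1` to "1 and only 1".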
I would love to get to the point where we are able to use XML Schemas (XSD) to define schemas, because XSD is very rich in its validation capabilities. That's a lot of work, though, and we're just not there yet.

Thanks
-Mark

> On Dec 19, 2019, at 9:16 AM, Emanuel Oliveira <emanu...@gmail.com> wrote:
>
> Just an additional thought on this. I'm not sure if it is part of the Avro schema
> specification, but it would be nice to be able to "inform" the schema of
> cardinalities.
> For example, by default specified records or fields must exist (cardinality
> 1..1), but for arrays it would be nice to be able to specify a cardinality like:
> - 0..n -- can be empty (in this case, whether the array tag itself must exist is TBD).
> - 1..n -- at least 1 element needed.
> - 1 and only 1 element in the array (i.e. only [0]).
>
> Best Regards,
> Emanuel Oliveira
>
> On Thu, Dec 12, 2019 at 11:23 AM Oliveira, Emanuel <emanuel.olive...@fmr.com> wrote:
> Hi Juan and others,
>
> Attaching a reproducible test flow for your convenience.
>
> Once again, the objective is to have 2 mandatory things in the JSON:
> - 1 array “Records” in the root.
> - each element must have the attribute eventVersion.
>
> There are 3 GenerateFlowFile processors to test the 3 different scenarios:
> problem | missing array and FF still validates.
> Ok | array “Records” present but missing eventVersion. Invalid as expected.
> Ok | both mandatory things present: array “Records” + “eventVersion”.
>
> Thanks//Regards,
> Emanuel Oliveira
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74 4971 | int: 8-737 4971 | who's who <http://fidelitycentral.fmr.com/ww/a639704>
>
> From: Juan Pablo Gardella <gardellajuanpa...@gmail.com>
> Sent: Wednesday 11 December 2019 16:18
> To: users@nifi.apache.org
> Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
>
> This email is from an external source - exercise caution regarding links and attachments.
>
> The bug https://issues.apache.org/jira/browse/NIFI-4893 was detected by me. Do
> you have a reproducible flow to validate it?
>
> On Wed, 11 Dec 2019 at 12:54, Oliveira, Emanuel <emanuel.olive...@fmr.com> wrote:
>
> Oh I see, your analysis makes sense, but sorry, I last did Java 20 years
> ago. Nowadays I'm mostly a data engineer (Oracle DB, ETL tools, custom
> migrations, Snowflake and lately NiFi), so count on me to detect
> opportunities to improve things, but I'm not able to change the base code/tests.
>
> Thanks so much for your time and analysis. Let's wait for the community to step up
> to do the fix and update/run the unit tests 😊
>
> Thanks//Regards,
> Emanuel Oliveira
>
> From: Mark Payne <marka...@hotmail.com>
> Sent: Wednesday 11 December 2019 15:25
> To: users@nifi.apache.org
> Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
>
> Emanuel,
>
> I looked into this a week or so ago, but haven't had a chance to resolve the
> issue yet. It does appear to be a bug. Specifically, I believe the bug is
> here [1]. When we create a RecordSchema from the Avro Schema, we set the
> default value for the array to an empty array, instead of null. Because of
> this, when the JSON is parsed, we end up creating a Record with an empty
> array for the "Records" field instead of a null. As a result, the Record is
> considered valid because it does have an array (it's just empty). I think it
> *should* be a null value instead.
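The effect of that default value can be illustrated with a toy model. This is only a sketch in Python under loud assumptions: it is not NiFi's code, `build_record` and `is_valid` are hypothetical stand-ins for the reader and the validator, and the only thing it models is "a missing field takes the schema default, and a required field fails only when its value is null".

```python
def build_record(raw, field_defaults):
    """Toy reader: copy the schema's fields, using the default when a field is absent."""
    return {name: raw.get(name, default) for name, default in field_defaults.items()}


def is_valid(record, required_fields):
    """Toy validator: a required field is only a problem when its value is None."""
    return all(record.get(name) is not None for name in required_fields)


# Shortened "payload BAD 1" from this thread: the "Records" array is absent.
raw = {"Service": "sssssss", "RecordsXXX": [{"eventVersion": "aaa"}]}

with_empty_default = build_record(raw, {"Records": []})   # default is an empty array
with_null_default = build_record(raw, {"Records": None})  # default is null

print(is_valid(with_empty_default, ["Records"]))  # True: the empty array hides the missing field
print(is_valid(with_null_default, ["Records"]))   # False: the missing field is detected
```

With an empty-array default there are also no elements to inspect, so the per-element check for "eventVersion" never gets a chance to fail.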
>
> It looks like this was introduced in NIFI-4893 [2]. We can easily change it
> to just return a null value for the default, but that does result in two of
> the unit tests added in NIFI-4893 failing. It may be that those unit tests
> need to be fixed, or it may be that such a change does break something. I
> just haven't had a chance yet to dig that far into it.
>
> If you're someone who is comfortable digging into the code and making the
> updates, then please do and I'm happy to review a PR as soon as I'm able.
>
> Thanks
> -Mark
>
> [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631
> [2] https://issues.apache.org/jira/browse/NIFI-4893
>
> On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <emanuel.olive...@fmr.com> wrote:
>
> Can anyone knowledgeable on Avro schemas please confirm/suggest whether this
> inability to invalidate a JSON payload missing an array in the root, when
> Allow Extra Fields=true, is normal?
>
> There are 2 options:
> - ValidateRecord.Allow Extra Fields=false -> need to supply the full schema
> - ValidateRecord.Allow Extra Fields=true -> this is what I have been
>   testing/want, a way to supply a schema with only the mandatory fields.
>
> I want 2 mandatory fields, an array with at least 1 element having
> eventVersion, so the minimal JSON should be:
>
> { (..)
>   "Records": [{
>       "eventVersion": "aaa"
>       (..)
>     }
>   ]
>   (..)
> }
>
> The problem is that ValidateRecord considers the FF valid if the “Records” array is
> missing in the root!!!!
>
> {
>   "Service": "sssssss",
>   "Event": "eeeee",
>   "Time": "2019-11-25T16:21:53.280Z",
>   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>   "RequestId": "RRRRRRRRRRRRRRRRRR",
>   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh"
> }
>
> If I supply the array “Records”, then the schema correctly validates that I need at
> least eventVersion on the array element record.
>
> So… maybe my question can be tuned to: “is it possible in Avro schema syntax
> to specify cardinalities like in a DB E/R diagram, where a relation can be one
> of the following:
> 0..n
> 1..n
> 1 and only 1 ?”
>
> Thanks//Regards,
> Emanuel Oliveira
>
> From: Oliveira, Emanuel <emanuel.olive...@fmr.com>
> Sent: Friday 6 December 2019 10:15
> To: users@nifi.apache.org
> Subject: RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
>
> Hi Mark, forgot to share the NiFi version we are using:
> 1.8.0
> 10/22/2018 23:48:30 EDT
> Tagged nifi-1.8.0-RC3
>
> Thanks//Regards,
> Emanuel Oliveira
>
> From: Emanuel Oliveira <emanu...@gmail.com>
> Sent: Thursday 5 December 2019 22:42
> To: users@nifi.apache.org
> Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
>
> Hi Mark, be sure you copy paste "NOK - payload BAD 1 - " into
> GenerateFlowFile as this is the problem.
>
> Cheers,
> Emanuel
>
> On Thu 5 Dec 2019, 22:03 Mark Payne, <marka...@hotmail.com> wrote:
>
> Emanuel,
>
> What version of NiFi are you using?
>
> I just tested the attached template against the latest, and the FlowFile was
> routed to 'invalid' with the explanation:
>
> Records in this FlowFile were invalid for the following reasons: The
> following 1 fields were missing: [[0]/Records/eventVersion]
>
> Thanks
> -Mark
>
> On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <emanuel.olive...@fmr.com> wrote:
>
> Hi all,
>
> I have been struggling to find a way for ValidateRecord, using an Avro schema, to make
> the presence of an array in the JSON payload mandatory. The problem is that if the array
> “Records” is missing, ValidateRecord considers the FF valid ☹.
>
> --objective - Mandatory to have "Records array" with at least "eventVersion"
> - using ValidateRecord > Allow Extra Fields
> - the problem I'm facing is that NiFi doesn't flag payload BAD 1 as invalid!!
>
> How can I make the Records array mandatory? Is it possible?
>
> I know I can eventually use a SplitJson with JsonPath Expression=$.Records to get rid
> of the ARRAY, and also to fail if the array "Records" is not present. But I would
> like to have a clean solution using just the Avro schema. Is this possible?
>
> --OK - payload GOOD
> {
>   "Service": "sssssss",
>   "Event": "eeeee",
>   "Time": "2019-11-25T16:21:53.280Z",
>   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>   "RequestId": "RRRRRRRRRRRRRRRRRR",
>   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>   "Records": [{
>       "eventVersion": "aaa"
>     }
>   ]
> }
>
> --NOK - payload BAD 1 - missing "Records" array -> BUT
> VALIDATERECORD/AVROSCHEMA SENDS FF TO “valid”!! I want it to be sent to
> “invalid” since it is not compliant with my Avro schema, which needs the array
> “Records” with element “eventVersion” as 2 mandatory things.
>
> {
>   "Service": "sssssss",
>   "Event": "eeeee",
>   "Time": "2019-11-25T16:21:53.280Z",
>   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>   "RequestId": "RRRRRRRRRRRRRRRRRR",
>   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>   "RecordsXXX": [{
>       "eventVersion": "aaa"
>     }
>   ]
> }
>
> --OK - payload BAD 2 - "Records" array present but missing "eventVersion"
> {
>   "Service": "sssssss",
>   "Event": "eeeee",
>   "Time": "2019-11-25T16:21:53.280Z",
>   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>   "RequestId": "RRRRRRRRRRRRRRRRRR",
>   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>   "Records": [{
>       "eventVersionXX": "aaa"
>     }
>   ]
> }
>
> It's a very simple test flow (attached the XML template
> ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml) just using
> ValidateRecord with JsonTreeReader/JsonRecordSetWriter:
>
> <image001.png>
>
> Here's the ValidateRecord processor + reader/writer controllers:
> Avro schema with just the array “Records” and “eventVersion” as the minimum tag on each
> array element.
> Using Allow Extra Fields true:
> So I'm OK having other fields in the root side by side with the array
> “Records”, and also OK having extra elements inside each array.
> FYI: in the real use case I'm trying to validate an AWS SQS message (S3 trigger)
> where I will be interested in several fields, but I crafted this simpler
> example just to ask if it's possible to force an array to be mandatory and with
> at least 1 element?
> ==========================================================
>
> --ValidateRecord 1.8.0
> Record Reader                      JsonTreeReader
> Record Writer                      JsonRecordSetWriter
> Record Writer for Invalid Records
> Schema Access Strategy             Use Reader's Schema
> Schema Registry                    No value set
> Schema Name                        ${schema.name}
> Schema Text                        ${avro.schema}
> Allow Extra Fields                 true
> Strict Type Checking               true
>
> --JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY + "eventVersion" on each ARRAY element
> Schema Access Strategy             Use 'Schema Text' Property
> Schema Registry
> Schema Name                        ${schema.name}
> Schema Version
> Schema Branch
> Schema Text
> {
>   "name": "MyName",
>   "type": "record",
>   "namespace": "aa.bb.cc",
>   "fields": [{
>       "name": "Records",
>       "type": {
>         "type": "array",
>         "items": {
>           "name": "Records_record",
>           "type": "record",
>           "fields": [{
>               "name": "eventVersion",
>               "type": "string"
>             }
>           ]
>         }
>       }
>     }
>   ]
> }
> Date Format
> Time Format
> Timestamp Format
>
> --JsonRecordSetWriter 1.8.0
> Schema Write Strategy              Do Not Write Schema
> Schema Access Strategy             Inherit Record Schema
> Schema Registry
> Schema Name                        ${schema.name}
> Schema Version
> Schema Branch
> Schema Text                        { "name": "eventVersion", "type": "string" }
> Date Format
> Time Format
> Timestamp Format
> Pretty Print JSON                  true
> Suppress Null Values               Never Suppress
> Output Grouping                    Array
>
> Thanks in advance,
> Emanuel Oliveira
>
> <ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>