Emanuel,

Unfortunately, I don't believe this is something that Avro schema supports. 
Avro schema is kept reasonably simple but doesn't provide much in the way of 
validation. It's really intended more to instruct serializers/deserializers 
how to work with the bytes.
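
To illustrate the kind of check that would have to live outside the schema, 
here is a minimal Python sketch. It is not NiFi or Avro code: the function 
name check_array_cardinality and its min_items/max_items parameters are made 
up for illustration, and the payloads are trimmed versions of the ones quoted 
below.

```python
import json


def check_array_cardinality(payload, field, min_items=1, max_items=None):
    """Return a list of problems found for payload[field]; empty means valid."""
    value = payload.get(field)
    if not isinstance(value, list):
        # A missing field (None) and a non-array value are both rejected.
        return [f"'{field}' is missing or is not an array"]
    problems = []
    if len(value) < min_items:
        problems.append(
            f"'{field}' has {len(value)} element(s), expected at least {min_items}"
        )
    if max_items is not None and len(value) > max_items:
        problems.append(
            f"'{field}' has {len(value)} element(s), expected at most {max_items}"
        )
    return problems


# Trimmed example payloads (hypothetical, modelled on the thread's test data).
good = json.loads('{"Records": [{"eventVersion": "aaa"}]}')
missing = json.loads('{"RecordsXXX": [{"eventVersion": "aaa"}]}')
empty = json.loads('{"Records": []}')

for name, payload in [("good", good), ("missing", missing), ("empty", empty)]:
    print(name, check_array_cardinality(payload, "Records"))
```

With min_items=1 this covers the 1..n case; min_items=0 would allow an empty 
array, and min_items=1 with max_items=1 would express "one and only one".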

I would love to get to the point that we are able to use XML Schemas (XSD) to 
form schemas, because XSD is very rich in its validation capabilities. That's 
a lot of work, though, and we're just not there yet.

Thanks
-Mark


> On Dec 19, 2019, at 9:16 AM, Emanuel Oliveira <emanu...@gmail.com> wrote:
> 
> Just an additional thought on this: I'm not sure if it's part of the Avro 
> schema specification, but it would be nice to be able to "inform" the schema 
> of cardinalities.
> For example, by default specified records or fields must exist (cardinality 
> 1..1), but for arrays it would be nice to be able to specify a cardinality like:
> - 0..n -- can be empty (whether the array tag itself must exist in this case is TBD).
> - 1..n -- at least 1 element needed
> - 1..1 -- one and only one element in the array (i.e. [0]).
> 
> Best Regards,
> Emanuel Oliveira
> 
> 
> 
> On Thu, Dec 12, 2019 at 11:23 AM Oliveira, Emanuel <emanuel.olive...@fmr.com> wrote:
> Hi Juan and others,
> 
>  
> 
> Attaching reproducible test flow for your convenience.
> 
>  
> 
> Once again, the objective is to have 2 mandatory things in the JSON:
> 
> - An array “Records” in the root.
> - Each element of it must have the attribute eventVersion.
>  
> 
> There are 3 GenerateFlowFile processors to test the 3 different scenarios:
> 
> - Problem | array missing and the FF still validates.
> - OK | array “Records” present but missing eventVersion. Invalid as expected.
> - OK | both mandatory things present: array “Records” + “eventVersion”.
>  
> 
>  
> 
>  
> 
> Thanks//Regards,
> 
> Emanuel Oliveira
> 
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who 
> <http://fidelitycentral.fmr.com/ww/a639704>  
> 
>  
> 
> From: Juan Pablo Gardella <gardellajuanpa...@gmail.com>
> Sent: Wednesday 11 December 2019 16:18
> To: users@nifi.apache.org
> Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
> 
>  
> 
> This email is from an external source - exercise caution regarding links and 
> attachments.
> 
>  
> 
> The bug https://issues.apache.org/jira/browse/NIFI-4893 was detected by me. Do 
> you have a reproducible flow to validate it?
> 
>  
> 
> On Wed, 11 Dec 2019 at 12:54, Oliveira, Emanuel <emanuel.olive...@fmr.com> wrote:
> 
> Oh I see, your analysis makes sense, but sorry, I last did Java 20 years 
> ago; nowadays I'm mostly a data engineer (Oracle DB, ETL tools, custom 
> migrations, Snowflake and lately NiFi). So count on me to detect 
> opportunities to improve things, but I'm not able to change the base code/tests.
> 
>  
> 
> Thanks so much for your time and analysis, let's wait for the community to step up 
> to do the fix and update/run the unit tests 😊
> 
>  
> 
> Thanks//Regards,
> 
> Emanuel Oliveira
> 
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who 
> <http://fidelitycentral.fmr.com/ww/a639704>  
> 
>  
> 
> From: Mark Payne <marka...@hotmail.com>
> Sent: Wednesday 11 December 2019 15:25
> To: users@nifi.apache.org
> Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
> 
>  
> 
> 
> Emanuel,
> 
>  
> 
> I looked into this a week or so ago, but haven't had a chance to resolve the 
> issue yet. It does appear to be a bug. Specifically, I believe the bug is 
> here [1].  When we create a RecordSchema from the Avro Schema, we set the 
> default value for the array to an empty array, instead of null. Because of 
> this, when the JSON is parsed, we end up creating a Record with an empty 
> array for the "Records" field instead of a null. As a result, the Record is 
> considered valid because it does have an array (it's just empty). I think it 
> *should* be a null value instead.
> 
>  
> 
> It looks like this was introduced in NIFI-4893 [2]. We can easily change it 
> to just return a null value for the default, but that does result in two of 
> the unit tests added in NIFI-4893 failing. It may be that those unit tests 
> need to be fixed, or it may be that such a change does break something. I 
> just haven't had a chance yet to dig that far into it.
> 
>  
> 
> If you're someone who is comfortable digging into the code and making the 
> updates, then please do and I'm happy to review a PR as soon as I'm able. 
> 
>  
> 
> Thanks
> 
> -Mark
> 
>  
> 
>  
> 
> [1] 
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631
> 
> [2] https://issues.apache.org/jira/browse/NIFI-4893
>  
> 
>  
> 
>  
> 
> On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <emanuel.olive...@fmr.com> wrote:
> 
>  
> 
> Can anyone knowledgeable on Avro schemas please confirm whether this 
> inability to invalidate a JSON payload that is missing the array in the root, 
> when Allow Extra Fields=true, is normal?
> 
>  
> 
> There are 2 options:
> 
> - ValidateRecord.Allow Extra Fields=false -> need to supply the full schema
> - ValidateRecord.Allow Extra Fields=true -> this is what I have been 
> testing/want, a way to supply a schema with only the mandatory fields.
>  
> 
> I want 2 mandatory fields, an array with at least 1 element having 
> eventVersion, so minimal json should be:
> 
> { (..)
> 
>    "Records": [{
> 
>          "eventVersion": "aaa"
> 
>          (..)
> 
>       }
> 
>    ]
> 
>    (..)
> 
> }
> 
>  
> 
> The problem is that ValidateRecord considers the FF valid if the “Records” 
> array is missing from the root!!!!
> 
> {
> 
>    "Service": "sssssss",
> 
>    "Event": "eeeee",
> 
>    "Time": "2019-11-25T16:21:53.280Z",
> 
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
> 
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
> 
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
> 
> }
> 
>  
> 
> If I supply the array “Records”, then the schema correctly enforces that I need at 
> least eventVersion on each array element record.
> 
>  
> 
>  
> 
> So… maybe my question can be tuned to: “is it possible in Avro schema syntax 
> to specify cardinalities like in a DB E/R diagram, where a relation can be one 
> of the following:
> 
> 0..n
> 
> 1..n
> 
> 1 and only 1?”
> 
>  
> 
>  
> 
> Thanks//Regards,
> 
> Emanuel Oliveira
> 
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who 
> <http://fidelitycentral.fmr.com/ww/a639704>  
> 
>  
> 
> From: Oliveira, Emanuel <emanuel.olive...@fmr.com>
> Sent: Friday 6 December 2019 10:15
> To: users@nifi.apache.org
> Subject: RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
> 
>  
> 
> Hi Mark, forgot to share the NiFi version we're using:
> 
> 1.8.0
> 
> 10/22/2018 23:48:30 EDT
> 
> Tagged nifi-1.8.0-RC3
> 
>  
> 
>  
> 
> Thanks//Regards,
> 
> Emanuel Oliveira
> 
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who 
> <http://fidelitycentral.fmr.com/ww/a639704>  
> 
>  
> 
> From: Emanuel Oliveira <emanu...@gmail.com>
> Sent: Thursday 5 December 2019 22:42
> To: users@nifi.apache.org
> Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
> 
>  
> 
> 
> Hi Mark, be sure you copy paste "NOK - payload BAD 1 - " into 
> GenerateFlowfile as this is the problem.
> 
>  
> 
> Cheers,
> 
> Emanuel 
> 
>  
> 
> On Thu 5 Dec 2019, 22:03 Mark Payne, <marka...@hotmail.com> wrote:
> 
> Emanuel, 
> 
>  
> 
> What version of NiFi are you using?
> 
>  
> 
> I just tested the attached template against the latest, and the FlowFile was 
> routed to 'invalid' with the explanation:
> 
>  
> 
> Records in this FlowFile were invalid for the following reasons: The 
> following 1 fields were missing: [[0]/Records/eventVersion]
> 
>  
> 
>  
> 
>  
> 
>  
> 
> Thanks
> 
> -Mark
> 
>  
> 
>  
> 
> On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <emanuel.olive...@fmr.com> wrote:
> 
>  
> 
> Hi all,
> 
>  
> 
> I have been struggling to find a way for ValidateRecord, using an Avro schema, 
> to make the presence of an array mandatory in the JSON payload. The problem is 
> that if the array “Records” is missing, ValidateRecord considers the FF valid ☹.
> 
> --objective - Mandatory to have "Records" array with at least "eventVersion"
> 
> - using ValidateRecord > Allow Extra Fields
> 
> - the problem I'm facing is that NiFi doesn't flag payload BAD 1 as invalid!!
> 
>  
> 
> How can I make the Records array mandatory? Is it possible?
> 
>  
> 
> I know I can eventually use SplitJson with JsonPath Expression=$.Records to get 
> rid of the ARRAY, and also to fail if array "Records" is not present. But I would 
> like to have a clean solution using just the Avro schema. Is this possible?
> 
>  
> 
>  
> 
>  
> 
> --OK - payload GOOD
> 
> {
> 
>    "Service": "sssssss",
> 
>    "Event": "eeeee",
> 
>    "Time": "2019-11-25T16:21:53.280Z",
> 
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
> 
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
> 
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
> 
>    "Records": [{
> 
>          "eventVersion": "aaa"
> 
>       }
> 
>    ]
> 
> }
> 
>  
> 
> --NOK - payload BAD 1 - missing "Records" array -> BUT 
> VALIDATERECORD/AVROSCHEMA SENDS FF TO “valid”!! I want it to be sent to 
> “invalid” since it is not compliant with my Avro schema, which needs array 
> “Records” with element “eventVersion” as 2 mandatory things.
> 
> {
> 
>    "Service": "sssssss",
> 
>    "Event": "eeeee",
> 
>    "Time": "2019-11-25T16:21:53.280Z",
> 
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
> 
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
> 
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
> 
>    "RecordsXXX": [{
> 
>          "eventVersion": "aaa"
> 
>       }
> 
>    ]
> 
> }
> 
>  
> 
> --OK - payload BAD 2 - "Records" array present but missing "eventVersion"
> 
> {
> 
>    "Service": "sssssss",
> 
>    "Event": "eeeee",
> 
>    "Time": "2019-11-25T16:21:53.280Z",
> 
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
> 
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
> 
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
> 
>    "Records": [{
> 
>          "eventVersionXX": "aaa"
> 
>       }
> 
>    ]
> 
> }
> 
>  
> 
> It's a very simple test flow (attached the XML template 
> ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml) just using 
> ValidateRecord with JsonTreeReader/JsonRecordSetWriter:
> 
> <image001.png>
> 
>  
> 
>  
> 
> Here's the ValidateRecord processor + reader/writer controllers:
> 
> - Avro schema with just array “Records” and “eventVersion” as the minimum tag 
>   on each array element.
> - Using Allow Extra Fields=true, so I'm OK having other fields in the root side 
>   by side with the array “Records”, and also OK with extra fields inside each 
>   array element.
> - FYI: in the real use case I'm trying to validate an AWS SQS message (S3 
>   trigger) where I will be interested in several fields, but I crafted this 
>   simpler example just to ask whether it's possible to force an array to be 
>   mandatory and to have at least 1 element.
> ==========================================================
> 
>  
> 
> --ValidateRecord 1.8.0
> 
> Record Reader                           JsonTreeReader
> 
> Record Writer                           JsonRecordSetWriter
> 
> Record Writer for Invalid Records      
> 
> Schema Access Strategy                  Use Reader's Schema
> 
> Schema Registry                         No value set
> 
> Schema Name                             ${schema.name}
> 
> Schema Text                             ${avro.schema}
> 
> Allow Extra Fields                      true
> 
> Strict Type Checking                    true
> 
>  
> 
> --JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY + "eventVersion" 
> on each ARRAY element
> 
> Schema Access Strategy                  Use 'Schema Text' Property
> 
> Schema Registry                        
> 
> Schema Name                             ${schema.name}
> 
> Schema Version                         
> 
> Schema Branch                          
> 
> Schema Text                            
> 
>                                         {
> 
>                                            "name": "MyName",
> 
>                                            "type": "record",
> 
>                                            "namespace": "aa.bb.cc",
> 
>                                            "fields": [{
> 
>                                                  "name": "Records",
> 
>                                                  "type": {
> 
>                                                     "type": "array",
> 
>                                                     "items": {
> 
>                                                        "name": "Records_record",
> 
>                                                        "type": "record",
> 
>                                                        "fields": [{
> 
>                                                              "name": "eventVersion",
> 
>                                                              "type": "string"
> 
>                                                           }
> 
>                                                        ]
> 
>                                                     }
> 
>                                                  }
> 
>                                               }
> 
>                                            ]
> 
>                                         }
> 
> Date Format                            
> 
> Time Format
> 
> Timestamp Format
> 
>  
> 
> --JsonRecordSetWriter 1.8.0
> 
> Schema Write Strategy                   Do Not Write Schema
> 
> Schema Access Strategy                  Inherit Record Schema
> 
> Schema Registry                        
> 
> Schema Name                             ${schema.name}
> 
> Schema Version
> 
> Schema Branch
> 
> Schema Text                             { "name": "eventVersion", "type": "string" }
> 
> Date Format
> 
> Time Format
> 
> Timestamp Format
> 
> Pretty Print JSON                       true
> 
> Suppress Null Values                    Never Suppress
> 
> Output Grouping                         Array
> 
>  
> 
> Thanks in advance,
> 
> Emanuel Oliveira
> 
>  
> 
> <ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>
> 
>  
> 
