Emanuel,

I looked into this a week or so ago, but haven't had a chance to resolve the 
issue yet. It does appear to be a bug. Specifically, I believe the bug is here 
[1].  When we create a RecordSchema from the Avro Schema, we set the default 
value for the array to an empty array, instead of null. Because of this, when 
the JSON is parsed, we end up creating a Record with an empty array for the 
"Records" field instead of a null. As a result, the Record is considered valid 
because it does have an array (it's just empty). I think it *should* be a null 
value instead.
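
To illustrate the reasoning, here is a rough Python sketch of the behavior as I 
understand it. It is a simplified model only, not the actual AvroTypeUtil or 
ValidateRecord code; the names build_record and validate are made up for this 
example, and the payload is a shortened version of "payload BAD 1".

```python
# Simplified model of the behaviour described above; NOT the NiFi code.
# build_record and validate are hypothetical names used only for illustration.

def build_record(payload, schema_fields, array_default):
    """Mimic reading JSON into a record: any schema field that is absent
    from the payload takes the schema's default value."""
    record = {}
    for name in schema_fields:
        if name in payload:
            record[name] = payload[name]
        elif array_default is None:
            record[name] = None
        else:
            # Copy the default so records never share one list object.
            record[name] = list(array_default)
    return record


def validate(record, required_fields):
    """Return the required fields whose value is None, i.e. treated as missing."""
    return [name for name in required_fields if record[name] is None]


# Shortened "payload BAD 1": the "Records" array is not present.
payload_bad_1 = {"Service": "sssssss", "RecordsXXX": [{"eventVersion": "aaa"}]}

# Default of an empty array: the field looks present, so nothing is reported.
rec_current = build_record(payload_bad_1, ["Records"], array_default=[])
missing_current = validate(rec_current, ["Records"])

# Default of null: the missing array is detected.
rec_proposed = build_record(payload_bad_1, ["Records"], array_default=None)
missing_proposed = validate(rec_proposed, ["Records"])

print(missing_current)   # []
print(missing_proposed)  # ['Records']
```

With the empty-array default nothing is reported as missing, which matches the 
FlowFile going to 'valid'; with a null default the "Records" field is flagged.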

It looks like this was introduced in NIFI-4893 [2]. We can easily change it to 
just return a null value for the default, but that does result in two of the 
unit tests added in NIFI-4893 failing. It may be that those unit tests need to 
be fixed, or it may be that such a change does break something. I just haven't 
had a chance yet to dig that far into it.

If you're someone who is comfortable digging into the code and making the 
updates, then please do and I'm happy to review a PR as soon as I'm able. 

Thanks
-Mark


[1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631

[2] https://issues.apache.org/jira/browse/NIFI-4893



> On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <emanuel.olive...@fmr.com> 
> wrote:
> 
> Can anyone knowledgeable about Avro schemas please confirm whether this 
> inability to invalidate a JSON payload that is missing the array in the root, 
> when Allow Extra Fields=true, is normal?
>  
> There are 2 options:
> ValidateRecord.Allow Extra Fields=false -> need to supply the full schema
> ValidateRecord.Allow Extra Fields=true -> this is what I have been testing and 
> want: a way to supply a schema with only the mandatory fields.
>  
> I want 2 mandatory fields, an array with at least 1 element having 
> eventVersion, so minimal json should be:
> { (..)
>    "Records": [{
>          "eventVersion": "aaa"
>          (..)
>       }
>    ]
>    (..)
> }
>  
> The problem is that ValidateRecord considers the FlowFile valid even when the 
> “Records” array is missing from the root!
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
> }
>  
> If I supply the array “Records”, then the schema correctly validates that I 
> need at least eventVersion on each array element record.
>  
>  
> So… maybe my question can be rephrased as: is it possible in Avro schema 
> syntax to specify cardinalities, like in a DB E/R diagram, where a relation 
> can be one of the following:
> 0..n
> 1..n
> 1 and only 1 ?
>  
>  
> Thanks//Regards,
> Emanuel Oliveira
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who 
> <http://fidelitycentral.fmr.com/ww/a639704>  
>  
> From: Oliveira, Emanuel <emanuel.olive...@fmr.com> 
> Sent: Friday 6 December 2019 10:15
> To: users@nifi.apache.org
> Subject: RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
>  
> Hi Mark, I forgot to share the NiFi version we are using:
> 1.8.0
> 10/22/2018 23:48:30 EDT
> Tagged nifi-1.8.0-RC3
>  
>  
> Thanks//Regards,
> Emanuel Oliveira
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who 
> <http://fidelitycentral.fmr.com/ww/a639704>  
>  
> From: Emanuel Oliveira <emanu...@gmail.com> 
> Sent: Thursday 5 December 2019 22:42
> To: users@nifi.apache.org
> Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
>  
> Hi Mark, be sure you copy and paste the "NOK - payload BAD 1" payload into 
> GenerateFlowFile, as this is the one that shows the problem.
>  
> Cheers,
> Emanuel 
>  
> On Thu 5 Dec 2019, 22:03 Mark Payne, <marka...@hotmail.com> wrote:
> Emanuel, 
>  
> What version of NiFi are you using?
>  
> I just tested the attached template against the latest, and the FlowFile was 
> routed to 'invalid' with the explanation:
>  
> Records in this FlowFile were invalid for the following reasons: The 
> following 1 fields were missing: [[0]/Records/eventVersion]
>  
>  
>  
>  
> Thanks
> -Mark
>  
>  
> 
> On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <emanuel.olive...@fmr.com> 
> wrote:
>  
> Hi all,
>  
> I have been struggling to find a way for ValidateRecord, using an Avro 
> schema, to make the presence of an array in the JSON payload mandatory. The 
> problem is that if the array “Records” is missing, ValidateRecord considers 
> the FlowFile valid ☹.
> --objective - mandatory to have the "Records" array with at least "eventVersion"
> - using ValidateRecord > Allow Extra Fields
> - the problem I'm facing is that NiFi doesn't flag payload BAD 1 as invalid!
>  
> How can I make the Records array mandatory? Is it possible?
>  
> I know I could use SplitJson with JsonPath Expression=$.Records to get rid 
> of the array, and also to fail if the array "Records" is not present. But I 
> would like a clean solution using just the Avro schema. Is this possible?
>  
>  
>  
> --OK - payload GOOD
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "Records": [{
>          "eventVersion": "aaa"
>       }
>    ]
> }
>  
> --NOK - payload BAD 1 - missing "Records" array -> BUT 
> VALIDATERECORD/AVROSCHEMA SENDS THE FLOWFILE TO “valid”! I want it to be sent 
> to “invalid”, since it is not compliant with my Avro schema, which requires 
> the array “Records” with the element “eventVersion” as 2 mandatory things.
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "RecordsXXX": [{
>          "eventVersion": "aaa"
>       }
>    ]
> }
>  
> --OK - payload BAD 2 - "Records" array present but missing "eventVersion"
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "Records": [{
>          "eventVersionXX": "aaa"
>       }
>    ]
> }
>  
> It's a very simple test flow (attached is the XML template 
> ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml) just using 
> ValidateRecord with JsonTreeReader/JsonRecordSetWriter:
> <image001.png>
>  
>  
> Here are the ValidateRecord processor and reader/writer controller settings:
> The Avro schema has just the array “Records”, with “eventVersion” as the 
> minimum field on each array element.
> Allow Extra Fields is true, so I'm OK with having other fields in the root 
> side by side with the array “Records”, and also with extra fields inside 
> each array element.
> FYI: in the real use case I'm trying to validate an AWS SQS message (S3 
> trigger) where I will be interested in several fields, but I crafted this 
> simpler example just to ask whether it's possible to force the array to be 
> mandatory, with at least 1 element.
> ==========================================================
>  
> --ValidateRecord 1.8.0
> Record Reader                           JsonTreeReader
> Record Writer                           JsonRecordSetWriter
> Record Writer for Invalid Records      
> Schema Access Strategy                  Use Reader's Schema
> Schema Registry                         No value set
> Schema Name                             ${schema.name}
> Schema Text                             ${avro.schema}
> Allow Extra Fields                      true
> Strict Type Checking                    true
>  
> --JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY + "eventVersion" 
> on each ARRAY element
> Schema Access Strategy                  Use 'Schema Text' Property
> Schema Registry                        
> Schema Name                             ${schema.name}
> Schema Version                         
> Schema Branch                          
> Schema Text                            
>                                         {
>                                            "name": "MyName",
>                                            "type": "record",
>                                            "namespace": "aa.bb.cc",
>                                            "fields": [{
>                                                  "name": "Records",
>                                                  "type": {
>                                                     "type": "array",
>                                                     "items": {
>                                                        "name": "Records_record",
>                                                        "type": "record",
>                                                        "fields": [{
>                                                              "name": "eventVersion",
>                                                              "type": "string"
>                                                           }
>                                                        ]
>                                                     }
>                                                  }
>                                               }
>                                            ]
>                                         }
> Date Format                            
> Time Format
> Timestamp Format
>  
> --JsonRecordSetWriter 1.8.0
> Schema Write Strategy                   Do Not Write Schema
> Schema Access Strategy                  Inherit Record Schema
> Schema Registry                        
> Schema Name                             ${schema.name}
> Schema Version
> Schema Branch
> Schema Text                             { "name": "eventVersion", "type": "string" }
> Date Format
> Time Format
> Timestamp Format
> Pretty Print JSON                       true
> Suppress Null Values                    Never Suppress
> Output Grouping                         Array
>  
> Thanks in advance,
> Emanuel Oliveira
>  
> <ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>
