Oh I see, your analysis makes sense. Sorry, but I last did Java 20 years ago; 
nowadays I'm mostly a data engineer (Oracle DB, ETL tools, custom migrations, 
Snowflake and lately NiFi). So count on me to spot opportunities to improve 
things, but I'm not able to change the base code/tests.

Thanks so much for your time and analysis. Let's wait for the community to step 
up, do the fix and update/run the unit tests 😊

Thanks//Regards,
Emanuel Oliveira
Senior Oracle/Data Engineer | CTG | Galway
TEL ext: 353 – (0)91-74 4971 | int: 8-737 4971 | who's who: 
http://fidelitycentral.fmr.com/ww/a639704

From: Mark Payne <marka...@hotmail.com>
Sent: Wednesday 11 December 2019 15:25
To: users@nifi.apache.org
Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?


Emanuel,

I looked into this a week or so ago, but haven't had a chance to resolve the 
issue yet. It does appear to be a bug. Specifically, I believe the bug is here 
[1].  When we create a RecordSchema from the Avro Schema, we set the default 
value for the array to an empty array, instead of null. Because of this, when 
the JSON is parsed, we end up creating a Record with an empty array for the 
"Records" field instead of a null. As a result, the Record is considered valid 
because it does have an array (it's just empty). I think it *should* be a null 
value instead.
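The mechanism described above can be illustrated with a small, self-contained Java sketch that uses neither NiFi nor Avro. It is a hypothetical model, not NiFi's actual code: `readRecord` stands in for the JSON reader filling an absent schema field with its default value, and `isValid` stands in for a "required field is missing" check that only fires when the value is null. Under those assumptions, an empty-array default makes payload BAD 1 look valid, while a null default gets it flagged.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DefaultArrayDemo {

    // Hypothetical simplification of a JSON reader (not NiFi's code): copies the
    // parsed fields and, if the schema's array field is absent, substitutes the default.
    public static Map<String, Object> readRecord(Map<String, ?> json, String arrayField, Object arrayDefault) {
        Map<String, Object> record = new HashMap<String, Object>(json);
        if (!record.containsKey(arrayField)) {
            record.put(arrayField, arrayDefault);
        }
        return record;
    }

    // Hypothetical simplification of a validator: a required field counts as
    // "missing" only when its value is null.
    public static boolean isValid(Map<String, ?> record, String requiredField) {
        return record.get(requiredField) != null;
    }

    public static void main(String[] args) {
        // Payload "BAD 1": no "Records" key, only a misspelled "RecordsXXX".
        Map<String, Object> json = new HashMap<>();
        json.put("Service", "sssssss");
        json.put("RecordsXXX", List.of(Map.of("eventVersion", "aaa")));

        // Default = empty array (the behaviour described above): the record passes.
        Map<String, Object> withEmptyDefault = readRecord(json, "Records", new ArrayList<>());
        System.out.println("empty-array default -> valid? " + isValid(withEmptyDefault, "Records"));

        // Default = null (the proposed change): the record is flagged as invalid.
        Map<String, Object> withNullDefault = readRecord(json, "Records", null);
        System.out.println("null default        -> valid? " + isValid(withNullDefault, "Records"));
    }
}
```

Running `main` prints `valid? true` for the empty-array default and `valid? false` for the null default, which mirrors why returning null as the default (as proposed above) would be expected to route payload BAD 1 to invalid.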

It looks like this was introduced in NIFI-4893 [2]. We can easily change it to 
just return a null value for the default, but that does result in two of the 
unit tests added in NIFI-4893 failing. It may be that those unit tests need to 
be fixed, or it may be that such a change does break something. I just haven't 
had a chance yet to dig that far into it.

If you're someone who is comfortable digging into the code and making the 
updates, then please do and I'm happy to review a PR as soon as I'm able.

Thanks
-Mark


[1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631

[2] https://issues.apache.org/jira/browse/NIFI-4893




On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <emanuel.olive...@fmr.com> wrote:

Can anyone knowledgeable about Avro schemas please confirm/suggest whether this 
inability to invalidate a JSON payload that is missing the root array, when 
Allow Extra Fields=true, is normal?

There are 2 options:

·  ValidateRecord.Allow Extra Fields=false --> I need to supply the full schema

·  ValidateRecord.Allow Extra Fields=true --> this is what I have been 
testing and want: a way to supply a schema with only the mandatory fields.

I want 2 mandatory things: a "Records" array with at least 1 element, and 
eventVersion in that element, so the minimal JSON should be:
{ (..)
   "Records": [{
         "eventVersion": "aaa"
         (..)
      }
   ]
   (..)
}

The problem is that ValidateRecord considers the FlowFile valid if the “Records” 
array is missing from the root!!
{
   "Service": "sssssss",
   "Event": "eeeee",
   "Time": "2019-11-25T16:21:53.280Z",
   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
   "RequestId": "RRRRRRRRRRRRRRRRRR",
   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
}

If I supply the “Records” array, then the schema correctly enforces that each 
array element record has at least eventVersion.


So… maybe my question can be narrowed to: “Is it possible in Avro schema syntax 
to specify cardinalities, like in a DB E/R diagram, where a relation can be one 
of the following:
0..n
1..n
1 and only 1?”
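As far as I can tell from the Avro specification, the schema language only covers some of these: a field with a non-null type and no default is required (1 and only 1), a `["null", T]` union with `"default": null` is optional (0..1), and an array is 0..n. The specification defines no minimum-length attribute for arrays, so "at least 1 element" (1..n) would have to be checked outside the schema. The Java sketch below is one hypothetical way to express such a check on an already-parsed record (modelled as a `Map`); `hasCardinality` is an illustrative helper, not a NiFi or Avro API.

```java
import java.util.List;
import java.util.Map;

public class CardinalityCheck {

    // Hypothetical post-validation helper (not a NiFi/Avro API): Avro cannot
    // express a minimum array length, so "1..n" is enforced here instead.
    public static boolean hasCardinality(Map<String, ?> record, String field, int min, int max) {
        Object value = record.get(field);
        int count;
        if (value == null) {
            count = 0;                        // field absent or explicitly null
        } else if (value instanceof List<?>) {
            count = ((List<?>) value).size(); // array field: count its elements
        } else {
            count = 1;                        // scalar or nested record present once
        }
        return count >= min && count <= max;
    }

    public static void main(String[] args) {
        int n = Integer.MAX_VALUE; // stands in for the unbounded "n"
        Map<String, Object> good = Map.of("Records", List.of(Map.of("eventVersion", "aaa")));
        Map<String, Object> emptyArray = Map.of("Records", List.of());
        Map<String, Object> missing = Map.of("Service", "sssssss");

        System.out.println(hasCardinality(good, "Records", 1, n));       // true  (1..n satisfied)
        System.out.println(hasCardinality(emptyArray, "Records", 1, n)); // false (empty array)
        System.out.println(hasCardinality(missing, "Records", 1, n));    // false (array absent)
        System.out.println(hasCardinality(missing, "Records", 0, n));    // true  (0..n allows absence)
    }
}
```

Treating an absent field and an empty array the same way (count 0) is a design choice: it makes 1..n reject both payload BAD 1 and a payload with `"Records": []`.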


Thanks//Regards,
Emanuel Oliveira

From: Oliveira, Emanuel <emanuel.olive...@fmr.com>
Sent: Friday 6 December 2019 10:15
To: users@nifi.apache.org
Subject: RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Hi Mark, I forgot to share the NiFi version we are using:
1.8.0
10/22/2018 23:48:30 EDT
Tagged nifi-1.8.0-RC3


Thanks//Regards,
Emanuel Oliveira

From: Emanuel Oliveira <emanu...@gmail.com>
Sent: Thursday 5 December 2019 22:42
To: users@nifi.apache.org
Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?


Hi Mark, be sure to copy and paste "NOK - payload BAD 1" into GenerateFlowFile, 
as that is the problematic case.

Cheers,
Emanuel

On Thu 5 Dec 2019, 22:03 Mark Payne, <marka...@hotmail.com> wrote:
Emanuel,

What version of NiFi are you using?

I just tested the attached template against the latest, and the FlowFile was 
routed to 'invalid' with the explanation:

Records in this FlowFile were invalid for the following reasons: The following 
1 fields were missing: [[0]/Records/eventVersion]




Thanks
-Mark


On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <emanuel.olive...@fmr.com> wrote:

Hi all,

I have been struggling to find a way for ValidateRecord, using an Avro schema, 
to make the presence of an array in the JSON payload mandatory. The problem is 
that if the “Records” array is missing, ValidateRecord considers the FlowFile 
valid ☹.
--objective: it is mandatory to have the "Records" array with at least "eventVersion"
- using ValidateRecord > Allow Extra Fields
- the problem I'm facing is that NiFi doesn't flag payload BAD 1 as invalid!!

How can I make the Records array mandatory? Is it possible?

I know I could use SplitJson with JsonPath Expression=$.Records to get rid of 
the array, which would also fail if the "Records" array is not present. But I 
would like a clean solution using just the Avro schema. Is this possible?



--OK - payload GOOD
{
   "Service": "sssssss",
   "Event": "eeeee",
   "Time": "2019-11-25T16:21:53.280Z",
   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
   "RequestId": "RRRRRRRRRRRRRRRRRR",
   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
   "Records": [{
         "eventVersion": "aaa"
      }
   ]
}

--NOK - payload BAD 1 - missing "Records" array --> BUT 
VALIDATERECORD/AVROSCHEMA SENDS FF TO “valid”!! I want it to be sent to 
“invalid” since it is not compliant with my Avro schema, which requires the 
“Records” array and the element “eventVersion” as 2 mandatory things.
{
   "Service": "sssssss",
   "Event": "eeeee",
   "Time": "2019-11-25T16:21:53.280Z",
   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
   "RequestId": "RRRRRRRRRRRRRRRRRR",
   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
   "RecordsXXX": [{
         "eventVersion": "aaa"
      }
   ]
}

--OK - payload BAD 2 - "Records" array present but missing "eventVersion"
{
   "Service": "sssssss",
   "Event": "eeeee",
   "Time": "2019-11-25T16:21:53.280Z",
   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
   "RequestId": "RRRRRRRRRRRRRRRRRR",
   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
   "Records": [{
         "eventVersionXX": "aaa"
      }
   ]
}

It's a very simple test flow (attached the XML template 
ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml), just using 
ValidateRecord with JsonTreeReader/JsonRecordSetWriter.


Here are the ValidateRecord processor and the reader/writer controller services:

  *   Avro schema with just the “Records” array and “eventVersion” as the 
minimum tag on each array element.
  *   Using Allow Extra Fields = true:

     *   So I'm OK with having other fields at the root alongside the “Records” 
array, and also OK with extra elements inside each array element.
     *   FYI: the real use case is validating an AWS SQS message (S3 trigger) 
where I will be interested in several fields, but I crafted this simpler 
example just to ask whether it's possible to force the array to be mandatory 
and to have at least 1 element.
==========================================================

--ValidateRecord 1.8.0
Record Reader                           JsonTreeReader
Record Writer                           JsonRecordSetWriter
Record Writer for Invalid Records
Schema Access Strategy                  Use Reader's Schema
Schema Registry                         No value set
Schema Name                             ${schema.name}
Schema Text                             ${avro.schema}
Allow Extra Fields                      true
Strict Type Checking                    true

--JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY + "eventVersion" on 
each ARRAY element
Schema Access Strategy                  Use 'Schema Text' Property
Schema Registry
Schema Name                             ${schema.name}
Schema Version
Schema Branch
Schema Text
                                        {
                                           "name": "MyName",
                                           "type": "record",
                                           "namespace": "aa.bb.cc",
                                           "fields": [{
                                                 "name": "Records",
                                                 "type": {
                                                    "type": "array",
                                                    "items": {
                                                       "name": "Records_record",
                                                       "type": "record",
                                                       "fields": [{
                                                             "name": "eventVersion",
                                                             "type": "string"
                                                          }
                                                       ]
                                                    }
                                                 }
                                              }
                                           ]
                                        }
Date Format
Time Format
Timestamp Format

--JsonRecordSetWriter 1.8.0
Schema Write Strategy                   Do Not Write Schema
Schema Access Strategy                  Inherit Record Schema
Schema Registry
Schema Name                             ${schema.name}
Schema Version
Schema Branch
Schema Text                             { "name": "eventVersion", "type": "string" }
Date Format
Time Format
Timestamp Format
Pretty Print JSON                       true
Suppress Null Values                    Never Suppress
Output Grouping                         Array

Thanks in advance,
Emanuel Oliveira

<ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>
