I wonder if this bug is related to the SO question [1] as well?

[1]
https://stackoverflow.com/questions/58482448/nifi-validate-record-of-nested-json-set-valid-for-missing-array-field
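
For anyone following along, here is a minimal Python sketch of the behaviour
Mark describes further down the thread: a missing array field gets an empty
array as its default instead of null, so the "is the field missing?" check
never fires. This is a simplified model under my own assumptions, not NiFi
code; read_record, missing_fields and the two default-mode constants are
hypothetical names made up for illustration.

```python
# Simplified model of the behaviour discussed in this thread; NOT NiFi code.
# All names below are hypothetical and exist only for illustration.
EMPTY_ARRAY_DEFAULT = "empty-array"  # default reported as causing the bug
NULL_DEFAULT = "null"                # default suggested in the thread


def read_record(payload, array_fields, default_mode):
    """Build a record, filling absent array fields with the chosen default."""
    record = dict(payload)
    for name in array_fields:
        if name not in record:
            record[name] = [] if default_mode == EMPTY_ARRAY_DEFAULT else None
    return record


def missing_fields(record, required_fields):
    """Return the required fields whose value is null (None)."""
    return [name for name in required_fields if record.get(name) is None]


# Payload without the "Records" array, like "payload BAD 1" in the thread.
payload = {"Service": "sssssss", "Event": "eeeee"}

buggy = read_record(payload, ["Records"], EMPTY_ARRAY_DEFAULT)
fixed = read_record(payload, ["Records"], NULL_DEFAULT)

print(missing_fields(buggy, ["Records"]))  # [] -> record looks valid
print(missing_fields(fixed, ["Records"]))  # ['Records'] -> record is invalid
```

With the empty-array default the record passes because the field is present
(just empty); with a null default the missing array is reported.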


On Wed, Dec 11, 2019 at 11:18 AM Juan Pablo Gardella <
gardellajuanpa...@gmail.com> wrote:

> I detected the bug https://issues.apache.org/jira/browse/NIFI-4893 myself.
> Do you have a reproducible flow to validate it?
>
> On Wed, 11 Dec 2019 at 12:54, Oliveira, Emanuel <emanuel.olive...@fmr.com>
> wrote:
>
>> Oh I see, your analysis makes sense, but sorry, I last did Java 20 years
>> ago. Nowadays I'm mostly a data engineer (Oracle DB, ETL tools, custom
>> migrations, Snowflake and lately NiFi), so count on me to detect
>> opportunities to improve things, but I'm not able to change base code/tests.
>>
>>
>>
>> Thanks so much for your time and analysis. Let's wait for the community to
>> step up to do the fix and update/run the unit tests 😊
>>
>>
>>
>> Thanks//Regards,
>>
>> *Emanuel Oliveira*
>>
>> Senior Oracle/Data Engineer | CTG | Galway
>> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 *|*  who's who
>> <http://fidelitycentral.fmr.com/ww/a639704>
>>
>>
>>
>> *From:* Mark Payne <marka...@hotmail.com>
>> *Sent:* Wednesday 11 December 2019 15:25
>> *To:* users@nifi.apache.org
>> *Subject:* Re: NiFi ValidateRecord - unable to handle missing mandatory
>> ARRAY ?
>>
>>
>>
>> *This email is from an external source - **exercise caution regarding
>> links and attachments. *
>>
>>
>>
>> Emanuel,
>>
>>
>>
>> I looked into this a week or so ago, but haven't had a chance to resolve
>> the issue yet. It does appear to be a bug. Specifically, I believe the bug
>> is here [1].  When we create a RecordSchema from the Avro Schema, we set
>> the default value for the array to an empty array, instead of null. Because
>> of this, when the JSON is parsed, we end up creating a Record with an empty
>> array for the "Records" field instead of a null. As a result, the Record is
>> considered valid because it does have an array (it's just empty). I think
>> it *should* be a null value instead.
>>
>>
>>
>> It looks like this was introduced in NIFI-4893 [2]. We can easily change
>> it to just return a null value for the default, but that does result in two
>> of the unit tests added in NIFI-4893 failing. It may be that those unit
>> tests need to be fixed, or it may be that such a change does break
>> something. I just haven't had a chance yet to dig that far into it.
>>
>>
>>
>> If you're someone who is comfortable digging into the code and making the
>> updates, then please do and I'm happy to review a PR as soon as I'm able.
>>
>>
>>
>> Thanks
>>
>> -Mark
>>
>>
>>
>>
>>
>> [1]
>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631
>>
>>
>>
>> [2] https://issues.apache.org/jira/browse/NIFI-4893
>>
>>
>>
>>
>>
>>
>>
>> On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <emanuel.olive...@fmr.com>
>> wrote:
>>
>>
>>
>> Can anyone knowledgeable about Avro schemas please confirm whether this
>> inability to invalidate a JSON payload that is missing the array in the
>> root, when Allow Extra Fields=true, is normal?
>>
>>
>>
>> There are 2 options:
>>
>> - ValidateRecord.Allow Extra Fields=false -> need to supply the full
>> schema
>>
>> - ValidateRecord.Allow Extra Fields=true -> this is what I have been
>> testing/want: a way to supply a schema with only the mandatory fields.
>>
>>
>>
>> I want 2 mandatory fields, an array with at least 1 element having
>> eventVersion, so minimal json should be:
>>
>> { (..)
>>
>>    "Records": [{
>>
>>          "eventVersion": "aaa"
>>
>>          (..)
>>
>>       }
>>
>>    ]
>>
>>    (..)
>>
>> }
>>
>>
>>
>> The problem is that ValidateRecord considers the FlowFile valid when the
>> "Records" array is missing from the root!!!!
>>
>> {
>>
>>    "Service": "sssssss",
>>
>>    "Event": "eeeee",
>>
>>    "Time": "2019-11-25T16:21:53.280Z",
>>
>>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>>
>>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>>
>>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh"
>>
>> }
>>
>>
>>
>> If I supply the array "Records", then the schema correctly enforces that
>> each array element record has at least eventVersion.
>>
>>
>>
>>
>>
>> So... maybe my question can be rephrased as: is it possible in Avro schema
>> syntax to specify cardinalities, as in a DB E/R diagram, where a relation
>> can be one of the following:
>>
>> 0..n
>>
>> 1..n
>>
>> 1 and only 1 ?
>>
>>
>>
>>
>>
>> Thanks//Regards,
>>
>> *Emanuel Oliveira*
>>
>>
>>
>>
>> *From:* Oliveira, Emanuel <emanuel.olive...@fmr.com>
>> *Sent:* Friday 6 December 2019 10:15
>> *To:* users@nifi.apache.org
>> *Subject:* RE: NiFi ValidateRecord - unable to handle missing mandatory
>> ARRAY ?
>>
>>
>>
>> Hi Mark, I forgot to share the NiFi version we are using:
>>
>> 1.8.0
>>
>> 10/22/2018 23:48:30 EDT
>>
>> Tagged nifi-1.8.0-RC3
>>
>>
>>
>>
>>
>> Thanks//Regards,
>>
>> *Emanuel Oliveira*
>>
>>
>>
>>
>> *From:* Emanuel Oliveira <emanu...@gmail.com>
>> *Sent:* Thursday 5 December 2019 22:42
>> *To:* users@nifi.apache.org
>> *Subject:* Re: NiFi ValidateRecord - unable to handle missing mandatory
>> ARRAY ?
>>
>>
>>
>>
>>
>>
>> Hi Mark, be sure to copy and paste "NOK - payload BAD 1 - " into
>> GenerateFlowFile, as this is the problematic case.
>>
>>
>>
>> Cheers,
>>
>> Emanuel
>>
>>
>>
>> On Thu 5 Dec 2019, 22:03 Mark Payne, <marka...@hotmail.com> wrote:
>>
>> Emanuel,
>>
>>
>>
>> What version of NiFi are you using?
>>
>>
>>
>> I just tested the attached template against the latest, and the FlowFile
>> was routed to 'invalid' with the explanation:
>>
>>
>>
>> Records in this FlowFile were invalid for the following reasons: The
>> following 1 fields were missing: [[0]/Records/eventVersion]
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Thanks
>>
>> -Mark
>>
>>
>>
>>
>>
>> On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <emanuel.olive...@fmr.com>
>> wrote:
>>
>>
>>
>> Hi all,
>>
>>
>>
>> I have been struggling to find a way for ValidateRecord, using an Avro
>> schema, to make the presence of an array in the JSON payload mandatory. The
>> problem is that if the array "Records" is missing, ValidateRecord still
>> considers the FlowFile valid ☹.
>>
>> --objective - the "Records" array is mandatory, with at least
>> "eventVersion" in each element
>>
>> - using ValidateRecord > Allow Extra Fields
>>
>> - the problem I'm facing is that NiFi doesn't flag payload BAD 1 as invalid!!
>>
>>
>>
>> How can I make the Records array mandatory? Is it possible?
>>
>>
>>
>> I know I could use SplitJson with JsonPath Expression=$.Records to get rid
>> of the array, and also to fail if the array "Records" is not present. But I
>> would like a clean solution using just the Avro schema. Is this possible?
>>
>>
>>
>>
>>
>>
>>
>> --OK - payload GOOD
>>
>> {
>>
>>    "Service": "sssssss",
>>
>>    "Event": "eeeee",
>>
>>    "Time": "2019-11-25T16:21:53.280Z",
>>
>>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>>
>>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>>
>>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>>
>>    "Records": [{
>>
>>          "eventVersion": "aaa"
>>
>>       }
>>
>>    ]
>>
>> }
>>
>>
>>
>> --NOK - payload BAD 1 - missing "Records" array -> BUT
>> VALIDATERECORD/AVROSCHEMA SENDS THE FLOWFILE TO "valid"!! I want it sent to
>> "invalid", since it does not comply with my Avro schema, which requires both
>> the "Records" array and the "eventVersion" element.
>>
>> {
>>
>>    "Service": "sssssss",
>>
>>    "Event": "eeeee",
>>
>>    "Time": "2019-11-25T16:21:53.280Z",
>>
>>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>>
>>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>>
>>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>>
>>    "RecordsXXX": [{
>>
>>          "eventVersion": "aaa"
>>
>>       }
>>
>>    ]
>>
>> }
>>
>>
>>
>> --OK - payload BAD 2 - "Records" array present but missing "eventVersion"
>>
>> {
>>
>>    "Service": "sssssss",
>>
>>    "Event": "eeeee",
>>
>>    "Time": "2019-11-25T16:21:53.280Z",
>>
>>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>>
>>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>>
>>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>>
>>    "Records": [{
>>
>>          "eventVersionXX": "aaa"
>>
>>       }
>>
>>    ]
>>
>> }
>>
>>
>>
>> It's a very simple test flow (attached as the XML template
>> ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml) that just uses
>> ValidateRecord with JsonTreeReader/JsonRecordSetWriter:
>>
>> <image001.png>
>>
>>
>>
>>
>>
>> Here are the ValidateRecord processor and its reader/writer controllers:
>>
>>    - Avro schema with just the array "Records" and "eventVersion" as the
>>    minimum field on each array element.
>>    - Using Allow Extra Fields = true:
>>
>>
>>    - So I'm OK with having other fields in the root side by side with the
>>       array "Records", and also OK with extra fields inside each array element.
>>       - FYI: in the real use case I'm trying to validate an AWS SQS message (S3
>>       trigger) where I will be interested in several fields, but I crafted this
>>       simpler example just to ask whether it's possible to force the array to
>>       be mandatory and to have at least 1 element.
>>
>> ==========================================================
>>
>>
>>
>> --ValidateRecord 1.8.0
>>
>> Record Reader                           JsonTreeReader
>>
>> Record Writer                           JsonRecordSetWriter
>>
>> Record Writer for Invalid Records
>>
>> Schema Access Strategy                  Use Reader's Schema
>>
>> Schema Registry                         No value set
>>
>> Schema Name                             ${schema.name}
>>
>> Schema Text                             ${avro.schema}
>>
>> Allow Extra Fields                      true
>>
>> Strict Type Checking                    true
>>
>>
>>
>> --JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY +
>> "eventVersion" on each ARRAY element
>>
>> Schema Access Strategy                  Use 'Schema Text' Property
>>
>> Schema Registry
>>
>> Schema Name                             ${schema.name}
>>
>> Schema Version
>>
>> Schema Branch
>>
>> Schema Text
>>
>>    {
>>       "name": "MyName",
>>       "type": "record",
>>       "namespace": "aa.bb.cc",
>>       "fields": [{
>>             "name": "Records",
>>             "type": {
>>                "type": "array",
>>                "items": {
>>                   "name": "Records_record",
>>                   "type": "record",
>>                   "fields": [{
>>                         "name": "eventVersion",
>>                         "type": "string"
>>                      }
>>                   ]
>>                }
>>             }
>>          }
>>       ]
>>    }
>>
>> Date Format
>>
>> Time Format
>>
>> Timestamp Format
>>
>>
>>
>> --JsonRecordSetWriter 1.8.0
>>
>> Schema Write Strategy                   Do Not Write Schema
>>
>> Schema Access Strategy                  Inherit Record Schema
>>
>> Schema Registry
>>
>> Schema Name                             ${schema.name}
>>
>> Schema Version
>>
>> Schema Branch
>>
>> Schema Text                             { "name": "eventVersion", "type": "string" }
>>
>> Date Format
>>
>> Time Format
>>
>> Timestamp Format
>>
>> Pretty Print JSON                       true
>>
>> Suppress Null Values                    Never Suppress
>>
>> Output Grouping                         Array
>>
>>
>>
>> Thanks in advance,
>>
>> Emanuel Oliveira
>>
>>
>>
>> <ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>
>>
>>
>>
>
