That's the challenge: the values can be null, but I want to know when the fields
are missing (i.e., not enough delimiters). I run into a common scenario where line
feeds end up in the data, producing a short row. Currently the reader just ignores
the fact that there aren't enough delimiters and sets the missing fields to null.
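Even so, the raw delimiter count is still recoverable before any nulls are assigned. Here is a minimal sketch using Python's standard csv module (Python is an assumption here; nothing in this thread uses it). A trailing delimiter yields an empty third field, a missing delimiter yields no third field at all, and an unquoted line feed inside a value splits one record into two short rows.

```python
import csv
import io

def field_counts(text):
    """Return the number of fields the csv module finds on each record."""
    return [len(row) for row in csv.reader(io.StringIO(text))]

# A trailing delimiter keeps the field: the third value is just empty.
print(field_counts("hello,world,\n"))     # [3]
# A missing delimiter drops the field entirely.
print(field_counts("hello,world\n"))      # [2]
# An unquoted line feed inside a value splits one record into two short rows.
print(field_counts("hello,wor\nld,1\n"))  # [2, 2]
```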

On 1/6/20, 3:50 PM, "Matt Burgess" <mattyb...@apache.org> wrote:

    Shawn,
    
    Your schema indicates that the fields are optional because of the
    "type": ["null", "string"], so IIRC they won't be marked as invalid
    because they are treated as null (I'm not sure there's a difference in
    the code between missing and null fields).
    
    You can try "type": "string" in ValidateRecord to see if that fixes
    it, or there's a "StrNotNullOrEmpty" operator in ValidateCSV.
    
    Regards,
    Matt
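To make the point about nullable unions concrete, here is a simplified Python model. It is NOT NiFi's CSVReader or ValidateRecord code. It assumes the reader pads short rows with None (as described at the top of this thread) and also reads empty values as None (an assumption). Under those assumptions "hello,," and "hello" become identical records, so the nullable schema accepts both and a non-null "string" schema rejects both. Neither schema can tell a short row apart from legitimately empty values. The helper names are hypothetical.

```python
import csv
import io

NAMES = ["c1", "c2", "c3"]

def read_padded(line, names):
    """Hypothetical reader: empty values and missing trailing fields both become None."""
    row = next(csv.reader(io.StringIO(line)))
    values = [v if v != "" else None for v in row]
    values += [None] * (len(names) - len(values))  # pad a short row with nulls
    return dict(zip(names, values))

def is_nullable(avro_type):
    # A union such as ["null", "string"] admits null; a bare "string" does not.
    return isinstance(avro_type, list) and "null" in avro_type

def is_valid(record, fields):
    # A record is valid if every null value sits in a nullable field.
    return all(record[f["name"]] is not None or is_nullable(f["type"]) for f in fields)

nullable = [{"name": n, "type": ["null", "string"]} for n in NAMES]
required = [{"name": n, "type": "string"} for n in NAMES]

good = read_padded("hello,,", NAMES)  # empty values: acceptable
bad = read_padded("hello", NAMES)     # missing delimiters: should be rejected

print(good == bad)                                        # True
print(is_valid(good, nullable), is_valid(bad, nullable))  # True True
print(is_valid(good, required), is_valid(bad, required))  # False False
```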
    
    On Mon, Jan 6, 2020 at 4:35 PM Shawn Weeks <swe...@weeksconsulting.us> wrote:
    >
    > I’m trying to validate that a CSV file has the number of fields defined
    > in its Avro schema. Consider the following schema and CSVs. I would like
    > to be able to reject the invalid CSV as missing fields.
    >
    > {
    >    "type" : "record",
    >    "namespace" : "nifi",
    >    "name" : "nifi",
    >    "fields" : [
    >       { "name" : "c1" , "type" :  ["null", "string"] },
    >       { "name" : "c2" , "type" : ["null", "string"] },
    >       { "name" : "c3" , "type" : ["null", "string"] }
    >    ]
    > }
    >
    > Good CSV
    >
    > c1,c2,c3
    > hello,world,1
    > hello,world,
    > hello,,
    >
    > Bad CSV
    >
    > c1,c2,c3
    > hello,world,1
    > hello,world
    > hello
    >
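One way to get the check asked for here is to compare each row's raw field count with the number of fields the schema declares, before any value is turned into null. Below is a minimal standalone Python sketch, assuming the whole CSV fits in memory and the first line is a header. The function name split_by_field_count is hypothetical, and this is not a NiFi processor or property. Empty values pass; rows with too few (or too many) delimiters are routed to invalid.

```python
import csv
import io
import json

# The Avro schema from the message above, parsed as plain JSON.
SCHEMA = json.loads("""
{
  "type": "record",
  "namespace": "nifi",
  "name": "nifi",
  "fields": [
    {"name": "c1", "type": ["null", "string"]},
    {"name": "c2", "type": ["null", "string"]},
    {"name": "c3", "type": ["null", "string"]}
  ]
}
""")

GOOD_CSV = "c1,c2,c3\nhello,world,1\nhello,world,\nhello,,\n"
BAD_CSV = "c1,c2,c3\nhello,world,1\nhello,world\nhello\n"

def split_by_field_count(text, schema):
    """Hypothetical helper: route each data row by its raw field count.

    Empty values are kept as "" and pass; a row is invalid only when its
    delimiter count does not match the schema (too few or too many fields).
    A blank line parses as [] and is therefore reported as invalid.
    """
    expected = len(schema["fields"])
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header line
    valid, invalid = [], []
    for row in reader:
        (valid if len(row) == expected else invalid).append(row)
    return valid, invalid

print(split_by_field_count(GOOD_CSV, SCHEMA))
print(split_by_field_count(BAD_CSV, SCHEMA))
```

Run against the two samples above, every data row of the good CSV is valid, while the bad CSV yields one valid row and two invalid ones (hello,world and hello).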
    
