Re: Issue with QueryRecord failing when data is missing

Matt Burgess Thu, 25 Feb 2021 07:28:33 -0800

Jens,

What is the Schema Access Strategy set to in your CSVReader? If "Infer
Schema" or "Use String Fields From Header", the setting of "Treat
First Line As Header" should be ignored as those two options require a
header be present anyway. If you know the schema ahead of time you
could set it in the CSVReader rather than inferring it.

For "Infer Schema", there's a bug where the inferred schema is empty
because we don't have any records from which to infer the types of the
fields (even though the field names are present). I wrote up NIFI-8259
[1] to infer the types as strings when no records are present.

As a workaround you could filter out any FlowFiles that have no
records, either by using CountText or the 'record.count' attribute if
it has been set, into a RouteOnAttribute. Alternatively you could
emulate what NIFI-8259 is going to do by using "Use String Fields From
Header" in your CSVReader, but in that case you might need a CAST(colC
as BOOLEAN) in your SQL since populated FlowFiles could have the
correctly inferred schema where empty FlowFiles (or if "Use String
Fields From Header" is set) will think colC is a string rather than a
boolean. The CAST should work in both cases but I didn't try it.

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-8259

On Thu, Feb 25, 2021 at 1:56 AM Jens M. Kofoed <jmkofoed....@gmail.com> wrote:
>
> Hi all
>
> I have a issue with using the QueryRecord query csv files. currently i'm 
> running NiFi version 1.12.1 but I also tested this in version 1.13.0
> If my incoming csv file only have a header line and no data it fails
>
> My querying statement looks like this: SELECT colA FROM FLOWFILE WHERE colC = 
> 'true'
>
> Changes made to the CSVReader:
> Treat Firs Line as Header = true
>
> Changes made to the CSVRecordSetWriter:
> Include Header Line = false
> Record Separator = ,
>
> Here are 2 sample data. The first one works as expected, but sample 2 gives 
> errors
> Sample 1:
> colA,colB,colC
> data1A,data1B,true
> data2A,data2B,false
> data3A,data3B,true
>
> Outcome: data1A,data3A,
>
> Sample 2:
> colA,colB,colC
>
> Error message:
> QueryRecord[id=d7c38f75-0177-1000-ffff-fffff694dd96] Unable to query 
> StandardFlowFileRecord[uuid=74a71c6e-3d3f-406c-92af-c9e4e27d6d69,claim=StandardContentClaim
>  [resourceClaim=StandardResourceClaim[id=1614232293848-3, 
> container=Node01Cont01, section=3], offset=463, 
> length=14],offset=0,name=74a71c6e-3d3f-406c-92af-c9e4e27d6d69,size=14] due to 
> java.sql.SQLException: Error while preparing statement [SELECT colA FROM 
> FLOWFILE WHERE colC = true]: 
> org.apache.nifi.processor.exception.ProcessException: java.sql.SQLException: 
> Error while preparing statement [SELECT colA FROM FLOWFILE WHERE colC = true]
>
> Is this a bug?
>
> kind regards
> Jens M. Kofoed

Re: Issue with QueryRecord failing when data is missing

Reply via email to