Hi,
This is a test pipeline reading PDF files from disk. It begins with a
GetFile processor feeding a ConvertRecord processor configured with a
scripted reader for input and a generic AvroRecordSetWriter for output.

The scripted reader places the file content in a "content" field:

List<RecordField> recordFields = []
recordFields.add(new RecordField("content",
    RecordFieldType.ARRAY.getArrayDataType(RecordFieldType.BYTE.getDataType())))
schema = new SimpleRecordSchema(recordFields)
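
For completeness, these are the imports that snippet relies on; the reader
then wraps the file bytes in a record built from that schema, roughly along
these lines (simplified sketch - flowFileBytes just stands in for however the
script reads the incoming content stream):

import org.apache.nifi.serialization.SimpleRecordSchema
import org.apache.nifi.serialization.record.MapRecord
import org.apache.nifi.serialization.record.RecordField
import org.apache.nifi.serialization.record.RecordFieldType

// schema is the SimpleRecordSchema built above; flowFileBytes stands in
// for the bytes read from the flow file content
Map<String, Object> values = [content: flowFileBytes]
def record = new MapRecord(schema, values)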

This bit seems OK.

The next step is an UpdateRecord processor, which adds other fields to
mimic the real case of pulling the data out of a DB - Age, Gender etc.,
all of which are dummies - plus a timestamp derived from the filename via
the following expression language:

${filename:substringBeforeLast('.'):substringAfterLast('_'):toDate('yyyyMMdd'):format("yyyy-MM-dd HH:mm:ss")}

If I explicitly set the schema for the record writer to include
{"name": "Visit_DateTime", "type": {"type": "long", "logicalType": "timestamp-millis"}},

then I can get the following converter - a Groovy script that converts the
records to JSON for transmission to a web service - to deal with the dates
as follows:

Date VisitTimeValue = new Date(currRecord.get(TimeStampFieldName))
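
For reference, the full explicit writer schema is essentially a small record
along the following lines (the non-date fields are just the illustrative
dummies mentioned above, and the content field is omitted for brevity):

{
  "type": "record",
  "name": "VisitRecord",
  "fields": [
    {"name": "Age", "type": ["null", "int"]},
    {"name": "Gender", "type": ["null", "string"]},
    {"name": "Visit_DateTime", "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}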


I guess I thought this approach was overly complex. Given that I'm using
date functions in the expression language, I hoped that the generic Avro
writer would correctly infer the schema so that I didn't have to provide
one explicitly. Is this approach the right one? Is there a way I can
isolate the expectation of a date component inside the Groovy script only?
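
By "isolate" I mean something along these lines inside the script - let the
writer treat Visit_DateTime as a plain string and only turn it into a Date
in Groovy (sketch only; currRecord and TimeStampFieldName are the same names
used above, and the parse pattern assumes the EL format() shown earlier):

def raw = currRecord.get(TimeStampFieldName)
Date VisitTimeValue = null
if (raw != null) {
    // toString() also covers the org.apache.avro.util.Utf8 values the
    // reader can hand back when the field is typed as a string
    VisitTimeValue = Date.parse('yyyy-MM-dd HH:mm:ss', raw.toString())
}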

I hope this is clear.
Thanks for your help.


On Thu, Feb 15, 2024 at 9:38 AM Mark Payne <marka...@hotmail.com> wrote:

> Hey Richard,
>
> I think you’d need to explain more about what you’re doing in your groovy
> script. What processor are you using? What’s the script doing? Is it
> parsing Avro data?
>
> On Jan 29, 2024, at 12:26 AM, Richard Beare <richard.be...@gmail.com>
> wrote:
>
> Anyone able to offer assistance with this?
>
> I think my problem relates to correctly specifying types with the
> expression language and relying on schema inference from Groovy.
>
> On Tue, Jan 23, 2024 at 2:20 PM Richard Beare <richard.be...@gmail.com>
> wrote:
>
>> Hi,
>> What is the right way to deal with dates in the following context?
>>
>> I'm using the UpdateRecord processor to add a datestamp field to a record
>> (derived from a filename attribute inserted by the GetFile processor).
>>
>> /Visit_DateTime:
>> ${filename:substringBeforeLast('.'):substringAfterLast('_'):toDate('yyyyMMdd'):format("yyyy-MM-dd'T'HH:mm:ss'Z'")}
>>
>> Inside the Groovy script I'm attempting to convert it to a Date as follows:
>>
>> VisitTimeValue = new Date(currRecord.get(Visit_DateTime as String))
>>
>> However I always get messages about "could not find matching constructor
>> for java.util.Date(org.apache.avro.util.Utf8)"
>>
>> I have a previously working version, from a slightly different context
>> which did a cast to long: Date((long)currRecord.get....). In that case the
>> record was created by a database query.
>>
>> The eventual use of VisitTimeValue is to dump it into a flowfile
>> attribute.
>>
>> It seems to me that the type of the date field is not being correctly
>> inferred by the avro reader/writers after I create it with the expression
>> language. Alternatively, perhaps I should be using different date handling
>> tools inside groovy.
>>
>> All advice welcome.
>> Thanks
>>
>>
>
