QueryRecord: FORMAT_DATETIME or CAST or ??

Michael via users Fri, 01 Nov 2024 02:35:27 -0700

Hi all,

I'm using a QueryRecord processor to run, for example,


 SELECT RPATH(data, '/attr') AS attr
 FROM FLOWFILE

against a JSON structure.

 "attr" is a quoted String. In fact it's a formatted DateTime, "yyyy-MM-dd 
HH:mm:ss". When I select the records with RPATH (JsonTreeReader, infer 
schema) and write them back (JsonRecordSetWriter, infer schema), the "date" 
columns suddenly come out with a trailing ".0", e.g. 
"2018-01-02 13:53:00.0".

Why? And how can I avoid it? It seems that FORMAT_DATETIME is available in 
Apache Calcite but not supported by QueryRecord? I also tried CAST(), without 
success...
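
One guess at the "why", as a minimal sketch only: the output looks exactly 
like what java.sql.Timestamp.toString() produces, so the inferred schema may 
be treating the string as a timestamp and rendering it that way. That is an 
assumption on my side, I have not confirmed what QueryRecord does internally. 
The class and method names below are made up for illustration.

```java
import java.sql.Timestamp;
import java.text.SimpleDateFormat;

public class TrailingZeroDemo {

    // Parse "yyyy-MM-dd HH:mm:ss" into a Timestamp and render it with
    // toString(), which always appends a fractional-seconds part.
    static String roundTrip(String value) {
        return Timestamp.valueOf(value).toString();
    }

    // Same parse, but rendered with an explicit pattern, so no fraction
    // is appended.
    static String reformat(String value) {
        Timestamp ts = Timestamp.valueOf(value);
        return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(ts);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("2018-01-02 13:53:00")); // 2018-01-02 13:53:00.0
        System.out.println(reformat("2018-01-02 13:53:00"));  // 2018-01-02 13:53:00
    }
}
```

If that guess is right, the value would have to stay a plain String end to 
end, or be written with an explicit pattern, to keep the original form.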

Hints would be great, running out of ideas :)

FYI, the reason/background is that I have a pretty large XML file containing 
structured master/detail data. I've used TransformXml to get JSON like


[
  {
    "type": "master",
    "data": {
        "attr1": "value",
        ...
     }
  },
  {
    "type": "detail",
    "data": {
        "master-key": "...",
        ...
     }
  },
  ...
]

I need to split master and details now, to process them separately via a REST 
API.

And as I have no clue how to define a generic Avro (or whatever) schema that 
tells any other record-oriented processor that these XMLs could contain 
arbitrary attrs for master and completely different attrs for detail, I 
finally tried QueryRecord.

It's not that fast, but it works like a charm and everything is ready, except 
the DateTime "conversion" :) Sure, I could append a post-processing 
ReplaceText or whatever as a workaround, but exactly this path of my data 
flow is the most expensive: TransformXml and QueryRecord, with an XML of 
750 MB and 1.5 million master/detail records... so it would be cool if no 
workaround were necessary.

Regards,
Michael
