Hi all,
I'm using a QueryRecord processor to run a query such as
SELECT RPATH(data, '/attr') AS attr
FROM FLOWFILE
against a JSON structure.
"attr" is a quoted string; in fact it is a formatted date-time, "yyyy-MM-dd
HH:mm:ss". When I select the records with RPATH_INT (JsonTreeReader, infer
schema) and write them back (JsonRecordSetWriter, infer schema), the "date"
columns suddenly come out with a trailing ".0", e.g.
"2018-01-02 13:53:00.0"
Why does this happen, and how can I avoid it? It seems that FORMAT_DATETIME
is available in Apache Calcite but not supported by QueryRecord? I also tried
CAST(), without success...
Hints would be great, I'm running out of ideas :)
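For what it's worth, the trailing ".0" looks exactly like what java.sql.Timestamp.toString() prints. Below is a minimal sketch in plain Java, assuming (I have not verified this in the NiFi code) that the inferred schema treats the string as a timestamp and writes it back via toString(). The class and method names are made up for illustration.

```java
import java.sql.Timestamp;
import java.time.format.DateTimeFormatter;

public class TimestampTrailingZero {
    // The format my source data uses.
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Parse the string as a timestamp and render it with toString().
    // Timestamp.toString() always appends fractional seconds, so zero
    // nanoseconds are printed as ".0".
    static String viaToString(String s) {
        return Timestamp.valueOf(s).toString();
    }

    // Same parse, but rendered with an explicit pattern (no fraction).
    static String viaExplicitFormat(String s) {
        return Timestamp.valueOf(s).toLocalDateTime().format(FMT);
    }

    public static void main(String[] args) {
        String in = "2018-01-02 13:53:00";
        System.out.println(viaToString(in));       // 2018-01-02 13:53:00.0
        System.out.println(viaExplicitFormat(in)); // 2018-01-02 13:53:00
    }
}
```

If that assumption holds, the value would have to stay a plain string (or be written with an explicit format) to come out unchanged.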
FYI, the background is that I have a fairly large XML file containing
structured master/detail data. I used TransformXml to get JSON like:
[
  {
    "type": "master",
    "data": {
      "attr1": "value",
      ...
    }
  },
  {
    "type": "detail",
    "data": {
      "master-key": "...",
      ...
    }
  },
  ...
]
I now need to split master and detail records so that I can process them
separately via a REST API.
As I have no clue how to define a generic Avro (or other) schema that tells a
record-oriented processor that these XMLs can contain arbitrary attributes for
master and completely different attributes for detail, I finally tried
QueryRecord.
It's not that fast, but it works like a charm and everything is ready, except
for the DateTime "conversion" :) Sure, I could append a post-processing
ReplaceText or similar as a workaround, but this path of my data flow
(TransformXml and QueryRecord) is already the most expensive one: the XML is
750 MB in size, with 1.5 million master/detail records... so it would be great
if no workaround were necessary.
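To show the kind of ReplaceText workaround I mean, here is a rough sketch of the regex in plain Java (ReplaceText uses Java regular expressions). The class and method names are hypothetical, and it assumes the timestamps always appear as quoted "yyyy-MM-dd HH:mm:ss.0" values.

```java
import java.util.regex.Pattern;

public class StripTrailingZero {
    // Matches a quoted "yyyy-MM-dd HH:mm:ss.0" value and captures
    // everything before the trailing ".0".
    static final Pattern TS = Pattern.compile(
            "\"(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\.0\"");

    // Rewrites every matching value without the ".0"; other text,
    // including numbers such as 1.0, is left untouched.
    static String strip(String json) {
        return TS.matcher(json).replaceAll("\"$1\"");
    }

    public static void main(String[] args) {
        System.out.println(strip("{\"attr\":\"2018-01-02 13:53:00.0\"}"));
    }
}
```

This would mean another full pass over the content, which is exactly the extra cost I'd like to avoid.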
Regards,
Michael