Adam Taft created NIFI-8702:
-------------------------------

             Summary: ListFile produces invalid datetime attributes
                 Key: NIFI-8702
                 URL: https://issues.apache.org/jira/browse/NIFI-8702
             Project: Apache NiFi
          Issue Type: Bug
          Components: Extensions
    Affects Versions: 1.13.2
            Reporter: Adam Taft


The ListFile processor produces date-time attribute strings that do not 
conform to the ISO-8601 standard. This affects three flowfile attributes 
created by the processor: 'file.creationTime', 'file.lastAccessTime' and 
'file.lastModifiedTime'.

An example output attribute value looks like:  2021-06-14T18:19:20+0000

This does not conform to ISO-8601 because it mixes "basic" syntax with 
"extended" syntax. The basic format omits the separators intended for human 
readability (dashes, colons, etc.). The extended format includes them.

In the example above, the date and time components use the extended format, 
but the offset component uses the basic format: it is missing the required 
colon separator.

The above attribute example should instead be formatted as: 
2021-06-14T18:19:20+00:00

Note the colon in the offset component "00:00". The colon is required because 
the other segments are using extended format (with separators), and a 
consistent representation must be kept.
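
To illustrate the difference between the two offset styles, here is a minimal 
java.time sketch. The class name and the two patterns are my own choices for 
illustration and are not taken from the ListFile source. It assumes that pattern 
letter 'Z' yields the colon-less offset seen in the attributes and that 'xxx' 
yields the extended form with the colon.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class, not part of NiFi.
public class OffsetPatternDemo {

    // 'Z' writes the offset without a colon, e.g. +0000 (basic-style offset).
    static final DateTimeFormatter MIXED =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ssZ");

    // 'xxx' writes the offset with a colon, e.g. +00:00 (extended-style offset).
    static final DateTimeFormatter EXTENDED =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ssxxx");

    public static void main(String[] args) {
        OffsetDateTime t = OffsetDateTime.of(2021, 6, 14, 18, 19, 20, 0, ZoneOffset.UTC);
        System.out.println(MIXED.format(t));    // 2021-06-14T18:19:20+0000
        System.out.println(EXTENDED.format(t)); // 2021-06-14T18:19:20+00:00
    }
}
```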

The test for the output from ListFile should simply be whether the attributes 
can be parsed by DateTimeFormatter.ISO_OFFSET_DATE_TIME without throwing an 
exception.
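
A rough sketch of such a check follows. The class and method names are 
hypothetical, and a real test would read the attribute values from the flowfiles 
emitted by ListFile rather than from string literals.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of NiFi.
public class IsoOffsetCheck {

    // Returns true when the value parses with ISO_OFFSET_DATE_TIME.
    static boolean isIsoOffsetDateTime(String value) {
        try {
            OffsetDateTime.parse(value, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Current ListFile-style value: offset has no colon.
        System.out.println(isIsoOffsetDateTime("2021-06-14T18:19:20+0000"));  // false
        // Proposed value: offset uses the extended format.
        System.out.println(isIsoOffsetDateTime("2021-06-14T18:19:20+00:00")); // true
    }
}
```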

Some references:

[https://stackoverflow.com/questions/38252867/what-is-the-right-iso8601-format]

[https://en.wikipedia.org/wiki/ISO_8601#General_principles]
{quote}Representations can be done in one of two formats – a basic format with 
a minimal number of separators or an extended format with separators added to 
enhance human readability.
{quote}
[https://bugs.openjdk.java.net/browse/JDK-8176547]

The goal of a fix would be to allow downstream processors to properly parse and 
interpret fully conformant ISO-8601 attributes using standard 
java.time.DateTimeFormatter parsers. The current situation requires a custom 
parsing operation (or string hacking) to properly parse with java.time, which 
is not ideal.
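
For reference, the kind of custom parsing a downstream consumer needs today 
might look like the sketch below. The class name, method name and pattern are 
assumptions made for illustration. The pattern is simply one that accepts an 
offset without a colon.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical workaround, not part of NiFi.
public class ListFileTimestampWorkaround {

    // Custom pattern: 'Z' accepts an offset without a colon, such as +0000.
    static final DateTimeFormatter NO_COLON_OFFSET =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ssZ");

    // Parses a value in the format currently seen in the attributes.
    static OffsetDateTime parseAttribute(String value) {
        return OffsetDateTime.parse(value, NO_COLON_OFFSET);
    }

    public static void main(String[] args) {
        OffsetDateTime parsed = parseAttribute("2021-06-14T18:19:20+0000");
        // Once parsed, the value can be re-emitted with a standard formatter.
        System.out.println(DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(parsed));
    }
}
```

With conforming attributes, the custom formatter would be unnecessary and 
OffsetDateTime.parse(value) alone would be enough.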

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
