[ 
https://issues.apache.org/jira/browse/TIKA-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458721#comment-16458721
 ] 

ASF GitHub Bot commented on TIKA-2636:
--------------------------------------

lewismc commented on issue #234: TIKA-2636 ENVI Header metadata fields can span 
more than one line
URL: https://github.com/apache/tika/pull/234#issuecomment-385451046
 
 
   Will commit by end of day today unless there are further comments.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ENVI Header metadata fields can span more than one line
> -------------------------------------------------------
>
>                 Key: TIKA-2636
>                 URL: https://issues.apache.org/jira/browse/TIKA-2636
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Major
>             Fix For: 2.0.0
>
>         Attachments: ang20150420t182050_corr_v1e_img.hdr
>
>
> [~tpalsulich] was correct when [he 
> stated|https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046140#comment-14046140]
>  "...See below for how to read and output line by line (copy & paste between 
> the xml start/end in EnviHeaderParser). I have a hunch this isn't really what 
> we want -- what if a metadata field has a newline in it? What if the line is 
> too long to fit into a string? On the other hand, with nice input, it's much 
> nicer output."
> As it turns out, ENVI header metadata fields can span more than one line. An 
> example is as follows:
> {code}
> 1.    ENVI
> 2.    description = {
> 3.      Georeferenced Image built from input GLT. [Wed Jun 10 04:37:54 2015] [Wed
> 4.      Jun 10 04:48:52 2015]}
> 5.    samples = 739
> 6.    lines = 14674
> 7.    bands = 432
> 8.    header offset = 0
> 9.    file type = ENVI Standard
> 10.    data type = 4
> 11.    interleave = bil
> 12.    sensor type = Unknown
> 13.    byte order = 0
> 14.    map info = { UTM , 1.000 , 1.000 , 724522.127 , 4074620.759 , 1.1000000000e+00 , 1.1000000000e+00 , 12 , North , WGS-84 , units=Meters , rotation=75.00000000 }
> 15.    wavelength units = Nanometers
> ...
> {code}
> The case here is when a metadata field value is enclosed in curly 
> brackets. In the example above, L2-L4 show a value spread over three lines, 
> and L14 shows a value contained on a single line.
> This requires a patch to fix the 
> [EnviHeaderParser|https://github.com/apache/tika/blob/9130bbc1fa6d69419b2ad294917260d6b1cced08/tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java]
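
One way the multi-line case described above could be handled is to keep appending lines to a value while an opening curly bracket has not yet been closed. The following is only a minimal sketch of that idea, not the actual patch from the pull request: the class name EnviMultiLineSketch and the method readFields are hypothetical, and it assumes curly brackets are not nested and that every field has the form "key = value".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not Tika's EnviHeaderParser.
class EnviMultiLineSketch {

    // Reads "key = value" pairs. When a value opens a curly bracket that is
    // not closed on the same line, the following lines are appended to it
    // (separated by a single space) until the closing bracket is seen.
    static Map<String, String> readFields(BufferedReader reader) throws IOException {
        Map<String, String> fields = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            // Split on the first '=' only, so values such as
            // "units=Meters" inside "map info" stay intact.
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // e.g. the leading "ENVI" line or a blank line
            }
            String key = line.substring(0, eq).trim();
            StringBuilder value = new StringBuilder(line.substring(eq + 1).trim());
            // Assumption: brackets are not nested, so the first '}' ends the value.
            while (value.indexOf("{") >= 0 && value.indexOf("}") < 0
                    && (line = reader.readLine()) != null) {
                value.append(' ').append(line.trim());
            }
            fields.put(key, value.toString());
        }
        return fields;
    }
}
```

With the example header above, this sketch would store the "description" field as a single value covering L2-L4, while "map info" on L14 is read from its one line as before. Whether the joined lines should be separated by a space or kept verbatim is a design choice left open here.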



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
