[ 
https://issues.apache.org/jira/browse/TIKA-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-3788:
------------------------------
    Description: 
As part of work on TIKA-3787, I'll add a ParseRecord to the ParseContext.  This 
can be used by parsers that parse embedded files to record caught exceptions 
and warning messages.  The CompositeParser keeps track of the depth of its 
parse and, when the depth returns to 0, writes these exceptions and warnings 
to the Metadata object.
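
The depth-tracking idea can be sketched as below. This is a toy model only: SketchRecord, Doc, the parse method, and the two metadata keys are hypothetical stand-ins invented for illustration, not the actual Tika classes, signatures, or property names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParseRecordSketch {
    // Hypothetical metadata keys, chosen for this sketch only.
    static final String EXCEPTIONS_KEY = "embedded-exceptions";
    static final String WARNINGS_KEY = "embedded-warnings";

    /** Toy stand-in for the record that travels with the parse. */
    static class SketchRecord {
        int depth = 0;
        final List<String> exceptions = new ArrayList<>();
        final List<String> warnings = new ArrayList<>();
    }

    /** Toy document: a name, optional problems, and embedded children. */
    static class Doc {
        final String name;
        final boolean corrupt;
        final String warning; // null when there is nothing to report
        final List<Doc> children = new ArrayList<>();

        Doc(String name, boolean corrupt, String warning) {
            this.name = name;
            this.corrupt = corrupt;
            this.warning = warning;
        }

        Doc with(Doc child) {
            children.add(child);
            return this;
        }
    }

    static void parse(Doc doc, SketchRecord record, Map<String, List<String>> metadata) {
        record.depth++;
        try {
            if (doc.corrupt) {
                throw new IllegalStateException("cannot parse " + doc.name);
            }
            if (doc.warning != null) {
                record.warnings.add(doc.name + ": " + doc.warning);
            }
            for (Doc child : doc.children) {
                try {
                    parse(child, record, metadata);
                } catch (IllegalStateException e) {
                    // Caught embedded failure: record it instead of losing it.
                    record.exceptions.add(e.getMessage());
                }
            }
        } finally {
            record.depth--;
            if (record.depth == 0) {
                // Back at the top-level document: copy everything to the metadata.
                metadata.put(EXCEPTIONS_KEY, new ArrayList<>(record.exceptions));
                metadata.put(WARNINGS_KEY, new ArrayList<>(record.warnings));
            }
        }
    }

    static Map<String, List<String>> demo() {
        Doc root = new Doc("container.zip", false, null)
                .with(new Doc("inner.zip", false, null)
                        .with(new Doc("bad.pdf", true, null)))
                .with(new Doc("notes.doc", false, "truncated stream"));
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        parse(root, new SketchRecord(), metadata);
        return metadata;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the failure two levels down (bad.pdf) and the warning one level down (notes.doc) both end up in the top-level metadata map, and nothing is written to the map while the depth is still above 0.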

I would still highly recommend /rmeta, -J, or the RecursiveParserWrapper, but 
this new capability adds some functionality to the standard /tika (with JSON 
output), and programmatically to the AutoDetectParser.

Because this information is added to the metadata object _after_ the parse, it 
will not come through in streaming contexts where the metadata object is 
written to the xhtml before the content of the file is parsed.  So, this will 
not add any benefit to /tika (text/html).
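
The ordering problem can be illustrated with a small toy model. Everything here is hypothetical (the class, the method names, the metadata key, and the output shapes); it only mimics the order of operations and does not reflect how Tika actually serializes xhtml or JSON.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StreamingOrderSketch {
    // Hypothetical metadata key, chosen for this sketch only.
    static final String KEY = "embedded-exceptions";

    /** Stand-in for a parse that only reports embedded failures when it finishes. */
    static String parse(Map<String, String> metadata) {
        String body = "<p>content</p>";
        metadata.put(KEY, "cannot parse bad.pdf"); // added after the content is handled
        return body;
    }

    /** Serializes whatever is in the metadata map at the moment of the call. */
    static String head(Map<String, String> metadata) {
        StringBuilder out = new StringBuilder("<head>");
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            out.append("<meta name=\"").append(e.getKey())
               .append("\" content=\"").append(e.getValue()).append("\"/>");
        }
        return out.append("</head>").toString();
    }

    /** Streaming style: the metadata is written first, then the content is parsed. */
    static String streaming(Map<String, String> metadata) {
        String head = head(metadata);
        String body = parse(metadata);
        return head + "<body>" + body + "</body>";
    }

    /** Buffered style: the content is parsed first, then the metadata is written. */
    static String buffered(Map<String, String> metadata) {
        String body = parse(metadata);
        return head(metadata) + "<body>" + body + "</body>";
    }

    public static void main(String[] args) {
        System.out.println(streaming(new LinkedHashMap<>()));
        System.out.println(buffered(new LinkedHashMap<>()));
    }
}
```

In the streaming variant the head is already emitted by the time the late entry is added, so the entry never reaches the output; in the buffered variant it does.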

  was:
As part of work on TIKA-3787, I'll add a ParseRecord to the ParseContext.  This 
can be used by parsers that parse embedded files to record caught exceptions 
and warning messages.  The CompositeParser keeps track of depth of its parse 
and when the depth returns to 0, it will write these exceptions and warnings to 
the Metadata object.

I would still highly recommend /rmeta, -J, the RecursiveParserWrapper, but this 
new capability adds some functionality to the standard /tika (with json 
output), and programmatically to the AutoDetectParser.

Because this information is added to the metadata object _after_ the parse, it 
will not come through in streaming contexts where the metadata object has is 
written to the xhtml before the content of the file is parsed.  So, this will 
not add any benefit to /tika (text/html).


> Allow embedded exceptions and warnings to percolate to the parent's metadata
> ----------------------------------------------------------------------------
>
>                 Key: TIKA-3788
>                 URL: https://issues.apache.org/jira/browse/TIKA-3788
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Major
>             Fix For: 2.4.1
>
>
> As part of work on TIKA-3787, I'll add a ParseRecord to the ParseContext.  
> This can be used by parsers that parse embedded files to record caught 
> exceptions and warning messages.  The CompositeParser keeps track of the depth 
> of its parse and, when the depth returns to 0, writes these exceptions and 
> warnings to the Metadata object.
> I would still highly recommend /rmeta, -J, or the RecursiveParserWrapper, but 
> this new capability adds some functionality to the standard /tika (with JSON 
> output), and programmatically to the AutoDetectParser.
> Because this information is added to the metadata object _after_ the parse, 
> it will not come through in streaming contexts where the metadata object is 
> written to the xhtml before the content of the file is parsed.  So, this will 
> not add any benefit to /tika (text/html).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
