Tim Allison created TIKA-1814:
---------------------------------

    Summary: Add a standalone XMPScannerParser
        Key: TIKA-1814
        URL: https://issues.apache.org/jira/browse/TIKA-1814
    Project: Tika
 Issue Type: Improvement
   Reporter: Tim Allison
   Priority: Minor

Several parsers make use of XMP data and normalize it via dc (Dublin Core) or other standards into our Metadata object. When a file contains multiple XMP packets, we currently either rely on a dependency to make sense of them (PDFBox for PDFParser) or we just grab the first packet (TiffParser via JempboxExtractor and XMPPacketScanner). Which other parsers are processing XMP?

It might be useful to extract all XMP packets from a file and store their raw bytes as Base64-encoded Strings in the Metadata object. Advanced users would then have access to the raw XMP streams.

For Tika 1.x, nothing would call this parser unless users configured it. For Tika 2.x, once we get the configurable combo parsers set up, a user could configure a combo/additive parser, e.g., a PDFParser that combines our current PDFParser with this new XMPScannerParser.
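To make the "extract all packets, store as Base64" idea concrete, here is a rough, JDK-only sketch. It is not the existing XMPPacketScanner and not a proposed Tika API: the class name RawXmpPacketScanner and the method scanToBase64 are hypothetical, and it assumes packets are wrapped in the usual "<?xpacket begin=" ... "<?xpacket end=" processing instructions in an ASCII-compatible encoding (UTF-16/UTF-32 packets would need extra handling).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Hypothetical helper (not part of Tika): finds every XMP packet in a byte
// array and returns each packet's raw bytes as a Base64 string.
class RawXmpPacketScanner {

    // Assumption: packets are delimited by xpacket processing instructions
    // encoded in an ASCII-compatible charset such as UTF-8.
    private static final byte[] BEGIN =
            "<?xpacket begin=".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] END =
            "<?xpacket end=".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PI_CLOSE =
            "?>".getBytes(StandardCharsets.US_ASCII);

    static List<String> scanToBase64(byte[] data) {
        List<String> packets = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            int start = indexOf(data, BEGIN, pos);
            if (start < 0) {
                break; // no further packets
            }
            int endPi = indexOf(data, END, start + BEGIN.length);
            if (endPi < 0) {
                break; // truncated packet: no end marker
            }
            int close = indexOf(data, PI_CLOSE, endPi + END.length);
            if (close < 0) {
                break; // end marker is itself truncated
            }
            int stop = close + PI_CLOSE.length;
            // Keep the raw bytes, wrapper included, and encode them as-is.
            byte[] packet = Arrays.copyOfRange(data, start, stop);
            packets.add(Base64.getEncoder().encodeToString(packet));
            pos = stop;
        }
        return packets;
    }

    // Naive byte-pattern search; returns -1 when the needle is absent.
    static int indexOf(byte[] haystack, byte[] needle, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

A parser built on something like this could add each returned String to the Metadata object under a multi-valued key; the key name would still need to be decided. Reading the whole file into memory is a simplification for the sketch; a real implementation would presumably scan the stream.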