[
https://issues.apache.org/jira/browse/TIKA-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712283#action_12712283
]
Jukka Zitting commented on TIKA-232:
------------------------------------
If you're instantiating the package parsers directly, then you can achieve this
simply by overriding the parser that is used for the files inside a package:
    PackageParser parser = ...;
    parser.setParser(new EmptyParser());
You could also use the following hack to do this for a pre-configured composite
parser like the AutoDetectParser:
    CompositeParser composite = new AutoDetectParser();
    for (Parser parser : composite.getParsers().values()) {
        if (parser instanceof PackageParser) {
            ((PackageParser) parser).setParser(new EmptyParser());
        }
    }
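One way to see what the loop does without a Tika checkout is a small
stand-alone model of it. Note that every type below is a simplified stand-in
written for this sketch (it only borrows the names Parser, PackageParser and
EmptyParser); none of it is the real org.apache.tika.parser API, and the
disableEntryParsing() helper and TextParser class are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a parser: appends whatever it "extracts" to the output list.
interface Parser {
    void parse(String name, List<String> out);
}

// Extracts nothing, mirroring the role EmptyParser plays in the loop above.
class EmptyParser implements Parser {
    public void parse(String name, List<String> out) {
        // intentionally empty
    }
}

// Hypothetical content parser that pretends to extract text from a file.
class TextParser implements Parser {
    public void parse(String name, List<String> out) {
        out.add("text of " + name);
    }
}

// Stand-in for a package parser: emits each entry name, then hands the
// entry to a replaceable delegate parser.
class PackageParser implements Parser {
    private final List<String> entries;
    private Parser delegate = new TextParser();

    PackageParser(List<String> entries) {
        this.entries = entries;
    }

    void setParser(Parser parser) {
        this.delegate = parser;
    }

    public void parse(String name, List<String> out) {
        for (String entry : entries) {
            out.add(entry);
            delegate.parse(entry, out);
        }
    }
}

class ListingOnlyDemo {
    // Hypothetical helper: the same loop as above, wrapped in one call.
    static void disableEntryParsing(Map<String, Parser> parsers) {
        for (Parser parser : parsers.values()) {
            if (parser instanceof PackageParser) {
                ((PackageParser) parser).setParser(new EmptyParser());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Parser> parsers = new LinkedHashMap<String, Parser>();
        parsers.put("application/zip",
                new PackageParser(Arrays.asList("a.txt", "b.txt")));
        parsers.put("text/plain", new TextParser());

        disableEntryParsing(parsers);

        List<String> out = new ArrayList<String>();
        parsers.get("application/zip").parse("archive.zip", out);
        System.out.println(out); // prints [a.txt, b.txt]
    }
}
```

With the delegate replaced, the package parser in this model still emits the
entry names but no entry text, which is the listing-only behaviour requested
in the issue.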
Perhaps someone has a good idea how to make this easier?
> Scanning of archive files
> -------------------------
>
> Key: TIKA-232
> URL: https://issues.apache.org/jira/browse/TIKA-232
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Affects Versions: 0.3
> Environment: All
> Reporter: Karl Heinz Marbaise
> Priority: Minor
>
> If I parse an archive, all the files inside the archive are extracted with
> their text as well. It would be nice to have the choice to extract only the
> list of files (directory) of an archive instead of extracting the whole
> contents. This seems to be usable only for zip, tar, tar.gz, tar.bz2, and .jar.
> Maybe this could be realized by using a different call or by a run-time
> configuration.