Hi,

On Tue, Sep 29, 2009 at 12:08 AM, Ken Krugler <[email protected]> wrote:
> Just for grins, I set up for types with names ending in +xml to
> automatically get application/xml as the parent mimetype.
>
> But when I used TikaCLI to process a test.xspf file, no content was
> generated.
>
> The issue is that CompositeParser.getParser() doesn't use supertypes when
> falling back - if it can't get a parser for the exact mimetype, then it goes
> straight to the fallback parser.
>
> It seems like it should try to use the mimetype hierarchy. If so, I can file
> an issue and a patch.

Correct, that would be great. Note that the MimeType.getSuperType()
method already does some of this, and we have related supertype
settings stored in the tika-mimetypes.xml configuration. The type
registry could also be told about the +xml convention and related
implicit supertype settings like the ones encoded in the
MediaType.isSpecializationOf() method.

(Note that we currently have both MimeType and MediaType classes for
similar purposes. This is due to an ongoing redesign of the mime type
registry. For now it's probably best to work on the MimeType class
until the redesign is more complete.)

BR,

Jukka Zitting
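
For illustration, the supertype fallback discussed above could be sketched
roughly as follows. This is a standalone sketch, not Tika code: the class
SupertypeFallbackSketch and all of its methods are hypothetical, plain
strings stand in for both media types and parsers, and the only implicit
rule modelled is the +xml convention. Explicit supertype settings (of the
kind kept in tika-mimetypes.xml) are represented by a simple map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names are actual Tika APIs.
class SupertypeFallbackSketch {
    // Explicit child -> parent settings, standing in for configured supertypes.
    private final Map<String, String> explicitSuperTypes = new HashMap<>();
    // Media type -> parser; a string stands in for a real parser object.
    private final Map<String, String> parsers = new HashMap<>();
    private final String fallbackParser;

    SupertypeFallbackSketch(String fallbackParser) {
        this.fallbackParser = fallbackParser;
    }

    void addSuperType(String type, String superType) {
        explicitSuperTypes.put(type, superType);
    }

    void addParser(String type, String parser) {
        parsers.put(type, parser);
    }

    // Returns the parent of the given type, or null if none is known.
    String superTypeOf(String type) {
        String explicit = explicitSuperTypes.get(type);
        if (explicit != null) {
            return explicit;
        }
        // Implicit rule: any type whose name ends in +xml specializes application/xml.
        if (type.endsWith("+xml")) {
            return "application/xml";
        }
        return null;
    }

    // Walks up the type hierarchy and only uses the fallback parser
    // once no ancestor has a registered parser.
    String getParser(String type) {
        Set<String> seen = new HashSet<>(); // guards against cyclic settings
        for (String t = type; t != null && seen.add(t); t = superTypeOf(t)) {
            String parser = parsers.get(t);
            if (parser != null) {
                return parser;
            }
        }
        return fallbackParser;
    }
}
```

With a parser registered only for application/xml, a lookup for a type
such as application/xspf+xml would then resolve to the XML parser
instead of going straight to the fallback.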
