Hi,

1. Does the use of Tika replace the SearchIndex.textFilterClasses configuration? If I read the code correctly, all formats declared in tika-config.xml are enabled by default. So SearchIndex.textFilterClasses is only consulted for types that are not declared in tika-config.xml, right?
2. According to [a] (and my own repository.xml ;), the extractor is called org.apache.jackrabbit.extractor.MsPowerPointTextExtractor, not org.apache.jackrabbit.extractor.MsPowerPointExtractor.

Marc

[a] http://jackrabbit.apache.org/jackrabbit-text-extractors.html

Index: src/main/java/org/apache/jackrabbit/core/query/lucene/JackrabbitParser.java
===================================================================
--- src/main/java/org/apache/jackrabbit/core/query/lucene/JackrabbitParser.java (revision 915798)
+++ src/main/java/org/apache/jackrabbit/core/query/lucene/JackrabbitParser.java (working copy)
@@ -114,7 +114,7 @@
                     "org.apache.jackrabbit.extractor.MsOutlookTextExtractor")) {
                 parsers.put("application/vnd.ms-outlook", new OfficeParser());
             } else if (name.equals(
-                    "org.apache.jackrabbit.extractor.MsPowerPointExtractor")) {
+                    "org.apache.jackrabbit.extractor.MsPowerPointTextExtractor")) {
                 Parser parser = new OfficeParser();
                 parsers.put("application/vnd.ms-powerpoint", parser);
                 parsers.put("application/mspowerpoint", parser);
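To illustrate why the class name matters, here is a minimal, self-contained sketch of the kind of name-to-MIME-type lookup the patched branch performs. It is not the actual JackrabbitParser code: the class LegacyExtractorNames and the method mimeTypesFor are hypothetical names made up for this example, and only the two extractor class names and the three MIME types are taken from the diff above. With an exact string comparison, the misspelled name simply matches nothing, so a PowerPoint extractor configured under its documented name would be silently skipped.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Jackrabbit.
public class LegacyExtractorNames {

    // Legacy extractor class name -> MIME types it should be registered for.
    // Entries mirror the two branches visible in the diff.
    private static final Map<String, List<String>> MIME_TYPES = new LinkedHashMap<>();

    static {
        MIME_TYPES.put(
                "org.apache.jackrabbit.extractor.MsOutlookTextExtractor",
                Arrays.asList("application/vnd.ms-outlook"));
        MIME_TYPES.put(
                "org.apache.jackrabbit.extractor.MsPowerPointTextExtractor",
                Arrays.asList("application/vnd.ms-powerpoint",
                        "application/mspowerpoint"));
    }

    // Exact-match lookup: an unknown (for example misspelled) class name
    // yields an empty list instead of an error.
    static List<String> mimeTypesFor(String extractorClassName) {
        List<String> types = MIME_TYPES.get(extractorClassName);
        return types != null ? types : Collections.<String>emptyList();
    }

    public static void main(String[] args) {
        // Name as documented in [a]: matches the PowerPoint entry.
        System.out.println(mimeTypesFor(
                "org.apache.jackrabbit.extractor.MsPowerPointTextExtractor"));
        // Name as spelled before the patch: no match.
        System.out.println(mimeTypesFor(
                "org.apache.jackrabbit.extractor.MsPowerPointExtractor"));
    }
}
```

Because the lookup fails quietly, a typo like this one would probably only show up as PowerPoint content missing from the index, which is why it seems worth fixing in the comparison itself.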
