Hi,

When Tika runs text extraction on an Excel file that is protected with a read password (a password required to open the file, not a write password), it throws an exception like the output attached below. Is this a known problem?

Regards,
Shinichiro Abe

Read password: '2'
2_yomitori.xls
Description: MS-Excel spreadsheet
2_yomitori.xlsx
Description: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
abe:target abe$ java -jar tika-app-0.9.jar 2_yomitori.xls
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@50fa70a4
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:107)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:302)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:91)
Caused by: org.apache.poi.EncryptedDocumentException: Default password is invalid for docId/saltData/saltHash
    at org.apache.poi.hssf.record.RecordFactoryInputStream$StreamEncryptionInfo.createDecryptingStream(RecordFactoryInputStream.java:101)
    at org.apache.poi.hssf.record.RecordFactoryInputStream.<init>(RecordFactoryInputStream.java:169)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:139)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:106)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:276)
    at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:136)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:189)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    ... 5 more
abe:target abe$ java -jar tika-app-0.9.jar 2_yomitori.xlsx
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@4d480ea
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:107)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:302)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:91)
Caused by: java.lang.RuntimeException: Buffer underrun - requested 2 bytes but 0 was available
    at org.apache.poi.poifs.filesystem.DocumentInputStream.checkAvaliable(DocumentInputStream.java:202)
    at org.apache.poi.poifs.filesystem.DocumentInputStream.readUShort(DocumentInputStream.java:300)
    at org.apache.poi.poifs.filesystem.DocumentInputStream.readShort(DocumentInputStream.java:220)
    at org.apache.poi.poifs.crypt.EncryptionHeader.<init>(EncryptionHeader.java:58)
    at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:44)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:209)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    ... 5 more
abe:target abe$
