Eamonn Saunders created TIKA-2519:
-------------------------------------

             Summary: Issue parsing multiple CHM files concurrently
                 Key: TIKA-2519
                 URL: https://issues.apache.org/jira/browse/TIKA-2519
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.16
            Reporter: Eamonn Saunders
            Priority: Minor

Should I expect to be able to parse multiple CHM files concurrently in multiple threads?

What I'm noticing when attempting to parse two different CHM files in different threads is that:

- ChmExtractor.extractChmEntry() gets a ChmBlockInfo as follows:
{code}
ChmBlockInfo bb = ChmBlockInfo.getChmBlockInfoInstance(
        directoryListingEntry,
        (int) getChmLzxcResetTable().getBlockLen(),
        getChmLzxcControlData());
{code}
- ChmBlockInfo.getChmBlockInfoInstance() is a static method that appears to limit the number of ChmBlockInfo instances to 1.
{code}
public static ChmBlockInfo getChmBlockInfoInstance(
        DirectoryListingEntry dle, int bytesPerBlock,
        ChmLzxcControlData clcd) {
    setChmBlockInfo(new ChmBlockInfo());
    getChmBlockInfo().setStartBlock(dle.getOffset() / bytesPerBlock);
    getChmBlockInfo().setEndBlock(
            (dle.getOffset() + dle.getLength()) / bytesPerBlock);
    getChmBlockInfo().setStartOffset(dle.getOffset() % bytesPerBlock);
    getChmBlockInfo().setEndOffset(
            (dle.getOffset() + dle.getLength()) % bytesPerBlock);
    // potential problem with casting long to int
    getChmBlockInfo().setIniBlock(
            getChmBlockInfo().startBlock
                    - getChmBlockInfo().startBlock
                    % (int) clcd.getResetInterval());
    // (getChmBlockInfo().startBlock - getChmBlockInfo().startBlock)
    // % (int) clcd.getResetInterval());
    return getChmBlockInfo();
}
{code}

Is there a good reason why there should only ever be one instance of ChmBlockInfo? Should we forget about attempting to process CHM files in parallel and instead queue them up to be processed sequentially?
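
To illustrate what I suspect is going wrong: every setter call in the method above goes through the shared static reference, so a second thread calling setChmBlockInfo(new ChmBlockInfo()) midway can redirect the first thread's remaining setters (and its return value) to the wrong object. Below is a minimal sketch of the alternative I have in mind, where each call computes into locals and returns its own instance. BlockInfoSketch and its of() factory are hypothetical names of mine, not Tika classes, and the plain long/int parameters stand in for the DirectoryListingEntry and ChmLzxcControlData getters. I have not checked whether other parts of the CHM parser hold shared state as well, so this may not be the whole story.

```java
// Hypothetical stand-in for ChmBlockInfo; NOT the actual Tika class.
final class BlockInfoSketch {
    final int startBlock;
    final int endBlock;
    final int startOffset;
    final int endOffset;
    final int iniBlock;

    private BlockInfoSketch(int startBlock, int endBlock,
                            int startOffset, int endOffset, int iniBlock) {
        this.startBlock = startBlock;
        this.endBlock = endBlock;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.iniBlock = iniBlock;
    }

    // Same arithmetic as the quoted method, but using only local variables.
    // No static field is read or written, so concurrent callers cannot
    // overwrite each other's values.
    static BlockInfoSketch of(long offset, long length,
                              int bytesPerBlock, int resetInterval) {
        if (bytesPerBlock <= 0 || resetInterval <= 0) {
            throw new IllegalArgumentException(
                    "bytesPerBlock and resetInterval must be positive");
        }
        // Math.toIntExact throws instead of silently truncating long to int.
        int startBlock = Math.toIntExact(offset / bytesPerBlock);
        int endBlock = Math.toIntExact((offset + length) / bytesPerBlock);
        // A remainder of an int divisor always fits in an int.
        int startOffset = (int) (offset % bytesPerBlock);
        int endOffset = (int) ((offset + length) % bytesPerBlock);
        // Round startBlock down to a multiple of the reset interval.
        int iniBlock = startBlock - startBlock % resetInterval;
        return new BlockInfoSketch(startBlock, endBlock,
                startOffset, endOffset, iniBlock);
    }
}
```

With something along these lines, the call site in extractChmEntry() would keep its local bb variable and the static field could go away. If it turns out the rest of the parser also relies on shared state, serializing CHM parsing behind a single lock or queue would be the fallback.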