Eamonn Saunders created TIKA-2519:
-------------------------------------

             Summary: Issue parsing multiple CHM files concurrently
                 Key: TIKA-2519
                 URL: https://issues.apache.org/jira/browse/TIKA-2519
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.16
            Reporter: Eamonn Saunders
            Priority: Minor


Should I expect to be able to parse multiple CHM files concurrently, one per 
thread?
When I parse two different CHM files in different threads, I notice the 
following:

- ChmExtractor.extractChmEntry() gets a ChmBlockInfo as follows:
{code}
                ChmBlockInfo bb = ChmBlockInfo.getChmBlockInfoInstance(
                        directoryListingEntry, (int) getChmLzxcResetTable()
                                .getBlockLen(), getChmLzxcControlData());
{code}
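- ChmBlockInfo.getChmBlockInfoInstance() is a static method that appears to 
keep a single shared ChmBlockInfo instance: each call replaces it and then 
populates it through static accessors, so two threads calling it at the same 
time could overwrite each other's values.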
{code}
    public static ChmBlockInfo getChmBlockInfoInstance(
            DirectoryListingEntry dle, int bytesPerBlock,
            ChmLzxcControlData clcd) {
        setChmBlockInfo(new ChmBlockInfo());
        getChmBlockInfo().setStartBlock(dle.getOffset() / bytesPerBlock);
        getChmBlockInfo().setEndBlock(
                (dle.getOffset() + dle.getLength()) / bytesPerBlock);
        getChmBlockInfo().setStartOffset(dle.getOffset() % bytesPerBlock);
        getChmBlockInfo().setEndOffset(
                (dle.getOffset() + dle.getLength()) % bytesPerBlock);
        // potential problem with casting long to int
        getChmBlockInfo().setIniBlock(
                getChmBlockInfo().startBlock - getChmBlockInfo().startBlock
                        % (int) clcd.getResetInterval());
//                (getChmBlockInfo().startBlock - getChmBlockInfo().startBlock)
//                        % (int) clcd.getResetInterval());
        return getChmBlockInfo();
    }
{code}

Is there a good reason why there should only ever be one instance of 
ChmBlockInfo?
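
To illustrate the alternative, here is a minimal sketch of the same block 
arithmetic with a fresh instance built per call and no static state. It is not 
the Tika implementation: BlockInfoSketch and its of() method are hypothetical 
names, and plain int parameters stand in for the values read from 
DirectoryListingEntry and ChmLzxcControlData.

```java
// Hypothetical stand-in for ChmBlockInfo; not part of Tika.
final class BlockInfoSketch {
    final int startBlock;
    final int endBlock;
    final int startOffset;
    final int endOffset;
    final int iniBlock;

    private BlockInfoSketch(int startBlock, int endBlock, int startOffset,
                            int endOffset, int iniBlock) {
        this.startBlock = startBlock;
        this.endBlock = endBlock;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.iniBlock = iniBlock;
    }

    // Same formulas as getChmBlockInfoInstance(), but every value lives in a
    // local variable until the immutable result is constructed, so concurrent
    // callers cannot see or overwrite each other's data.
    static BlockInfoSketch of(int entryOffset, int entryLength,
                              int bytesPerBlock, int resetInterval) {
        int startBlock = entryOffset / bytesPerBlock;
        int endBlock = (entryOffset + entryLength) / bytesPerBlock;
        int startOffset = entryOffset % bytesPerBlock;
        int endOffset = (entryOffset + entryLength) % bytesPerBlock;
        // Round the start block down to a multiple of the reset interval.
        int iniBlock = startBlock - startBlock % resetInterval;
        return new BlockInfoSketch(startBlock, endBlock, startOffset,
                endOffset, iniBlock);
    }
}
```

With this shape, each thread would hold its own result object, and no 
synchronization would be needed for this step.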

Should we forget about attempting to process CHM files in parallel and instead 
queue them up to be processed sequentially?
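
For reference, the sequential workaround could look roughly like the sketch 
below, which funnels all parse tasks through a single worker thread. 
SequentialChmQueue and runSequentially() are hypothetical names, and the 
Callable tasks are placeholders for whatever code actually invokes the CHM 
parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper; not part of Tika.
final class SequentialChmQueue {

    // Runs the tasks one at a time, in submission order, on a single worker
    // thread, and returns their results in the same order.
    static <T> List<T> runSequentially(List<Callable<T>> tasks) {
        ExecutorService single = Executors.newSingleThreadExecutor();
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (Callable<T> task : tasks) {
                futures.add(single.submit(task));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while parsing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Parse task failed", e.getCause());
        } finally {
            single.shutdown();
        }
    }
}
```

This keeps the callers multi-threaded while ensuring only one CHM parse runs 
at any moment, at the cost of losing parallelism for CHM files.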



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
