XPathEntityProcessor timeout when stream=true
---------------------------------------------

                 Key: SOLR-1539
                 URL: https://issues.apache.org/jira/browse/SOLR-1539
             Project: Solr
          Issue Type: Bug
          Components: contrib - DataImportHandler
    Affects Versions: 1.4
            Reporter: Chris Eldredge
         Attachments: SOLR-1539.patch

When setting stream=true on XPathEntityProcessor a separate thread is created 
to read whatever Reader is being used for rows while the original thread pumps 
a BlockingQueue.  This design allows the Reader to be read even when DIH cannot 
process documents as quickly as they become available in the Reader.

This design has questionable value.  It adds complexity to the code with 
unclear benefits to the user.

At any rate, the code incorrectly uses the BlockingQueue API:

1.  Arbitrarily sets a 10 second timeout and fails when this timeout elapses 
before a row becomes available.
2.  Fails to check the return code when calling offer() to see if the item was 
successfully added or if the queue is full.
3.  Fails to stop consuming the Reader even after an import has failed or been 
aborted.

The effect is that if a URL being processed pauses more than 10 seconds to 
think in between streaming rows, the XPathEntityProcessor fails.  Setting the 
readTimeout and connectionTimeout attributes on the dataSource does not address 
this bug because XPathEntityProcessor imposes its own timeout, hard-coded to 10 
seconds.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to