FileListEntityProcessor can't handle directories containing lots of files
-------------------------------------------------------------------------

                 Key: SOLR-798
                 URL: https://issues.apache.org/jira/browse/SOLR-798
             Project: Solr
          Issue Type: Bug
          Components: contrib - DataImportHandler
            Reporter: Grant Ingersoll
            Priority: Minor


The FileListEntityProcessor currently tries to process all of the files in a 
single directory at once and stores the results in memory, one map per file.  
On directories containing a large number of files, this quickly causes an 
OutOfMemoryError.
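
To make the failure mode concrete, the pattern amounts to roughly the 
following.  This is a simplified illustration, not the actual 
FileListEntityProcessor code, and the row field names are only assumed here:

import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EagerFileList {
  // Collects a metadata map for every file before the first row is consumed.
  public static List<Map<String, Object>> listAll(File dir) {
    List<Map<String, Object>> rows = new ArrayList<Map<String, Object>>();
    for (File f : dir.listFiles()) {       // entire directory listed up front
      Map<String, Object> row = new HashMap<String, Object>();
      row.put("fileAbsolutePath", f.getAbsolutePath());
      row.put("fileSize", f.length());
      row.put("fileLastModified", f.lastModified());
      rows.add(row);                       // one map per file kept in memory
    }
    return rows;                           // whole list held until processing finishes
  }
}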

Unfortunately, the typical fix for this is to hack FileFilter to do the work 
for you and always return false from the accept method, so that listFiles() 
never accumulates results.  It may be possible to hook up some sort of 
producer/consumer, multithreaded FileFilter approach whereby accept() blocks 
until the nextRow() mechanism requests another row, thereby avoiding the need 
to cache everything in the map.  A rough sketch of that idea follows.
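
The sketch below is only an illustration of the producer/consumer idea, not 
the real DIH API; the class and method names (StreamingFileLister, next(), 
the EOF sentinel) are made up.  A background thread walks the directory, the 
FileFilter pushes each file onto a bounded BlockingQueue and always returns 
false, and the consumer pulls one file at a time, analogous to nextRow():

import java.io.File;
import java.io.FileFilter;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class StreamingFileLister {
  /** Sentinel signalling that the directory walk has finished. */
  private static final File EOF = new File("");

  private final BlockingQueue<File> queue = new ArrayBlockingQueue<File>(1);

  public StreamingFileLister(final File baseDir) {
    Thread producer = new Thread(new Runnable() {
      public void run() {
        // The filter blocks in accept() until the consumer takes the
        // previous file, so at most one pending file is ever buffered.
        baseDir.listFiles(new FileFilter() {
          public boolean accept(File f) {
            try {
              queue.put(f);
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
            return false;      // never accumulate results in listFiles()
          }
        });
        try {
          queue.put(EOF);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    });
    producer.setDaemon(true);
    producer.start();
  }

  /** Returns the next file, or null when the walk is complete. */
  public File next() throws InterruptedException {
    File f = queue.take();
    return f == EOF ? null : f;
  }
}

nextRow() in the processor could then call next() and build the row map for 
that single file, so only one file's worth of metadata is ever buffered.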
