On 7/10/2013 9:59 AM, Tom Burton-West wrote:
The Javadoc for NRTCachingDirectory (
http://lucene.apache.org/core/4_3_1/core/org/apache/lucene/store/NRTCachingDirectory.html?is-external=true)
  says:

  "This class is likely only useful in a near real-time context, where
indexing rate is lowish but reopen rate is highish, resulting in many tiny
files being written..."

It seems like we have exactly the opposite use case, so we would like
advice on what directory implementation to use instead.

We are doing offline batch indexing, so no searches are being done and we
don't need NRT. We also have a high indexing rate, because we are trying to
index 3 billion pages as quickly as possible.

I am not clear on what determines the reopen rate. Is it related only to
searching, or is it involved in indexing as well?

Does the NRTCachingDirectory have any benefit for indexing in the use
case described above?

I'm guessing we should just use solr.StandardDirectoryFactory instead.
Is this correct?

The NRTCachingDirectory object that Solr creates uses the MMap implementation as its default delegate. To test whether you can get any improvement from moving away from the default, I would use MMapDirectoryFactory (the default for most of the 3.x releases) rather than StandardDirectoryFactory. The advantages of memory mapping are not something you'd want to give up.
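For reference, here is a minimal sketch of that change in solrconfig.xml. It assumes a Solr 4.x config where the directoryFactory element sits directly under <config>. The stock example config wraps the class name in a ${solr.directoryFactory:...} property default, which is left out here for clarity:

```xml
<!-- Sketch only (assumes Solr 4.x): use memory-mapped index access directly,
     without the NRTCachingDirectoryFactory wrapper that caches small
     newly flushed segments in RAM. -->
<directoryFactory name="DirectoryFactory"
                  class="solr.MMapDirectoryFactory"/>
```

Putting solr.StandardDirectoryFactory in the class attribute would be the other option raised in the question. Either way, it is worth timing the change against the current setup on a sample of the batch before committing to it for the full run.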

Thanks,
Shawn
