On May 12, 2006, at 1:02 PM, Michael Levy wrote:
> One nice feature of INQUERY is that you can create one large SGML file, containing lots of records, each bracketed with <DOC> and </DOC> tags. Submitting that big SGML document for indexing goes very fast. I believe that Solr indexes one document at a time; each document requires a separate HTTP POST.

Actually, adding multiple documents per POST is possible: a single <add> element can contain many <doc> elements.

> How efficient is making a separate HTTP request per document, when there are millions of documents? Do people ever use Solr's or Lucene's API directly for indexing large numbers of documents, and if so, what are the considerations pro and con?

Maybe Solr could evolve a facility for doing these kinds of bulk operations without HTTP, still using Solr's engine, but directly through an API. I guess this gets tricky, though, when you have a live Solr system running and have to juggle write locks.

But currently going through HTTP is the only way, and it's not likely to be much of a bottleneck, especially since you can post multiple documents at a time (the wiki has an example, but I can't get to the web at the moment to post the link).
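For illustration only, here is a minimal sketch of batching many documents per POST using Solr's XML update format (<add> wrapping several <doc> elements, each made of <field name="..."> entries, followed by a <commit/>). The update URL, the helper names (build_add_xml, batches, post_xml, index_all) and the batch size of 1000 are assumptions chosen for the example, not anything from this thread or from Solr itself:

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr

# Assumption: a local Solr using the stock example update URL; adjust as needed.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Wrap many documents in ONE <add> element so they go out in one POST.

    Each doc is a dict of field name -> value, or a list of values
    for a multi-valued field.
    """
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                # quoteattr adds the surrounding quotes; escape handles &, <, >
                parts.append(f"<field name={quoteattr(name)}>{escape(str(v))}</field>")
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def batches(docs, size):
    """Hypothetical helper: yield lists of at most `size` docs from any iterable."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def post_xml(payload, url=SOLR_UPDATE_URL):
    """Send one XML update message to Solr and return the raw response body."""
    req = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def index_all(docs, batch_size=1000):
    """One HTTP request per batch rather than per document, then a single commit."""
    for batch in batches(docs, batch_size):
        post_xml(build_add_xml(batch))
    post_xml("<commit/>")
```

With millions of documents and a batch size of 1000, this turns millions of requests into a few thousand; the best batch size depends on document size and server settings, so treat 1000 as a starting guess.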

        Erik
