Tom:

ConcurrentUpdateSolrServer isn't magic or anything. You could fairly trivially write something that takes batches of your XML documents, combines them into a single update (multiple <doc> elements inside one <add> section), and sends that up to Solr, and you'd get some of the same speed benefits.
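A home-grown batcher along those lines doesn't even need SolrJ. Here's a minimal sketch (class and method names are my own; it assumes each input payload is a standard Solr <add><doc>...</doc></add> update) that merges several single-document payloads into one <add> with multiple <doc> elements, ready to POST to /update:

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class BatchMerger {
    // Merge several single-document "<add><doc>...</doc></add>" payloads
    // into one <add> containing multiple <doc> elements.
    public static String merge(List<String> updateXmls) throws Exception {
        DocumentBuilder b = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document combined = b.newDocument();
        Element add = combined.createElement("add");
        combined.appendChild(add);
        for (String xml : updateXmls) {
            Document d = b.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            NodeList docs = d.getElementsByTagName("doc");
            for (int i = 0; i < docs.getLength(); i++) {
                // importNode copies the <doc> subtree into the combined document
                add.appendChild(combined.importNode(docs.item(i), true));
            }
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(combined), new StreamResult(out));
        return out.toString();
    }
}
```

The resulting string is what you'd POST to Solr's /update handler with Content-Type text/xml.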

If you do use CUSS, though, its JavaBin-based serialization is a lighter wire format than XML: http://lucene.apache.org/solr/4_10_2/solr-solrj/org/apache/solr/client/solrj/impl/BinaryRequestWriter.html
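If I remember the 4.x SolrJ API right, switching CUSS to JavaBin is a one-liner; a sketch (the URL, queue size, and thread count are placeholders to tune for your setup):

```java
import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;

// queueSize = 1000, threadCount = 4 are arbitrary examples
ConcurrentUpdateSolrServer server =
    new ConcurrentUpdateSolrServer("http://localhost:8983/solr/collection1", 1000, 4);
server.setRequestWriter(new BinaryRequestWriter()); // JavaBin instead of XML on the wire
```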

The only thing you have to worry about (in both the CUSS and the home-grown case) is that a single bad document in a batch fails the whole batch. It's up to you to fall back to writing them individually so the rest of the batch makes it in.
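That fall-back is easy to write generically. A sketch (the Sender interface here is a stand-in of my own for whatever actually posts to Solr, e.g. a SolrJ server.add(...) call):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FallbackIndexer {
    // Stand-in for whatever actually sends documents to Solr;
    // assumed to throw on any failure.
    public interface Sender {
        void send(List<String> docs) throws Exception;
    }

    // Try the whole batch first; if it fails, retry each document
    // individually so one bad document doesn't sink the rest.
    // Returns the documents that could not be indexed even on their own.
    public static List<String> indexWithFallback(List<String> batch, Sender sender) {
        List<String> failed = new ArrayList<>();
        try {
            sender.send(batch);
        } catch (Exception batchFailure) {
            for (String doc : batch) {
                try {
                    sender.send(Collections.singletonList(doc));
                } catch (Exception docFailure) {
                    failed.add(doc); // log and handle this one separately
                }
            }
        }
        return failed;
    }
}
```

The returned list is exactly what you'd want to log and retry or inspect by hand.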

Michael

On 12/11/14 11:04, Erick Erickson wrote:
I don't think so, it uses SolrInputDocuments and
lists thereof. So if you parse the xml and then
put things in SolrInputDocuments......

Or something like that.

Erick
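Erick's suggestion can be sketched roughly like this, assuming SolrJ 4.x and XML files already in Solr's standard <add><doc><field name="..."> update format (the parsing details will differ for your own XML schema):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.solr.common.SolrInputDocument;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XmlToSolrInputDocs {
    // Parse a Solr-style update file into SolrInputDocuments that can be
    // handed to ConcurrentUpdateSolrServer.add(Collection<SolrInputDocument>).
    public static List<SolrInputDocument> parse(File xmlFile) throws Exception {
        DocumentBuilder b = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = b.parse(xmlFile);
        List<SolrInputDocument> result = new ArrayList<>();
        NodeList docs = dom.getElementsByTagName("doc");
        for (int i = 0; i < docs.getLength(); i++) {
            SolrInputDocument sid = new SolrInputDocument();
            NodeList fields = ((Element) docs.item(i)).getElementsByTagName("field");
            for (int j = 0; j < fields.getLength(); j++) {
                Element f = (Element) fields.item(j);
                sid.addField(f.getAttribute("name"), f.getTextContent());
            }
            result.add(sid);
        }
        return result;
    }
}
```

Then server.add(XmlToSolrInputDocs.parse(file)) should work, since add() accepts a collection of SolrInputDocuments.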

On Thu, Dec 11, 2014 at 9:43 AM, Tom Burton-West <tburt...@umich.edu> wrote:
Thanks Erick,

That is helpful.  We already have a process that works similarly.  Each
thread/process that sends a document to Solr waits until it gets a response
in order to make sure that the document was indexed successfully (we log
errors and retry docs that don't get indexed successfully). However, we run
20-100 of these processes, depending on throughput (i.e. we send documents
to Solr for indexing as fast as we can until they start queuing up on the
Solr end).

Is there a way to use CUSS with XML documents?

i.e. my second question:
A related question is how to use ConcurrentUpdateSolrServer with XML
documents.

I have very large XML documents, and the examples I see all build documents
by adding fields in Java code.  Is there an example that actually reads XML
files from the file system?
Tom
