Tom:
ConcurrentUpdateSolrServer isn't magic or anything. You could pretty
trivially write something that takes batches of your XML documents,
combines them into a single update (multiple <doc> tags in the <add>
section), and sends them up to Solr, achieving some of the same speed
benefits.
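A batched update in Solr's XML update format just looks like this (the
field names here are made up for illustration):

```xml
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="title">First document</field>
  </doc>
  <doc>
    <field name="id">doc2</field>
    <field name="title">Second document</field>
  </doc>
</add>
```

POSTing that to the update handler indexes both documents in one round
trip.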
If you do use CUSS, though, its JavaBin-based serialization is a lighter
wire format:
http://lucene.apache.org/solr/4_10_2/solr-solrj/org/apache/solr/client/solrj/impl/BinaryRequestWriter.html
The only thing you have to worry about (in both the CUSS and the
home-grown case) is that a single bad document in a batch fails the whole
batch. It's up to you to fall back to writing the documents individually
so the rest of the batch makes it in.
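That fallback is easy to sketch. Here the generic `sender` stands in for
a SolrJ `add(...)` call; the helper name and functional shape are my own
for illustration, not SolrJ API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchFallback {
    /**
     * Try to send the whole batch in one call; if that fails, retry each
     * document individually so one bad document doesn't sink the rest.
     * Returns the documents that still failed (log/retry these yourself).
     * "sender" is a stand-in for something like server.add(docs); a real
     * SolrJ call throws checked exceptions you'd wrap or declare.
     */
    static <T> List<T> addWithFallback(List<T> batch, Consumer<List<T>> sender) {
        List<T> failed = new ArrayList<>();
        try {
            sender.accept(batch);              // one round trip for the whole batch
        } catch (Exception batchError) {
            for (T doc : batch) {              // batch failed: isolate the bad ones
                try {
                    sender.accept(List.of(doc));
                } catch (Exception docError) {
                    failed.add(doc);           // this specific document is bad
                }
            }
        }
        return failed;
    }
}
```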
Michael
On 12/11/14 11:04, Erick Erickson wrote:
I don't think so; it uses SolrInputDocuments and
lists thereof. So if you parse the XML and then
put things in SolrInputDocuments...
Or something like that.
Erick
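Erick's parse-then-build-documents approach can be sketched with plain
JDK DOM parsing of Solr's <add><doc><field> XML format. The SolrJ step
is shown only in comments since it needs solr-solrj on the classpath;
the parsing class and method names here are my own:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlDocParser {
    /** Parse Solr update XML (<add><doc><field name="...">...</field>...) into field maps. */
    static List<Map<String, String>> parse(String xml) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<Map<String, String>> docs = new ArrayList<>();
        NodeList docNodes = dom.getElementsByTagName("doc");
        for (int i = 0; i < docNodes.getLength(); i++) {
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList fieldNodes = ((Element) docNodes.item(i)).getElementsByTagName("field");
            for (int j = 0; j < fieldNodes.getLength(); j++) {
                Element f = (Element) fieldNodes.item(j);
                fields.put(f.getAttribute("name"), f.getTextContent());
            }
            docs.add(fields);
        }
        // With SolrJ on the classpath, each map would become a SolrInputDocument:
        //   SolrInputDocument sdoc = new SolrInputDocument();
        //   fields.forEach(sdoc::addField);
        //   server.add(sdoc);   // server: a ConcurrentUpdateSolrServer
        return docs;
    }
}
```

For very large files you'd want a streaming parser (StAX/SAX) rather
than DOM, but the shape is the same: one SolrInputDocument per <doc>.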
On Thu, Dec 11, 2014 at 9:43 AM, Tom Burton-West <tburt...@umich.edu> wrote:
Thanks Erick,
That is helpful. We already have a process that works similarly. Each
thread/process that sends a document to Solr waits until it gets a
response, in order to make sure that the document was indexed
successfully (we log errors and retry docs that don't get indexed
successfully). However, we run 20-100 of these processes, depending on
throughput (i.e., we send documents to Solr for indexing as fast as we
can until they start queuing up on the Solr end).
Is there a way to use CUSS with XML documents?
i.e., my second question:
A related question is how to use ConcurrentUpdateSolrServer with XML
documents. I have very large XML documents, and the examples I see all
build documents by adding fields in Java code. Is there an example that
actually reads XML files from the file system?
Tom