Thanks so much for your suggestions. I am attempting to index 550K
docs at once, but have found I have to break them into smaller
batches. Indexing seems to stop at around 47K docs (the index reaches
264 MB in size at that point); the full index eventually grows to
about 2 GB. I am using embedded Solr and adding a document with code
very similar to this:
private void addModel(Model model) throws IOException {
    UpdateHandler updateHandler = solrCore.getUpdateHandler();
    AddUpdateCommand addcmd = new AddUpdateCommand();
    DocumentBuilder builder = new DocumentBuilder(solrCore.getSchema());
    builder.startDoc();
    builder.addField("id", "Model:" + model.getUuid());
    builder.addField("class", "Model");
    builder.addField("uuid", model.getUuid());
    builder.addField("one_facet", model.getOneFacet());
    builder.addField("another_facet", model.getAnotherFacet());
    // .. other fields
    addcmd.doc = builder.getDoc();
    addcmd.allowDups = false;
    addcmd.overwritePending = true;
    addcmd.overwriteCommitted = true;
    updateHandler.addDoc(addcmd);
}
I have other 'Model' objects I'm adding also.
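Roughly, the batching loop I am using looks like the sketch below. The
`addDoc`/`commit` callbacks are stand-ins for the actual Solr
UpdateHandler calls, just to show the shape of committing every
batchSize documents so the in-memory buffer stays bounded:

```java
import java.util.List;
import java.util.function.Consumer;

class BatchIndexer {
    // Adds every doc, committing after each full batch of batchSize docs
    // and once more for any remainder. Returns the number of commits.
    static int index(List<String> docs, int batchSize,
                     Consumer<String> addDoc, Runnable commit) {
        int commits = 0;
        for (int i = 0; i < docs.size(); i++) {
            addDoc.accept(docs.get(i));
            if ((i + 1) % batchSize == 0) { // flush a full batch
                commit.run();
                commits++;
            }
        }
        if (docs.size() % batchSize != 0) { // flush the remainder
            commit.run();
            commits++;
        }
        return commits;
    }
}
```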
Thanks
On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
: I would think you would see better performance by allowing auto
: commit to handle the commit size instead of reopening the
: connection all the time.
if your goal is "fast" indexing, don't use autoCommit at all ... just
index everything, and don't commit until you are completely done.
autoCommitting will slow your indexing down (the benefit being that
more results will be visible to searchers as you proceed)
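in other words, the loop is just "add everything, commit once". as a
minimal sketch (the `addDoc`/`commit` callbacks here stand in for the
UpdateHandler calls and are not real Solr API):

```java
import java.util.List;
import java.util.function.Consumer;

class BulkLoader {
    // Adds every document, then commits exactly once at the very end.
    // Returns the number of documents added.
    static int load(List<String> docs, Consumer<String> addDoc, Runnable commit) {
        int added = 0;
        for (String doc : docs) { // add every document...
            addDoc.accept(doc);
            added++;
        }
        commit.run();             // ...then a single commit when done
        return added;
    }
}
```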
-Hoss