Thanks so much for your suggestions. I am attempting to index 550K docs at once, but have found I have to break them up into smaller batches. Indexing seems to stall at around 47K docs (the index is about 264 MB at that point); the full index eventually grows to about 2 GB. I am using embedded Solr and adding each document with code very similar to this:


    private void addModel(Model model) throws IOException {
        // Add the document directly through the embedded core's UpdateHandler
        UpdateHandler updateHandler = solrCore.getUpdateHandler();
        AddUpdateCommand addcmd = new AddUpdateCommand();

        // Build the document against the core's schema
        DocumentBuilder builder = new DocumentBuilder(solrCore.getSchema());
        builder.startDoc();
        builder.addField("id", "Model:" + model.getUuid());
        builder.addField("class", "Model");
        builder.addField("uuid", model.getUuid());
        builder.addField("one_facet", model.getOneFacet());
        builder.addField("another_facet", model.getAnotherFacet());
        // ... other fields

        // Disallow duplicates and overwrite any pending or committed doc with the same id
        addcmd.doc = builder.getDoc();
        addcmd.allowDups = false;
        addcmd.overwritePending = true;
        addcmd.overwriteCommitted = true;
        updateHandler.addDoc(addcmd);
    }

I'm also adding other 'Model' objects in the same way.
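
For what it's worth, the driver loop looks roughly like the sketch below (simplified, not my exact code; indexAllModels and loadModels are placeholder names, and imports are omitted as above). It only issues a single explicit commit, via CommitUpdateCommand, after all the adds are done, along the lines of the advice quoted below:

    // Sketch only: add every document first, then commit once at the end.
    // indexAllModels/loadModels are placeholders, not the real method names.
    private void indexAllModels() throws IOException {
        for (Model model : loadModels()) {
            addModel(model);
        }
        // One explicit commit after everything has been added; 'false' = don't optimize here
        CommitUpdateCommand commit = new CommitUpdateCommand(false);
        solrCore.getUpdateHandler().commit(commit);
    }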

Thanks

On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:


: I would think you would see better performance by allowing auto commit
: to handle the commit size instead of reopening the connection all the
: time.

if your goal is "fast" indexing, don't use autoCommit at all ... just
index everything, and don't commit until you are completely done.

autoCommitting will slow your indexing down (the benefit being that more
results will be visible to searchers as you proceed).




-Hoss

