[ https://issues.apache.org/jira/browse/SOLR-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660572#action_12660572 ]

Noble Paul commented on SOLR-906:
---------------------------------

Please ignore the 40K docs figure; I just took it from your perf test numbers.
I thought you were writing the docs as a list.

I am referring to the client code, specifically this method in UpdateRequest:
{code}
public Collection<ContentStream> getContentStreams() throws IOException {
    return ClientUtils.toContentStreams( getXML(), ClientUtils.TEXT_XML );
}
{code}

This means that getXML() builds the entire XML payload as one huge String in
memory, which is wasteful when we are writing out a very large number of docs.

I am suggesting that CommonsHttpSolrServer has scope for improvement. Instead
of building that String in memory, we can start streaming it to the server
right away: the OutputStream can be passed to UpdateRequest so that it writes
the XML directly into the stream. That way there is effectively streaming on
both ends.
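
A minimal sketch of the streaming idea, in plain Java rather than SolrJ. The
hypothetical writeTo() method writes the <add> XML doc by doc into a
caller-supplied OutputStream (in the real client this would be the HTTP
request body) instead of returning one String. The class name, the method, and
the use of a field-name-to-value Map in place of SolrInputDocument are all
assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

public class StreamingXmlUpdate {

    // Escape the five XML special characters in text and attribute values.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Hypothetical replacement for getXML(): rather than building one big String,
    // write the <add> payload doc by doc into the caller's stream.
    // Each doc is a plain field -> value map here (stand-in for SolrInputDocument).
    static void writeTo(List<Map<String, String>> docs, OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write("<add>");
        for (Map<String, String> doc : docs) {
            w.write("<doc>");
            for (Map.Entry<String, String> f : doc.entrySet()) {
                w.write("<field name=\"" + escape(f.getKey()) + "\">");
                w.write(escape(f.getValue()));
                w.write("</field>");
            }
            w.write("</doc>");
        }
        w.write("</add>");
        w.flush(); // push buffered chars into the stream; closing it is the caller's job
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeTo(List.of(Map.of("id", "1"), Map.of("id", "2")), out);
        // prints <add><doc><field name="id">1</field></doc><doc><field name="id">2</field></doc></add>
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Only one doc's worth of text is ever held by the writer at a time; the rest
goes straight to the stream.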

This applies when users do bulk updates, not when they write one doc at a
time.

The new method SolrServer#add(Iterator<SolrInputDocument> docs) can start
writing the docs immediately, so the docs can be uploaded as they are being
produced. It is not exactly related to this issue, but the intent of this
issue is to make uploads faster.
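
To illustrate the iterator idea, here is a sketch in which docs are pulled from
an Iterator and written the moment each one is produced. The add() and
produce() helpers are hypothetical, plain String ids stand in for
SolrInputDocument, and a StringWriter stands in for the upload stream. The log
records the order of events, so you can see production and writing interleave.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class IteratorUpload {

    // Hypothetical streaming add: pulls docs one at a time and writes each as soon
    // as it is produced, so nothing is collected up front.
    // Ids are plain ASCII here, so no XML escaping is done.
    static void add(Iterator<String> docs, Writer out, List<String> log) throws IOException {
        out.write("<add>");
        while (docs.hasNext()) {
            String id = docs.next();
            out.write("<doc><field name=\"id\">" + id + "</field></doc>");
            log.add("wrote " + id);
        }
        out.write("</add>");
        out.flush();
    }

    // A lazy producer, e.g. standing in for rows read from a DB cursor.
    static Iterator<String> produce(int n, List<String> log) {
        return new Iterator<String>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < n;
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                String id = "doc" + next++;
                log.add("produced " + id);
                return id;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        List<String> log = new ArrayList<>();
        StringWriter out = new StringWriter();
        add(produce(3, log), out, log);
        System.out.println(log);
    }
}
```

Running main prints [produced doc0, wrote doc0, produced doc1, wrote doc1,
produced doc2, wrote doc2]: no doc waits for the rest of the batch to exist.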


SOLR-865 is not closely related to this issue; StreamingHttpSolrServer can use
the javabin format as well.

bq.with the StreamingHttpSolrServer, you can send documents one at a time and 
each documents starts sending as soon as it can
One drawback of StreamingHttpSolrServer is that it ends up opening multiple
connections to upload the documents.
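
One possible way around the multiple-connection problem, sketched here as an
assumption rather than what the attached patch does: callers drop docs into a
bounded queue, and a single writer thread drains it into one stream (a Writer
standing in for one open HTTP connection). SingleConnectionBuffer and its
methods are hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SingleConnectionBuffer {
    private static final String END = "\u0000END"; // sentinel telling the writer to finish
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
    private final Writer connection; // stand-in for one open HTTP request body
    private final Thread writer;
    private volatile IOException failure;

    SingleConnectionBuffer(Writer connection) {
        this.connection = connection;
        this.writer = new Thread(this::drain, "upload-writer");
        this.writer.start();
    }

    // Callers hand off docs one at a time; put() blocks while the buffer is full.
    // Note: if the writer thread fails, this sketch does not unblock waiting callers.
    void add(String docXml) throws InterruptedException {
        queue.put(docXml);
    }

    // The only thread that touches the connection, so all docs share one stream.
    private void drain() {
        try {
            connection.write("<add>");
            for (String doc = queue.take(); !doc.equals(END); doc = queue.take()) {
                connection.write(doc);
            }
            connection.write("</add>");
            connection.flush();
        } catch (IOException e) {
            failure = e;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Signal end of input, wait for the writer to finish, and surface any I/O error.
    void finish() throws InterruptedException, IOException {
        queue.put(END);
        writer.join();
        if (failure != null) throw failure;
    }

    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        SingleConnectionBuffer buf = new SingleConnectionBuffer(out);
        buf.add("<doc><field name=\"id\">1</field></doc>");
        buf.add("<doc><field name=\"id\">2</field></doc>");
        buf.finish();
        System.out.println(out);
    }
}
```

The bounded queue gives the buffering, and having exactly one consumer is what
keeps everything on a single connection; the trade-off is that upload
throughput is limited to what one stream can carry.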

Another enhancement: we can add one (or more) extra threads on the server to
make the UpdateRequestProcessor.processAdd() call.
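
A rough sketch of that server-side idea: the request thread keeps parsing while
a separate worker thread calls processAdd(), so parsing doc N+1 overlaps with
indexing doc N. AddProcessor is a hypothetical stand-in for
UpdateRequestProcessor (Solr's real processAdd() takes an AddUpdateCommand, not
a String), and the Iterable stands in for the update parser.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PipelinedAdd {

    // Hypothetical stand-in for UpdateRequestProcessor.
    interface AddProcessor {
        void processAdd(String doc) throws Exception;
    }

    // Parsing (iterating parsedDocs) stays on the request thread; processAdd runs on
    // one worker thread. A single-threaded executor keeps adds in arrival order.
    static void handle(Iterable<String> parsedDocs, AddProcessor processor)
            throws InterruptedException, ExecutionException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        List<Future<?>> pending = new ArrayList<>(); // unbounded here; a real version would cap it
        try {
            for (String doc : parsedDocs) {
                pending.add(worker.submit(() -> {
                    processor.processAdd(doc);
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // waits, and rethrows (wrapped) any failure from processAdd
            }
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> indexed = Collections.synchronizedList(new ArrayList<>());
        handle(List.of("a", "b", "c"), indexed::add);
        System.out.println(indexed); // prints [a, b, c]
    }
}
```

Using more than one worker thread would raise throughput further but gives up
the ordering guarantee, which matters if the same document id can appear twice
in one request.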

> Buffered / Streaming SolrServer implementaion
> ---------------------------------------------
>
>                 Key: SOLR-906
>                 URL: https://issues.apache.org/jira/browse/SOLR-906
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java
>            Reporter: Ryan McKinley
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: SOLR-906-StreamingHttpSolrServer.patch, 
> SOLR-906-StreamingHttpSolrServer.patch, 
> SOLR-906-StreamingHttpSolrServer.patch, 
> SOLR-906-StreamingHttpSolrServer.patch, StreamingHttpSolrServer.java
>
>
> While indexing lots of documents, the CommonsHttpSolrServer add( 
> SolrInputDocument ) is less than optimal.  This makes a new request for each 
> document.
> With a "StreamingHttpSolrServer", documents are buffered and then written to 
> a single open Http connection.
> For related discussion see:
> http://www.nabble.com/solr-performance-tt9055437.html#a20833680

-- 
This message is automatically generated by JIRA.
