[ https://issues.apache.org/jira/browse/SOLR-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660572#action_12660572 ]
Noble Paul commented on SOLR-906:
---------------------------------

Please ignore the 40K docs figure. I just took it from your perf test numbers; I thought you were writing the docs as a list.

I am referring to the client code, specifically this method in UpdateRequest:
{code}
public Collection<ContentStream> getContentStreams() throws IOException {
  return ClientUtils.toContentStreams( getXML(), ClientUtils.TEXT_XML );
}
{code}
This means that getXML() constructs one huge String holding the entire XML. That is not good when we are writing out a very large number of docs.

I am suggesting that CommonsHttpSolrServer has scope for improvement. Instead of building that String in memory, we can start streaming it to the server right away. The OutputStream can be passed on to UpdateRequest so that it writes the XML straight into the stream. That effectively gives us streaming on both ends.

This is valid where users do bulk updates, not when they write one doc at a time.

The new method SolrServer#add(Iterator<SolrInputDocument> docs) can start writing the docs immediately, so the docs are uploaded as and when they are produced. It is not exactly related to this issue, but the intent of this issue is to make uploads faster.

SOLR-865 is not very related to this issue. StreamingHttpSolrServer can use the javabin format as well.

bq. with the StreamingHttpSolrServer, you can send documents one at a time and each document starts sending as soon as it can

One drawback of StreamingHttpSolrServer is that it ends up opening multiple connections for uploading the documents.

Another enhancement: we can add one (or more) extra threads on the server to make the UpdateRequestProcessor.processAdd() calls.
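
To make the suggestion concrete, here is a minimal sketch of writing the update XML straight into an OutputStream while pulling documents from an Iterator, so the full request never exists as one String. It is not SolrJ code: the class StreamingAddWriter, its writeAdd method, and the use of a plain Map<String, String> in place of SolrInputDocument are all assumptions made for illustration. A real client would presumably pass in the HTTP request body stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical helper (not part of SolrJ): streams an <add> message one document at a time. */
final class StreamingAddWriter {

    private StreamingAddWriter() {}

    /**
     * Writes <add><doc>...</doc>...</add> to the given stream.
     * Each document is pulled from the iterator only when it is about to be written,
     * so memory use does not grow with the number of documents.
     * A Map of field name to value stands in for SolrInputDocument here (assumption).
     */
    static void writeAdd(Iterator<? extends Map<String, String>> docs, OutputStream out)
            throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write("<add>");
        while (docs.hasNext()) {
            w.write("<doc>");
            for (Map.Entry<String, String> field : docs.next().entrySet()) {
                w.write("<field name=\"");
                w.write(escape(field.getKey()));
                w.write("\">");
                w.write(escape(field.getValue()));
                w.write("</field>");
            }
            w.write("</doc>");
        }
        w.write("</add>");
        // Flush the encoder's buffer; the caller owns the stream and decides when to close it.
        w.flush();
    }

    /** Escapes the characters that are unsafe in XML text and attribute values. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Because the iterator is consumed lazily, the producer can still be generating documents while earlier ones are already on the wire, which is the behaviour described above for bulk updates.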
> Buffered / Streaming SolrServer implementaion
> ---------------------------------------------
>
>                 Key: SOLR-906
>                 URL: https://issues.apache.org/jira/browse/SOLR-906
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java
>            Reporter: Ryan McKinley
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: SOLR-906-StreamingHttpSolrServer.patch, SOLR-906-StreamingHttpSolrServer.patch, SOLR-906-StreamingHttpSolrServer.patch, SOLR-906-StreamingHttpSolrServer.patch, StreamingHttpSolrServer.java
>
>
> While indexing lots of documents, the CommonsHttpSolrServer add( SolrInputDocument ) is less than optimal. This makes a new request for each document.
> With a "StreamingHttpSolrServer", documents are buffered and then written to a single open Http connection.
> For related discussion see:
> http://www.nabble.com/solr-performance-tt9055437.html#a20833680

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.