[ https://issues.apache.org/jira/browse/SOLR-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660607#action_12660607 ]
Ryan McKinley commented on SOLR-906:
------------------------------------

Are you looking at the patch or just brainstorming how this could be implemented?

{panel}
I am referring to the client code. The method in UpdateRequest

public Collection<ContentStream> getContentStreams() throws IOException {
  return ClientUtils.toContentStreams( getXML(), ClientUtils.TEXT_XML );
}

This means that the getXML() method actually constructs a huge String which is the entire XML. That is not very good if we are writing out a very large number of docs.
{panel}

This is not how the patch works... for starters, it never calls getContentStreams() for UpdateRequest. It opens a single connection and continually dumps the XML for each request. Rather than call getXML(), the patch adds a function writeXml( Writer ) that writes directly to the open buffer.

{panel}
I am suggesting that CommonsHttpSolrServer has scope for improvement. Instead of building that String in memory we can just start streaming it to the server. So the OutputStream can be passed on to UpdateRequest so that it can write the XML right into the stream. So there is streaming effectively on both ends.
{panel}

The CommonsHttpSolrServer is fine, but you are right that each UpdateRequest *may* want to write the content directly to the open stream. The ContentStream interface gives us all that control. One thing to note is that if you do not specify the length, the HttpCommons server will use chunked encoding.

But I think adding the StreamingUpdateSolrServer resolves that for everyone. Users have either option.

{panel}
One drawback of a StreamingHttpSolrServer is that it ends up opening multiple connections for uploading the documents.
{panel}

Nonsense -- that is exactly what this avoids. It opens a single connection and writes everything to it. You can configure how many threads you want emptying the queue; each one will open a connection.

{panel}
Another enhancement: we can add one (or more) extra threads in the server to make the call to updaterequestprocessor.processAdd().
{panel}

That opens a whole can of worms... perhaps better discussed on java-dev. For now I think sticking to the 1 thread/request model is good. If you want multiple threads running on the server, use multiple connections (it is even an argument in the StreamingHttpSolrServer).

> Buffered / Streaming SolrServer implementaion
> ---------------------------------------------
>
>                 Key: SOLR-906
>                 URL: https://issues.apache.org/jira/browse/SOLR-906
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java
>            Reporter: Ryan McKinley
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: SOLR-906-StreamingHttpSolrServer.patch, SOLR-906-StreamingHttpSolrServer.patch, SOLR-906-StreamingHttpSolrServer.patch, SOLR-906-StreamingHttpSolrServer.patch, StreamingHttpSolrServer.java
>
>
> While indexing lots of documents, the CommonsHttpSolrServer add( SolrInputDocument ) is less than optimal. This makes a new request for each document.
> With a "StreamingHttpSolrServer", documents are buffered and then written to a single open Http connection.
> For related discussion see:
> http://www.nabble.com/solr-performance-tt9055437.html#a20833680

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
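For reference, the difference between getXML() and writeXml( Writer ) discussed in the comment above can be sketched roughly as follows. This is only an illustration and not the code from the patch: the class name WriteXmlSketch, both method bodies, the escape() helper, and the use of plain id strings in place of SolrInputDocument are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not taken from the SOLR-906 patch.
public class WriteXmlSketch {

    // Builds the entire <add> payload in memory before anything is sent,
    // which is the concern raised about getXML().
    static String getXml(List<String> ids) {
        StringBuilder sb = new StringBuilder("<add>");
        for (String id : ids) {
            sb.append("<doc><field name=\"id\">").append(escape(id)).append("</field></doc>");
        }
        return sb.append("</add>").toString();
    }

    // Writes each document to the caller's Writer as it goes, so no
    // complete copy of the payload is ever held by this method.
    static void writeXml(List<String> ids, Writer out) throws IOException {
        out.write("<add>");
        for (String id : ids) {
            out.write("<doc><field name=\"id\">");
            out.write(escape(id));
            out.write("</field></doc>");
        }
        out.write("</add>");
    }

    // Minimal XML escaping for text content (assumed helper).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        List<String> ids = Arrays.asList("1", "a&b");
        // A StringWriter stands in for the Writer wrapping the open connection.
        StringWriter out = new StringWriter();
        writeXml(ids, out);
        System.out.println(out);
    }
}
```

Both methods produce the same bytes; the only difference is where the buffering happens. With a Writer backed by the open HTTP connection, memory use depends on the largest single document rather than on the whole batch.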