On 8/4/2010 11:11 PM, jayendra patil wrote:
ContentStreamUpdateRequest seems to read the file contents and transfer them
over HTTP, which slows down indexing.

Try using StreamingUpdateSolrServer with the stream.file param, as described at
http://wiki.apache.org/solr/SolrPerformanceFactors#Embedded_vs_HTTP_Post

e.g.

// queue size 20, 8 background threads
SolrServer server = new StreamingUpdateSolrServer("Solr Server URL", 20, 8);
UpdateRequest req = new UpdateRequest("/update/extract");
ModifiableSolrParams params = new ModifiableSolrParams();
// stream.file is a path on the Solr host; Solr reads the file itself
// (remote streaming must be enabled in solrconfig.xml)
params.add("stream.file", "local file path");
params.set("literal.id", value);
req.setParams(params);
server.request(req);
server.commit();
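For contrast with uploading the bytes, here is a minimal stdlib-only sketch of what a stream.file request boils down to: one GET to /update/extract whose only payload is the path, so the request stays the same size however large the file is. It assumes remote streaming is enabled in solrconfig.xml; the class name, path and id are hypothetical placeholders for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the GET URL for /update/extract when the
// document is referenced by path (stream.file) instead of uploaded.
final class ExtractUrlSketch {

    // Appends each parameter as key=value, URL-encoding both parts,
    // in the map's iteration order.
    static String build(String solrBase, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(solrBase).append("/update/extract");
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(enc(e.getKey())).append('=').append(enc(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<String, String>();
        p.put("stream.file", "/data/docs/report.pdf"); // hypothetical path on the Solr host
        p.put("literal.id", "1234");                    // hypothetical id
        p.put("commit", "true");
        // prints http://localhost:8080/solr/update/extract?stream.file=%2Fdata%2Fdocs%2Freport.pdf&literal.id=1234&commit=true
        System.out.println(build("http://localhost:8080/solr", p));
    }
}
```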

Thanks for your suggestions. Unfortunately, I'm still seeing poor performance.

To be clear, I am trying to have SOLR index multiple documents that exist on a remote server. I'd rather pass SOLR a pointer to each document and have it stream the content itself than retrieve and push the documents myself, so I can avoid the extra network overhead.

When I do this:

curl 'http://localhost:8080/solr/update/extract?stream.url=http://remote_server.mydomain.com/test.pdf&stream.contentType=application/pdf&literal.content_id=12342&commit=true'

It returns in around a second. When I execute the attached code, it takes just over three minutes. Ideally I'd like SolrJ to get closer to the performance I'm seeing with curl.

To be fair, the SOLR server I'm using is really a workstation-class machine, and I'm still learning. I have a feeling I'm doing something dumb, but I just can't pinpoint the exact problem.
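One low-tech way to pinpoint where the three minutes go is to time each phase on its own (creating the server, the extract request, the commit). Below is a minimal stdlib-only sketch; PhaseTimer is a hypothetical helper, and in the real program each SolrJ call would be wrapped in a Callable.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: runs one step, prints its wall-clock time,
// and passes the step's result (or exception) through unchanged.
final class PhaseTimer {

    static <T> T time(String phase, Callable<T> step) throws Exception {
        long start = System.nanoTime();
        try {
            return step.call();
        } finally {
            long ms = (System.nanoTime() - start) / 1000000L;
            System.out.println(phase + " took " + ms + " ms");
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in step; in indexFilesSolrCell this would wrap
        // solr.request(up) or solr.commit() instead.
        Integer result = time("dummy step", new Callable<Integer>() {
            public Integer call() {
                return 6 * 7;
            }
        });
        System.out.println(result); // prints 42
    }
}
```

If the timed phases together account for only a second or so while the run still takes minutes, the remaining time is being spent outside those calls, for example after main returns.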


Thanks - Tod


--------code-----------


import java.io.IOException;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.params.ModifiableSolrParams;


/**
 * @author EDaniel
 */
public class SolrExampleTests {

  public static void main(String[] args) {
    System.out.println("main...");
    try {
//      String fileName = "/test/test.pdf";
      String fileName = "http://remoteserver/test/test.pdf";
      String solrId = "1234";
      indexFilesSolrCell(fileName, solrId);

    } catch (Exception ex) {
      System.out.println(ex.toString());
    }
  }

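  /**
   * Indexes one document of any type Solr Cell can parse, via the
   * /update/extract handler; Solr fetches the content from the given URL itself.
   * @param fileName URL (sent as stream.url) of the document to index
   * @param solrId value stored in the content_id field
   * @throws IOException
   * @throws SolrServerException
   */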
  public static void indexFilesSolrCell(String fileName, String solrId)
    throws IOException, SolrServerException {

    System.out.println("indexFilesSolrCell...");

    String urlString = "http://localhost:8080/solr";

    System.out.println("getting connection...");
//    SolrServer solr = new CommonsHttpSolrServer(urlString);
    // queue size 100, 5 background threads
    SolrServer solr = new StreamingUpdateSolrServer(urlString, 100, 5);

    System.out.println("getting updaterequest handle...");
//    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
    UpdateRequest up = new UpdateRequest("/update/extract");

    ModifiableSolrParams params = new ModifiableSolrParams();
//    params.add("stream.file", fileName);
    // Solr fetches the document from this URL itself (remote streaming)
    params.add("stream.url", fileName);
    params.set("literal.content_id", solrId);
    up.setParams(params);

    System.out.println("making request...");
    solr.request(up);

    System.out.println("committing...");
    solr.commit();

    System.out.println("done...");
  }
}
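Another thing worth checking is whether the two clients send the same parameters. The curl command sets stream.contentType=application/pdf and commit=true on the extract request itself, while indexFilesSolrCell sets neither and commits in a separate request. Below is a small stdlib-only sketch that parses a query string and lists the keys one request has and the other lacks; ParamDiff is a hypothetical helper and only handles single-valued parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: compares the parameters of two Solr requests.
final class ParamDiff {

    // Parses "a=1&b=2" into a sorted map; a repeated key keeps its last value.
    static Map<String, String> parse(String query) {
        Map<String, String> out = new TreeMap<String, String>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            out.put(dec(key), dec(value));
        }
        return out;
    }

    // Keys present in 'reference' but absent from 'candidate', sorted.
    static Set<String> missing(Map<String, String> reference, Map<String, String> candidate) {
        Set<String> keys = new TreeSet<String>(reference.keySet());
        keys.removeAll(candidate.keySet());
        return keys;
    }

    private static String dec(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Query string copied from the curl command.
        Map<String, String> curl = parse("stream.url=http://remote_server.mydomain.com/test.pdf"
                + "&stream.contentType=application/pdf&literal.content_id=12342&commit=true");
        // Parameters set on the extract request in indexFilesSolrCell.
        Map<String, String> solrj = parse("stream.url=http://remoteserver/test/test.pdf"
                + "&literal.content_id=1234");
        System.out.println(missing(curl, solrj)); // prints [commit, stream.contentType]
    }
}
```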
