I'm kind of out of the loop on the whole StAX/XPP/XML update parsing stuff
... am I remembering correctly that the end-game goal is to reduce/eliminate
dependencies on XPP?  (because .... ? .... StAX is a Java "standard",
included out-of-the-box with Java 6? (I'm guessing))


For me the biggest reason is to decouple the parsing from the actual update processing; I need to do custom processing in between (SOLR-269). StAX is a growing standard, so it seems like the right choice if we are reworking document parsing, and (depending on your preference) it is a bit easier to work with and more readable.

With the parsing separated from indexing, it would be straightforward to have a single UpdateRequestHandler that could read the content type and pick how to parse the documents - using the same indexing strategies/format/processor etc.
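To make the decoupling concrete, here is a minimal sketch of what StAX-based parsing of an &lt;add&gt; payload could look like, separated from any indexing. The class name and the String-map document representation are illustrative only (a real handler would build SolrInputDocuments and support multi-valued fields); it just shows the pull-parsing style StAX gives us:

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StaxAddParser {
  /**
   * Parse an <add> payload into a list of field-name -> value maps.
   * NOTE: a plain Map drops repeated field names; this is only a sketch
   * of the pull-parsing loop, not a full document builder.
   */
  public static List<Map<String, String>> parseAdd(String xml) throws XMLStreamException {
    XMLStreamReader r = XMLInputFactory.newInstance()
        .createXMLStreamReader(new StringReader(xml));
    List<Map<String, String>> docs = new ArrayList<Map<String, String>>();
    Map<String, String> doc = null;
    String fieldName = null;
    StringBuilder text = new StringBuilder();
    while (r.hasNext()) {
      switch (r.next()) {
        case XMLStreamConstants.START_ELEMENT:
          if ("doc".equals(r.getLocalName())) {
            doc = new LinkedHashMap<String, String>();
          } else if ("field".equals(r.getLocalName())) {
            fieldName = r.getAttributeValue(null, "name");
            text.setLength(0);
          }
          break;
        case XMLStreamConstants.CHARACTERS:
          text.append(r.getText());  // may fire multiple times per element
          break;
        case XMLStreamConstants.END_ELEMENT:
          if ("field".equals(r.getLocalName()) && doc != null) {
            doc.put(fieldName, text.toString());
          } else if ("doc".equals(r.getLocalName())) {
            docs.add(doc);  // doc is complete; hand it to the processor here
          }
          break;
      }
    }
    return docs;
  }

  public static void main(String[] args) throws XMLStreamException {
    String xml = "<add><doc><field name=\"id\">1</field>"
               + "<field name=\"name\">hello</field></doc></add>";
    System.out.println(parseAdd(xml));  // prints [{id=1, name=hello}]
  }
}
```

The point where each completed document is emitted is exactly where custom per-document processing (SOLR-269) could hook in, independent of how the document is eventually indexed.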


A lot of people seem to be sending multiple documents at a time as well,
so we should test that use case (i.e.: <add> containing 10000 small
documents; <add> containing 100 medium documents; <add> containing 1 big
document)
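For reference, a multi-document batch in Solr's XML update format looks like this (the field names and values here are just illustrative):

```xml
<add>
  <doc>
    <field name="id">1</field>
    <field name="name">hello</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="name">world</field>
  </doc>
  <!-- ... up to 10000 docs in one request -->
</add>
```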


That makes sense. I don't claim the tests I ran are representative; I just wanted to make sure the overall speeds are within the same ballpark.

This one sends 10000 docs together (with 10 text fields), then 10000 docs individually, each with 100 text fields. Still not the most scientific, but here it is (times in milliseconds):

STAX: 57642
XPP: 58012


  @Override public void setUp() throws Exception
  {
    super.setUp();

    // setup the server...
    server = new EmbeddedSolrServer( SolrCore.getSolrCore() );
  }

  public SolrInputDocument createDocument( int id, int fcnt )
  {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField( "id", id+"" );
    doc.addField( "name", "hello" );
    for( int x=5; x<fcnt; x++ ) {
      doc.addField( "text", "this is just some & text with <> asgasdg; "+x );
    }
    return doc;
  }

  public long makeRequests( String path, int cnt ) throws Exception
  {
    server.deleteByQuery( "*:*" ); // delete everything!
    server.optimize();

    long now = System.currentTimeMillis();
    UpdateRequest req = new UpdateRequest();
    req.setPath( path );

    // Send all the docs together
    for( int i=0; i<cnt; i++ ) {
      req.add( createDocument( i , 10 ) );
    }
    server.request( req );
    req.clear();

    // Send them one at a time
    for( int i=0; i<cnt; i++ ) {
      req.add( createDocument( i+cnt, 100 ) );
      server.request( req );
      req.clear();
    }
    server.commit();
    long elapsed = System.currentTimeMillis() - now;

    QueryResponse response = server.query( new SolrQuery( "*:*" ) );
    if( (cnt*2) != response.getResults().getNumFound() ) {
      throw new Exception( "did not add everything!" );
    }
    return elapsed;
  }


  /**
   * time update requests against the example config
   */
  public void testExampleConfig() throws Exception
  {
    // makeRequests empties the index before timing the updates
    long time = makeRequests( "/update", 10000 );
    System.out.println( "time: " + time);
  }

