I'm kind of out of the loop on the whole StAX/XPP/XML update parsing stuff
... am I remembering correctly that the end-game goal is to reduce/eliminate
dependencies on XPP?  (because .... ? .... StAX is a Java "standard",
included out-of-the-box with Java 6? (I'm guessing))


For me the biggest reason is to decouple the parsing from the actual update processing; I need to do custom processing in between (SOLR-269). StAX is a growing standard, so it seems like the right choice if we are reworking document parsing, and (depending on your preference) it is a bit easier to work with and more readable.

With the parsing separated from indexing, it would be straightforward to have a single UpdateRequestHandler that could read the content type and pick how to parse the documents - using the same indexing strategies/format/processor etc.
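To make the decoupling concrete, here is a minimal sketch of what StAX-based parsing of an &lt;add&gt; payload could look like, separated from any indexing. The class name and the String-map document representation are illustrative only (a real handler would build SolrInputDocuments and support multi-valued fields); it just shows the pull-parsing style StAX gives us:

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StaxAddParser {
  /**
   * Parse an <add> payload into a list of field-name -> value maps.
   * NOTE: a plain Map drops repeated field names; this is only a sketch
   * of the pull-parsing loop, not a full document builder.
   */
  public static List<Map<String, String>> parseAdd(String xml) throws XMLStreamException {
    XMLStreamReader r = XMLInputFactory.newInstance()
        .createXMLStreamReader(new StringReader(xml));
    List<Map<String, String>> docs = new ArrayList<Map<String, String>>();
    Map<String, String> doc = null;
    String fieldName = null;
    StringBuilder text = new StringBuilder();
    while (r.hasNext()) {
      switch (r.next()) {
        case XMLStreamConstants.START_ELEMENT:
          if ("doc".equals(r.getLocalName())) {
            doc = new LinkedHashMap<String, String>();
          } else if ("field".equals(r.getLocalName())) {
            fieldName = r.getAttributeValue(null, "name");
            text.setLength(0);
          }
          break;
        case XMLStreamConstants.CHARACTERS:
          text.append(r.getText());  // may fire multiple times per element
          break;
        case XMLStreamConstants.END_ELEMENT:
          if ("field".equals(r.getLocalName()) && doc != null) {
            doc.put(fieldName, text.toString());
          } else if ("doc".equals(r.getLocalName())) {
            docs.add(doc);  // doc is complete; hand it to the processor here
          }
          break;
      }
    }
    return docs;
  }

  public static void main(String[] args) throws XMLStreamException {
    String xml = "<add><doc><field name=\"id\">1</field>"
               + "<field name=\"name\">hello</field></doc></add>";
    System.out.println(parseAdd(xml));  // prints [{id=1, name=hello}]
  }
}
```

The point where each completed document is emitted is exactly where custom per-document processing (SOLR-269) could hook in, independent of how the document is eventually indexed.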


A lot of people seem to be sending multiple documents at a time as well,
so we should test that use case (i.e.: <add> containing 10000 small
documents; <add> containing 100 medium documents; <add> containing 1 big
document)
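For reference, a multi-document batch in Solr's XML update format looks like this (the field names and values here are just illustrative):

```xml
<add>
  <doc>
    <field name="id">1</field>
    <field name="name">hello</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="name">world</field>
  </doc>
  <!-- ... up to 10000 docs in one request -->
</add>
```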


That makes sense. I don't claim the tests I ran are representative; I just wanted to make sure the overall speeds are within the same ballpark.

This one sends 10000 docs together (with 10 text fields), then 10000 docs individually, each with 100 text fields. Still not the most scientific, but here it is (times in milliseconds):

STAX: 57642
XPP: 58012


  @Override public void setUp() throws Exception
  {
    super.setUp();

    // setup the server...
    server = new EmbeddedSolrServer( SolrCore.getSolrCore() );
  }

  public SolrInputDocument createDocument( int id, int fcnt )
  {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField( "id", id+"" );
    doc.addField( "name", "hello" );
    for( int x=5; x<fcnt; x++ ) {
      doc.addField( "text", "this is just some & text with <> asgasdg; "+x );
    }
    return doc;
  }

  public long makeRequests( String path, int cnt ) throws Exception
  {
    server.deleteByQuery( "*:*" ); // delete everything!
    server.optimize();

    long now = System.currentTimeMillis();
    UpdateRequest req = new UpdateRequest();
    req.setPath( path );

    // Send all the docs together
    for( int i=0; i<cnt; i++ ) {
      req.add( createDocument( i , 10 ) );
    }
    server.request( req );
    req.clear();

    // Send them one at a time
    for( int i=0; i<cnt; i++ ) {
      req.add( createDocument( i+cnt, 100 ) );
      server.request( req );
      req.clear();
    }
    server.commit();
    long elapsed = System.currentTimeMillis() - now;

    QueryResponse response = server.query( new SolrQuery( "*:*" ) );
    if( (cnt*2) != response.getResults().getNumFound() ) {
      throw new Exception( "did not add everything!" );
    }
    return elapsed;
  }


  /**
   * time update requests against the example config
   */
  public void testExampleConfig() throws Exception
  {
    // makeRequests empties the index before timing the updates
    long time = makeRequests( "/update", 10000 );
    System.out.println( "time: " + time);
  }

