Hi, 

I've been trying to use UUIDField in Solr to maintain IDs for the pages I've 
crawled with Nutch (as per http://wiki.apache.org/solr/UniqueKey). The use case 
is that I want the server to be able to use these IDs in another database for 
gathering various statistics. So I want the link URL to act like a primary key: 
it determines whether a page already exists, and a new UUID should be generated 
only if it doesn't.

I've run into two problems with this:

    1. If I use the UUIDField class with default="NEW", then when a page is 
       crawled again and Solr is told to update the document, the UUID 
       changes.

    2. Looking at the code for UUIDField (relevant bit pasted below), it seems 
       that the UUID is just generated randomly. There is no check for whether 
       the generated UUID has already been used.
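
To put the second point in perspective, here is a rough sketch of how likely a 
clash between random UUIDs is. It assumes the standard birthday-bound 
approximation and the 122 random bits of a version 4 UUID, which is what 
UUID.randomUUID() produces. The class and method names are made up for 
illustration.

```java
class UuidCollisionEstimate {
    // A random (version 4) UUID has 122 random bits; the other 6 bits are fixed.
    static final double RANDOM_BITS = 122.0;

    // Birthday-bound approximation: p ~= n^2 / (2 * 2^122).
    // Only meaningful while the result is much smaller than 1.
    static double collisionProbability(double documentCount) {
        return (documentCount * documentCount) / (2.0 * Math.pow(2.0, RANDOM_BITS));
    }

    public static void main(String[] args) {
        // One billion crawled pages: on the order of 1e-19.
        System.out.println(collisionProbability(1e9));
    }
}
```

So the missing uniqueness check is unlikely to matter at crawl scale, although 
it is still not a guarantee.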

I can sort of solve this problem by generating the UUID myself, as a hash of 
the link URL, but that doesn't cover the rare case where two different URLs 
happen to hash to the same UUID.
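
A minimal sketch of the hash-based workaround, assuming the JDK's 
UUID.nameUUIDFromBytes (a name-based, MD5-derived version 3 UUID) is an 
acceptable hash. The class name UrlUuid and the method uuidForUrl are 
hypothetical. The same URL always yields the same UUID, so a re-crawl would not 
change it, but distinct URLs could still collide in principle.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class UrlUuid {
    // Derives a deterministic (version 3, name-based) UUID from the link URL.
    // UUID.toString() is already lowercase, 36 characters long, with dashes at
    // positions 8, 13, 18 and 23, so it passes the UUIDField validation below.
    static String uuidForUrl(String url) {
        byte[] name = url.getBytes(StandardCharsets.UTF_8);
        return UUID.nameUUIDFromBytes(name).toString();
    }

    public static void main(String[] args) {
        String first = uuidForUrl("http://example.com/page");
        String second = uuidForUrl("http://example.com/page");
        // Same URL, same UUID on every run.
        System.out.println(first.equals(second));
    }
}
```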

Does anyone know if there is a way for Solr to add a UUID only if the document 
doesn't already exist? 
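
To make the desired behaviour concrete, here is a sketch of the "assign once, 
then reuse" semantics done outside Solr. This is not a Solr API: the 
UuidRegistry class is hypothetical, and the in-memory map merely stands in for 
whatever store (for example the other database) would hold the URL-to-UUID 
mapping before the document is sent to Solr.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class UuidRegistry {
    // Stand-in for a persistent URL -> UUID table; the URL acts as the primary key.
    private final ConcurrentMap<String, String> idsByUrl = new ConcurrentHashMap<>();

    // Returns the existing UUID for a known URL, or generates and stores a
    // new random one the first time the URL is seen.
    String idFor(String url) {
        return idsByUrl.computeIfAbsent(url, u -> UUID.randomUUID().toString());
    }

    public static void main(String[] args) {
        UuidRegistry registry = new UuidRegistry();
        String firstCrawl = registry.idFor("http://example.com/page");
        String reCrawl = registry.idFor("http://example.com/page");
        // The UUID survives a re-crawl of the same URL.
        System.out.println(firstCrawl.equals(reCrawl));
    }
}
```

The UUID returned here would then be set explicitly on the document, so the 
default="NEW" path is never taken for pages that are already known.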

Thanks!
Blaise


------------------------------------------------------------
http://javasourcecode.org/html/open-source/solr/solr-3.3.0/org/apache/solr/schema/UUIDField.java.html

  /**
   * Generates a UUID if val is either null, empty or "NEW".
   * 
   * Otherwise it behaves much like a StrField but checks that the value given
   * is indeed a valid UUID.
   * 
   * @param val The value of the field
   * @see org.apache.solr.schema.FieldType#toInternal(java.lang.String)
   */
  @Override
  public String toInternal(String val) {
    if (val == null || 0==val.length() || NEW.equals(val)) {
      return UUID.randomUUID().toString().toLowerCase(Locale.ENGLISH);
    } else {
      // we do some basic validation if 'val' looks like an UUID
      if (val.length() != 36 || val.charAt(8) != DASH || val.charAt(13) != DASH
          || val.charAt(18) != DASH || val.charAt(23) != DASH) {
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
            "Invalid UUID String: '" + val + "'");
      }

      return val.toLowerCase(Locale.ENGLISH);
    }
  }

