: I've been trying to use the UUIDField in solr to maintain ids of the 
: pages I've crawled with nutch (as per 
: http://wiki.apache.org/solr/UniqueKey). The use case is that I want to 
: have the server able to use these ids in another database for various 
: statistics gathering. So I want the link url to act like a primary key 
: for determining if a page exists, and if it doesn't exist to generate a 
: new uuid.

I'm confused ... if you want the URL to be the primary key, then use the 
URL as the primary key. Why use the UUIDField at all?

:     2. Looking at the code for UUIDField (relevant bit pasted below), it 
: seems that the UUID is just generated randomly. There is no check if the 
: generated UUID has already been used.

Correct ... if you specify "NEW" then it generates a new UUID for you -- 
if you want to update an existing doc with an existing UUID then you need 
to send the real, existing value of the UUID for the doc you are 
updating.
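
As a rough client-side sketch of that rule (not anything Solr does for 
you): the field names "id" and "url" and the known_uuids mapping are 
hypothetical, and the mapping is assumed to have been filled in by 
looking up already-indexed pages beforehand.

```python
def build_doc(url, known_uuids):
    """Build a document dict to send to the index.

    known_uuids is a hypothetical mapping of URL -> UUID string for pages
    that are already indexed (for example, filled by querying first).
    The field names "id" and "url" are assumptions for illustration.
    """
    doc = {"url": url}
    # Reuse the stored UUID when the page is already known; otherwise send
    # the literal "NEW" so that the UUID field generates a fresh value.
    doc["id"] = known_uuids.get(url, "NEW")
    return doc
```

Sending "NEW" for a URL that is already indexed would give that page a 
second, different UUID, which is why the lookup has to happen first.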

: I can sort of solve this problem by generating the UUID myself, as a 
: hash of the link url, but that doesn't help me for those random cases 
: when the hash might happen to generate the same UUID.
: 
: Does anyone know if there is a way for solr to only add a uuid if the 
: document doesn't already exist?

I don't really understand your second sentence, but based on that first 
sentence it sounds like what you want may be to use something like the 
SignatureUpdateProcessor to generate a hash based on the URL...

https://wiki.apache.org/solr/Deduplication
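
If the id ends up being computed on the client instead, one possible 
approach is a name-based UUID derived from the URL. This is only a 
sketch using Python's standard uuid module; it is a client-side 
alternative, not how SignatureUpdateProcessor works, and url_to_uuid is 
a made-up helper name.

```python
import uuid


def url_to_uuid(url):
    """Return a deterministic UUID for a URL (hypothetical helper)."""
    # uuid5 builds a version 5 (SHA-1, name-based) UUID. With the standard
    # URL namespace, the same URL string always maps to the same UUID, so
    # re-crawling a page produces the id it had before.
    return uuid.uuid5(uuid.NAMESPACE_URL, url)
```

Two different URLs could in theory map to the same value, but with a 
hash of this size a collision is extremely unlikely. The URL string must 
be normalized the same way every time, since any difference in the 
string yields a different UUID.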

-Hoss
