Thanks for the responses.  This is exactly what I had to resort to.  I will
definitely put in a feature request to get the generated ID back from the
extract request.
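
For anyone finding this thread later, here is a rough sketch of the workaround discussed in the quoted reply below: tag the document with a one-off token through a literal.<field> parameter on the extract request, then query on that token to read back the generated key. It is written in Python for brevity, not the PHP actually used here. The base URL, the /update/extract and /select handler paths, and the field names upload_token_s, id and signature are all assumptions to adapt to your own schema and solrconfig.xml.

```python
import uuid
from urllib.parse import urlencode

# Assumed base URL; adjust for your install.
SOLR = "http://localhost:8983/solr"


def extract_url(token, token_field="upload_token_s"):
    """URL to POST a file to, tagging the doc with a one-off literal token.

    token_field is a hypothetical secondary field, NOT the uniqueKey field.
    """
    params = {f"literal.{token_field}": token, "commit": "true"}
    return f"{SOLR}/update/extract?{urlencode(params)}"


def lookup_url(token, token_field="upload_token_s",
               key_field="id", sig_field="signature"):
    """URL that queries on the token to read back the generated key and hash.

    key_field and sig_field are assumed names for the uniqueKey field and
    the field written by SignatureUpdateProcessorFactory.
    """
    params = {
        "q": f'{token_field}:"{token}"',
        "fl": f"{key_field},{sig_field}",
        "wt": "json",
    }
    return f"{SOLR}/select?{urlencode(params)}"


# One fresh token per upload, so the lookup matches only that document.
token = uuid.uuid4().hex
print(extract_url(token))
print(lookup_url(token))
```

The lookup only makes sense once the add has been committed, which is why the sketch passes commit=true on the extract request.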

I am doing this with PHP cURL for extraction and the PECL Solr extension for
querying.  I then save the unique id and dupe hash in a MySQL table, which I
check against after the doc is indexed in Solr.  If it is a dupe I delete the
Solr record and discard the file.  My problem now is that the dupe hash
sometimes comes back NULL from Solr, although when I check it through Solr
Admin it is there.  I am working through this now to isolate the cause.
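
The bookkeeping step above could look roughly like the following. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 in place of MySQL so it runs on its own, and the table name, column names and the classify helper are made up for illustration. A missing hash is treated as "retry" and never as a match, so a NULL coming back from Solr does not cause a file to be discarded by mistake.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the lookup table of signatures seen so far."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        "signature TEXT PRIMARY KEY, doc_id TEXT NOT NULL)"
    )
    return conn


def classify(conn, doc_id, signature):
    """Return 'new', 'dupe', or 'retry' for a freshly indexed document.

    'retry' means the signature came back empty/NULL, so no decision
    can be made yet and the lookup should be repeated later.
    """
    if not signature:
        return "retry"
    row = conn.execute(
        "SELECT doc_id FROM seen WHERE signature = ?", (signature,)
    ).fetchone()
    if row is None:
        # First time this content hash is seen: remember which doc owns it.
        conn.execute(
            "INSERT INTO seen (signature, doc_id) VALUES (?, ?)",
            (signature, doc_id),
        )
        conn.commit()
        return "new"
    # Same hash under a different id means duplicate content.
    return "new" if row[0] == doc_id else "dupe"


conn = open_store()
print(classify(conn, "doc-1", "abc123"))  # first sighting of this hash
print(classify(conn, "doc-2", "abc123"))  # same hash, different id
print(classify(conn, "doc-3", None))      # hash missing from the response
```

On "dupe" the caller would delete the Solr record and the file. On "retry" it would leave both alone and query again.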

I had to set Solr to ALLOW duplicates because I have to somehow know that
the file is a dupe and then remove the duplicate files on my filesystem.
Based on the extract response I have no way of knowing this if duplicates
are disallowed.

-Bill


On Tue, Mar 2, 2010 at 2:11 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
>
> : To quote from the wiki,
>        ...
> That's all true ... but Bill explicitly said he wanted to use
> SignatureUpdateProcessorFactory to generate a uniqueKey from the content
> field post-extraction so he could dedup documents with the same content
> ... his question was how to get that key after adding a doc.
>
> Using a unique literal.field value will work -- but only as the value of
> a secondary field that he can then query on to get the uniqueKeyField
> value.
>
>
> : > : You could create your own unique ID and pass it in with the
> : > : literal.field=value feature.
> : >
> : > By which Lance means you could specify a unique value in a different
> : > field from your uniqueKey field, and then query on that field:value
> : > pair to get the doc after it's been added -- but that query will only
> : > work until some other version of the doc (with some other value)
> : > overwrites it.  So you'd essentially have to query for the field:value
> : > to look up the uniqueKey.
> : >
> : > It seems like it should definitely be feasible for the
> : > Update RequestHandlers to return the uniqueKeyField values for all the
> : > added docs (regardless of whether the key was included in the request
> : > or added by an UpdateProcessor) -- but I'm not sure how that would fit
> : > in with the SolrJ API.
> : >
> : > would you mind opening a feature request in Jira?
> : >
> : >
> : >
> : > -Hoss
> : >
> : >
> :
> :
> :
> : --
> : Lance Norskog
> : goks...@gmail.com
> :
>
>
>
> -Hoss
>
>
