Thanks for the responses. This is exactly what I had to resort to. I will
definitely put in a feature request to get the generated ID back from the
extract request.
I am doing this with PHP cURL for extraction and pecl php solr for
querying. I am then saving the unique id and dupe hash in a MyS
: To quote from the wiki,
...
That's all true ... but Bill explicitly said he wanted to use
SignatureUpdateProcessorFactory to generate a uniqueKey from the content
field post-extraction so he could dedup documents with the same content
... his question was how to get that key after ad
To quote from the wiki,
http://wiki.apache.org/solr/ExtractingRequestHandler
curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true'
-F "myfi...@tutorial.html"
This runs the extractor on your input file (in this case an HTML
file). It then stores the generated document with t
: You could create your own unique ID and pass it in with the
: literal.field=value feature.
By which Lance means you could specify a unique value in a different
field from your uniqueKey field, and then query on that field:value pair
to get the doc after it's been added -- but that query will
You could create your own unique ID and pass it in with the
literal.field=value feature.
http://wiki.apache.org/solr/ExtractingRequestHandler#Input_Parameters
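
As a rough illustration of the literal.field=value idea, here is a minimal
Python sketch that only builds the two URLs involved: one for the extract
request carrying a client-generated reference value, and one for a follow-up
query on that same field:value pair. The host and port are taken from the wiki
example; the field name client_ref_s and both helper functions are hypothetical,
and the field is assumed to exist in the schema as an indexed field. Nothing is
sent to Solr here.

```python
from urllib.parse import urlencode
import uuid

# Host and port as in the wiki example; adjust for a real installation.
SOLR_BASE = "http://localhost:8983/solr"


def build_extract_url(ref_field, ref_value, commit=True):
    """Build an /update/extract URL carrying one literal.<field>=<value> pair."""
    params = {"literal." + ref_field: ref_value}
    if commit:
        # Without a commit the new document is not yet visible to searches.
        params["commit"] = "true"
    return SOLR_BASE + "/update/extract?" + urlencode(params)


def build_lookup_url(ref_field, ref_value):
    """Build a /select URL that searches on the same field:value pair."""
    # Quote the value so it is matched as a single term.
    query = '{}:"{}"'.format(ref_field, ref_value)
    # fl=id asks Solr to return only the id field of the matching document.
    return SOLR_BASE + "/select?" + urlencode({"q": query, "fl": "id"})


# client_ref_s is a hypothetical field name, not something from the thread.
client_ref = uuid.uuid4().hex
extract_url = build_extract_url("client_ref_s", client_ref)
lookup_url = build_lookup_url("client_ref_s", client_ref)
print(extract_url)
print(lookup_url)
```

The file itself would still be posted to the first URL as in the wiki's curl
example; the second URL is what you would request afterwards to read back the
id of the document that was stored.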
On Fri, Feb 26, 2010 at 7:56 AM, Bill Engle wrote:
> Any thoughts on this? I would like to get the id back in the request after
> indexin
Any thoughts on this? I would like to get the id back in the request after
indexing. My initial thoughts were to do a search to get the docid based
on the attr_stream_name after indexing but now that I reread my message I
mentioned the attr_stream_name (file_name) may be different so that is
unre
Hi -
New Solr user here. I am using Solr Cell to index files (PDF, doc, docx,
txt, htm, etc.) and there is a good chance that a new file will have
duplicate content but not necessarily the same file name. To avoid this I
am using the deduplication feature of Solr.
<bool name="enabled">true</bool>
<str name="signatureField">id</str>