You've got a couple of choices. There's a new patch in town:
https://issues.apache.org/jira/browse/SOLR-139
It lets you update individual fields in a doc if (and only if) all
the fields in the original document were stored (more precisely, all
fields except copyField destinations).

So if you're storing (stored="true") all your metadata fields, you
can simply update the document when the full text becomes available,
assuming you know its uniqueKey at update time.

Under the covers, this will find the old document, read back all its
stored fields, add the new fields, and re-index the whole thing.
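
For concreteness, here's a minimal sketch of that kind of update with
SolrJ; the core URL and the "id"/"fulltext" field names are
placeholders for whatever your schema actually uses:

import java.util.Collections;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateExample {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at your own core.
        HttpSolrServer server =
            new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrInputDocument doc = new SolrInputDocument();
        // The uniqueKey of the already-indexed metadata record.
        doc.addField("id", "report-1234");
        // "set" tells Solr to replace just this field; the rest of the
        // stored fields are carried over from the existing document.
        doc.addField("fulltext",
            Collections.singletonMap("set", "text extracted by Tika..."));

        server.add(doc);
        server.commit();
    }
}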

Otherwise, your fallback idea is a good one.
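
If you go that route, here's a rough sketch of the agent's core step
for a single file, again SolrJ with hypothetical field names: query
by the uniqueKey you derive from the file name, copy the stored
metadata fields into a fresh document, add the extracted text, and
re-add the whole thing:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class ReindexWithFullText {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server =
            new HttpSolrServer("http://localhost:8983/solr/collection1");

        String id = "report-1234";            // derived from the file name
        String text = "...extracted text..."; // output of your Tika run

        // Fetch the previously indexed metadata record by uniqueKey.
        SolrDocument old = server.query(new SolrQuery("id:" + id))
                                 .getResults().get(0);

        // Copy every stored field into a new input document, then add
        // the full text. Re-adding under the same uniqueKey replaces
        // the old document.
        SolrInputDocument doc = new SolrInputDocument();
        for (String field : old.getFieldNames()) {
            if ("_version_".equals(field)) continue; // let Solr assign a new version
            doc.addField(field, old.getFieldValue(field));
        }
        doc.addField("fulltext", text);

        server.add(doc);
        server.commit();
    }
}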

Best
Erick

On Sat, Jul 14, 2012 at 11:05 PM, Alexandre Rafalovitch
<arafa...@gmail.com> wrote:
> Hello,
>
> I have a database of metadata and I can inject it into SOLR with DIH
> just fine. But I also have documents to extract full text from, which
> I want to add to the same records as additional fields. I think DIH
> allows running Tika at ingestion time, but I may not have the
> full-text files at that point (they could arrive days later). I can
> match a file to its metadata record by matching the file name against
> a field.
>
> What is the best approach to this staggered indexing with minimal
> custom code? I guess my fallback position is a custom full-text
> indexing agent that re-adds the metadata fields when the file is
> indexed. Is there anything better?
>
> I am a newbie using v4.0alpha of SOLR (and loving it).
>
> Thank you,
>     Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
> book)