You've got a couple of choices. There's a new patch in town (https://issues.apache.org/jira/browse/SOLR-139) that lets you update individual fields in a document if (and only if) all the fields in the original document are stored (more precisely, all the non-copyField fields).
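For example, here's a rough SolrJ sketch of what that update looks like (untested; it assumes the "set" atomic-update syntax that patch introduces, plus a hypothetical "fulltext" field and an "id" uniqueKey):

    import java.util.Collections;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class AddFullText {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

            SolrInputDocument doc = new SolrInputDocument();
            // uniqueKey of the record that was already indexed from the metadata
            doc.addField("id", "doc1");
            // the "set" modifier tells Solr to replace (or add) just this field;
            // every other stored field is carried over from the old document
            doc.addField("fulltext",
                    Collections.singletonMap("set", "...text extracted later..."));

            solr.add(doc);
            solr.commit();
        }
    }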
So if you're storing (stored="true") all your metadata fields, you can just update the document when the text becomes available, assuming you know the uniqueKey at update time. Under the covers, this will find the old document, read back all its stored fields, add the new fields, and re-index the whole thing. Otherwise, your fallback idea is a good one.

Best
Erick

On Sat, Jul 14, 2012 at 11:05 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> Hello,
>
> I have a database of metadata and I can inject it into Solr with DIH
> just fine. But I also have documents to extract full text from that I
> want to add to the same records as additional fields. I think DIH can
> run Tika at ingestion time, but I may not have the full-text files at
> that point (they could arrive days later). I can match a file to its
> metadata record by file name.
>
> What is the best approach to this staggered indexing with minimum
> custom code? I guess my fallback position is a custom full-text
> indexer agent that re-adds the metadata fields when the file is
> indexed. Is there anything better?
>
> I am a newbie using v4.0alpha of Solr (and loving it).
>
> Thank you,
>    Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
> book)