We've decided to store the original document in both Solr and external
repositories. This is to support the following:

   1. highlighting - We need to mark up the entire document with hit terms.
   However, if this were the only reason to store the text, I'd seriously
   consider calling out to the external repository via a custom highlighter.
   2. "hot" documents - We need to index user-generated data like activity
   streams, folksonomy tags, annotations, and comments. When our indexer is
   made aware of those events, we read back the existing SolrDocument, decorate
   it with the new fields, and re-index it.
   3. in-place index rebuild - Our search service is still evolving, so we
   periodically change our schema and indexing code. We believe it's more
   efficient, not to mention faster, to rebuild the index when all the data
   is already stored in Solr.
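
A rough sketch of points 2 and 3, using a toy in-memory index rather than Solr
itself. StoredIndex, decorate, rebuild and analyze are hypothetical names made
up for illustration; none of them are Solr or SolrJ APIs. The only Solr
behaviour assumed is that adding a document whose unique key already exists
replaces the whole document, which is why the stored copy has to be read back
before new fields can be attached.

```python
# Toy stand-in for an index in which every field is stored.
# All names here are hypothetical; nothing in this block talks to Solr.

def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


class StoredIndex:
    def __init__(self, analyzer=analyze):
        self.analyzer = analyzer
        self.stored = {}    # doc id -> dict of stored field values
        self.postings = {}  # term -> set of doc ids containing that term

    def add(self, doc):
        """Add a document, replacing any existing document with the same id."""
        self._remove(doc["id"])
        self.stored[doc["id"]] = dict(doc)
        for field, value in doc.items():
            if field == "id":
                continue
            values = value if isinstance(value, list) else [value]
            for v in values:
                for term in self.analyzer(str(v)):
                    self.postings.setdefault(term, set()).add(doc["id"])

    def _remove(self, doc_id):
        for ids in self.postings.values():
            ids.discard(doc_id)
        self.stored.pop(doc_id, None)

    def decorate(self, doc_id, field, value):
        """'Hot' update: read the stored doc, append a value, re-index it."""
        doc = dict(self.stored[doc_id])
        existing = doc.get(field, [])
        if not isinstance(existing, list):
            existing = [existing]
        doc[field] = existing + [value]
        self.add(doc)

    def rebuild(self, analyzer):
        """In-place rebuild: re-run new indexing code over the stored docs."""
        docs = list(self.stored.values())
        self.analyzer = analyzer
        self.stored, self.postings = {}, {}
        for doc in docs:
            self.add(doc)

    def search(self, term):
        return sorted(self.postings.get(term, set()))


index = StoredIndex()
index.add({"id": "doc1", "text": "Storing full text in Solr"})
index.decorate("doc1", "tags", "Search-Infra")  # e.g. a new folksonomy tag

# Later the indexing code changes: hyphenated tags are now split into words.
index.rebuild(lambda text: text.lower().replace("-", " ").split())
```

Neither decorate nor rebuild touches the external repository. That is the
trade-off we are paying for with the extra storage.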

Hope that helps!

On Fri, May 13, 2011 at 3:10 PM, Mike Sokolov <soko...@ifactory.com> wrote:

> Would anyone care to comment on the merits of storing indexed full-text
> documents in Solr versus storing them externally?
>
> It seems there are three options for us:
>
> 1) store documents both in Solr and externally - this is what we are doing
> now, and gives us all sorts of flexibility, but doesn't seem like the most
> scalable option, at least in terms of storage space and I/O required when
> updating/inserting documents.
>
> 2) store documents externally: For the moment, the only thing that requires
> us to store documents in Solr is the need to highlight them, both in search
> result snippets and in full document views. We are considering hunting for
> or writing a Highlighter extension that could pull in the document text from
> an external source (e.g. the filesystem).
>
> 3) store documents only in Solr. We'd just retrieve document text as a
> Solr field value rather than reading from the filesystem. This could work,
> but somehow it strikes me as the wrong thing to do, though I'm not sure why.
> A lot of unnecessary merging I/O activity, perhaps. It would also make it
> hard to grep the documents or use other filesystem tools, I suppose.
>
> Which one of these sounds best to you?  Under which circumstances? Are
> there other possibilities?
>
> Thanks!
>
> --
>
> Michael Sokolov
> Engineering Director
> www.ifactory.com
>
>
