Consider keeping your stored/updatable fields in a separate, parallel collection. It makes queries a multi-step operation, but gives you a lot more flexibility.
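
A minimal sketch of what that multi-step query could look like, with in-memory dicts standing in for the two collections. The collection names, field names and helper functions are all made up for illustration; in a real setup each step would be a separate Solr request against its own collection.

```python
# Hypothetical stand-ins for two Solr collections that share the same unique key.
search_collection = {
    "doc1": {"text": "atomic updates in solr"},
    "doc2": {"text": "external file fields"},
    "doc3": {"text": "solr index size"},
}
updatable_collection = {
    "doc1": {"status": "open", "views": 10},
    "doc3": {"status": "closed", "views": 3},
}


def search_ids(term):
    """Step 1: find matching ids in the large, rarely rewritten collection."""
    return sorted(
        doc_id for doc_id, doc in search_collection.items() if term in doc["text"]
    )


def fetch_updatable(ids):
    """Step 2: look up the small, frequently updated fields for those ids."""
    return {doc_id: updatable_collection.get(doc_id, {}) for doc_id in ids}


def two_step_query(term):
    """Merge both steps into one result list, keyed by the shared id."""
    ids = search_ids(term)
    extra = fetch_updatable(ids)
    return [{"id": i, **search_collection[i], **extra[i]} for i in ids]
```

The point of the split is that an update only rewrites the small document in the parallel collection, at the cost of the second lookup at query time.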

In some cases (but not all), "external file fields" can eliminate the need to directly update indexed documents.
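
As a rough sketch, an external file field takes its values from a plain text file of key=value lines, which can be regenerated without touching the indexed documents. The helper names and the field name below are hypothetical, and the external_<fieldname> naming is my recollection of the convention, so check the documentation for your Solr version before relying on it.

```python
def external_file_name(field_name):
    """File name assumed here for the given field (assumption, verify for your version)."""
    return f"external_{field_name}"


def render_external_file(values):
    """Render {document key: numeric value} as sorted 'key=value' lines."""
    return "".join(
        f"{key}={float(value)}\n" for key, value in sorted(values.items())
    )


# Hypothetical field "popularity" with values for two documents.
file_name = external_file_name("popularity")
file_body = render_external_file({"doc2": 3, "doc1": 1.5})
```

The values are numeric and are used for things like function queries rather than for searching, which is why this only helps in some cases.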

Or, consider a hybrid NoSQL/Solr solution such as DataStax Enterprise, where the data is persisted in Cassandra and indexed in Solr, allowing selective updates of all fields.

See: http://www.datastax.com/

-- Jack Krupansky

-----Original Message-----
From: Bram Van Dam
Sent: Monday, July 08, 2013 10:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Atomic updates and indexed fields

> see: https://issues.apache.org/jira/browse/LUCENE-4258
> I'm sure the people working on this would gladly get all
> the help they can. WARNING: I suspect (although I haven't
> looked myself) that this is very hairy code <G>.

Ah excellent! Thanks! Exactly what I was looking for. Looks like this
has been in the pipeline for a good while now. I'll have a look over the
patches, and if it's not too hairy I'll see what I can do.

> I'll challenge this statement a bit, knowing full well that I don't
> understand your problem space, just by saying I've seen
> some pretty big, high-throughput installations go ahead and
> store all the fields and use them for atomic updates. As in
> billions of documents. And note that "index size" as it relates
> to storing content is orthogonal to searching. By that I mean
> the index bloat you get when storing fields doesn't
> really impact search memory requirements much; the stored
> data is kept in separate files and only assembled for docs
> as you return them (i.e. a page worth).

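
For reference, a minimal sketch of the kind of atomic update payload being discussed, assuming the fields are stored and the unique key field is called "id". The field names "status" and "views" and the helper function are hypothetical; the "set" and "inc" modifiers are the ones Solr's JSON update format uses.

```python
import json


def atomic_update(doc_id, set_fields=None, inc_fields=None):
    """Build one atomic-update document: only the listed fields are modified."""
    doc = {"id": doc_id}
    for name, value in (set_fields or {}).items():
        doc[name] = {"set": value}    # replace the field's current value
    for name, amount in (inc_fields or {}).items():
        doc[name] = {"inc": amount}   # add to a numeric field
    return doc


# JSON body that would be posted to the collection's update handler.
payload = json.dumps([atomic_update("doc1", {"status": "closed"}, {"views": 1})])
```

Solr rebuilds the rest of the document from its stored fields, which is why everything has to be stored for this to work.
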
Without going into too much detail about this, I'll say that we have
billions of documents with ~50 indexed fields, fewer than 5 of which
need to be updated, though some documents have to be updated 10 times in
a reasonably short timespan. All the while maintaining an indexing
throughput of ~4k messages/second. Near real time. On COTS hardware.
Every IO-operation we can spare is a major win for us.

Impact on index size is around 15% in my tests. I will need a little
more time to measure the impact on throughput and querying, but my gut
instinct tells me that it won't be pretty.

- Bram
