bq: Are there any such structures?

Well, I thought there were, but I've got to admit I can't call any to
mind immediately.
bq: 2b is just the hard limit

Yeah, I'm always a little nervous as to when Moore's Law will make
everything I know about current systems' performance obsolete. At any
rate, I _can_ say with certainty that I have no interest at this point
in exceeding this limit. Of course that may change with compelling
use-cases ;)....

Best,
Erick

On Wed, Feb 11, 2015 at 4:14 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> Erick Erickson [erickerick...@gmail.com] wrote:
>
>> I guess my $0.02 is that you'd have to have strong evidence that extending
>> Lucene to 64 bit is even useful. Or more generally, useful enough to pay the
>> penalty. All the structures that allocate maxDoc id arrays would suddenly
>> require twice the memory for instance,
>
> Are there any such structures? It was my impression that ID-structures in
> Solr were either bitmaps, hashmaps or queues. Anyway, if the number of places
> with full-size ID-arrays is low, there could be dual implementations selected
> by maxDoc.
>
>> plus all the coding effort that could be spent doing other things.
>
> Very true. I agree that at the current stage, > 2b/shard is still a bit too
> special to spend a lot of effort on it.
>
> However, 2b is just the hard limit. As has been discussed before, single
> shards work best in the lower end of the hundreds of millions of documents.
> One reason is that many parts of Lucene work single-threaded on structures
> that scale linearly with document count. Having some hundreds of millions of
> documents (log analysis being the typical case) is not uncommon these days. A
> gradual shift to more multi-thread oriented processing would fit well with
> current trends in hardware as well as use cases. As opposed to the int->long
> switch, there would be little to no penalty for setups with low maxDocs (they
> would just use 1 thread).
>
> - Toke Eskildsen
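To put the memory concern in concrete numbers, here is a back-of-envelope sketch (illustrative only, not Lucene code; the class name is made up). It computes the heap cost of a full-size per-document ID array at the current hard limit of Integer.MAX_VALUE (~2.1 billion) documents per shard, for int vs. long elements, and contrasts it with a 1-bit-per-document bitmap of the kind Solr's filter structures use:

```java
// Hypothetical sketch: memory cost of per-document structures at Lucene's
// hard limit of Integer.MAX_VALUE documents per shard.
public class DocIdMemorySketch {
    public static void main(String[] args) {
        long maxDoc = Integer.MAX_VALUE;              // hard per-shard doc limit

        long intArrayBytes  = maxDoc * Integer.BYTES; // 4 bytes per doc -> ~8.6 GB
        long longArrayBytes = maxDoc * Long.BYTES;    // 8 bytes per doc -> ~17.2 GB
        long bitmapBytes    = maxDoc / 8;             // 1 bit per doc   -> ~0.27 GB

        System.out.printf("int[]  at maxDoc: %.1f GB%n", intArrayBytes  / 1e9);
        System.out.printf("long[] at maxDoc: %.1f GB%n", longArrayBytes / 1e9);
        System.out.printf("bitmap at maxDoc: %.1f GB%n", bitmapBytes    / 1e9);
    }
}
```

This illustrates both sides of the exchange above: a full int->long switch doubles the cost wherever a full-size ID array exists, but if most ID structures are really bitmaps or hashmaps, the doubling only bites in a few places.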