Re: DocValues space usage

2013-04-09 Thread Wei Wang
Adrien and Robert, thanks a lot for the hints. Will try a few options and see how it goes. On Tue, Apr 9, 2013 at 9:25 AM, Robert Muir wrote: > On Tue, Apr 9, 2013 at 9:11 AM, Adrien Grand wrote: > > > The default codec stores numeric doc values by blocks of 4096 values > > that have independent

Re: DocValues space usage

2013-04-09 Thread Robert Muir
On Tue, Apr 9, 2013 at 9:11 AM, Adrien Grand wrote: > The default codec stores numeric doc values by blocks of 4096 values > that have independent numbers of bits per value. If you end up having > most of these blocks empty, doc values will require little space but > in a worst-case scenario whe
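
To make the block-wise description quoted above concrete, here is a rough sketch of how space could be estimated when values are packed in blocks of 4096 with an independent bit width per block. It is a simplified model, not Lucene's actual on-disk format: the class `BlockPackedEstimate` and its methods are hypothetical, values are assumed to be non-negative, a missing value is assumed to be stored as 0, and per-block headers are ignored.

```java
public final class BlockPackedEstimate {
    // Block size taken from the description quoted above.
    static final int BLOCK_SIZE = 4096;

    // Bits needed to represent a non-negative value; 0 needs no bits at all.
    public static int bitsRequired(long maxValue) {
        return 64 - Long.numberOfLeadingZeros(maxValue);
    }

    // Simplified estimate (an assumption, not the real codec): every value in a
    // block is stored with the bit width needed by the largest value in that block.
    public static long estimateBytes(long[] values) {
        long totalBytes = 0;
        for (int start = 0; start < values.length; start += BLOCK_SIZE) {
            int end = Math.min(start + BLOCK_SIZE, values.length);
            long max = 0;
            for (int i = start; i < end; i++) {
                max = Math.max(max, values[i]);
            }
            long bits = (long) bitsRequired(max) * (end - start);
            totalBytes += (bits + 7) / 8; // round up to whole bytes
        }
        return totalBytes;
    }

    public static void main(String[] args) {
        // Four blocks; the only non-zero value sits in the first block.
        long[] clustered = new long[4 * BLOCK_SIZE];
        clustered[10] = 1000;

        // Four blocks; one non-zero value in each block.
        long[] spread = new long[4 * BLOCK_SIZE];
        for (int b = 0; b < 4; b++) {
            spread[b * BLOCK_SIZE] = 1000;
        }

        System.out.println("clustered: " + estimateBytes(clustered) + " bytes");
        System.out.println("spread:    " + estimateBytes(spread) + " bytes");
    }
}
```

Under this model, a field whose few values fall into a handful of blocks costs almost nothing for the remaining all-zero blocks, while the same number of values scattered so that every block holds at least one forces every block to pay the full bit width.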

Re: DocValues space usage

2013-04-09 Thread Robert Muir
On Tue, Apr 9, 2013 at 9:06 AM, Wei Wang wrote: > Thanks for the hint. Could you point to some Codec that might do this for > some types, even just as a side effect as you mentioned? It will be > helpful to have something to start with. > Have a look at the diskdv/ codec in the codecs/ module. Its

Re: DocValues space usage

2013-04-09 Thread Adrien Grand
Hi, On Tue, Apr 9, 2013 at 5:22 PM, Wei Wang wrote: > DocValues makes fast per-doc value lookup possible, which is nice. But it > brings other interesting issues. > > Assume there are 100M docs and 200 NumericDocValuesFields, this ends up > with a huge amount of disk and memory usage, even if there

Re: DocValues space usage

2013-04-09 Thread Wei Wang
Thanks for the hint. Could you point to some Codec that might do this for some types, even just as a side effect as you mentioned? It will be helpful to have something to start with. And could you elaborate a bit more on "the facet on tons of sparse fields"? I just got a vague idea from the comm

Re: DocValues space usage

2013-04-09 Thread Robert Muir
On Tue, Apr 9, 2013 at 8:22 AM, Wei Wang wrote: > DocValues makes fast per-doc value lookup possible, which is nice. But it > brings other interesting issues. > > Assume there are 100M docs and 200 NumericDocValuesFields, this ends up > with a huge amount of disk and memory usage, even if there are

DocValues space usage

2013-04-09 Thread Wei Wang
DocValues makes fast per-doc value lookup possible, which is nice. But it brings other interesting issues. Assume there are 100M docs and 200 NumericDocValuesFields, this ends up with a huge amount of disk and memory usage, even if there are just thousands of values for each field. I guess this is b