Re: ram estimate for docvalues is incorrect

2020-05-28 Thread John Wang
Thanks David for the info!

-John

On Wed, May 27, 2020 at 8:03 PM David Smiley wrote:
> John: you may benefit from more eagerly merging small segments on commit.
> At Salesforce we have a *ton* of indexes, and we reduced the segment count
> in half from the default. The large number of fields

Re: ram estimate for docvalues is incorrect

2020-05-28 Thread John Wang
Thank you! Sounds like it is a bad idea to rely on Accountable; the best path forward is for us to rethink how to manage our cache.

-John

On Thu, May 28, 2020 at 7:02 AM Adrien Grand wrote:
> I opened https://issues.apache.org/jira/browse/LUCENE-9387.
>
> On Thu, May 28, 2020 at 2:41 PM

Re: ram estimate for docvalues is incorrect

2020-05-28 Thread Adrien Grand
I opened https://issues.apache.org/jira/browse/LUCENE-9387.

On Thu, May 28, 2020 at 2:41 PM Michael McCandless <luc...@mikemccandless.com> wrote:
> +1 to remove Accountable from Lucene's reader classes. Let's open an
> issue and discuss there?
>
> In the past, when we added Accountable,

Re: ram estimate for docvalues is incorrect

2020-05-28 Thread Michael McCandless
+1 to remove Accountable from Lucene's reader classes. Let's open an issue and discuss there? In the past, when we added Accountable, Lucene's Codec/LeafReaders used quite a bit of heap, and the implementation was much closer to correct (as measured by percentage difference). But now that we've moved
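For context, a minimal sketch of the introspection that Accountable currently provides on readers, assuming Lucene 8.x where the per-segment CodecReader implements Accountable and Accountables.toString prints the child-resource breakdown; the class and method names here are illustrative, not part of any proposal in this thread:

    import org.apache.lucene.index.CodecReader;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.LeafReaderContext;
    import org.apache.lucene.util.Accountables;

    // Illustrative: dumps each segment's heap estimate and its child-resource
    // breakdown. This is the kind of introspection that would go away if
    // Accountable were removed from the reader classes.
    final class HeapBreakdown {
      static void dump(DirectoryReader reader) {
        for (LeafReaderContext ctx : reader.leaves()) {
          if (ctx.reader() instanceof CodecReader) {
            CodecReader cr = (CodecReader) ctx.reader();
            System.out.println(cr.ramBytesUsed() + " bytes estimated for " + cr);
            System.out.println(Accountables.toString(cr));
          }
        }
      }
    }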

Re: ram estimate for docvalues is incorrect

2020-05-28 Thread Adrien Grand
To be clear, there is no plan to remove RAM accounting from readers yet; this is just something that I have been thinking about recently, so your use case caught my attention. Given how low the memory usage is nowadays, I believe that it would be extremely hard to make sure that RAM estimates are

Re: ram estimate for docvalues is incorrect

2020-05-27 Thread David Smiley
John: you may benefit from more eagerly merging small segments on commit. At Salesforce we have a *ton* of indexes, and we reduced the segment count in half from the default. The large number of fields was a positive factor in this being a desirable trade-off. You might look at this recent issue
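One concrete, if approximate, way to cut the steady-state segment count in half is to lower TieredMergePolicy's segments-per-tier from its default of 10. A minimal sketch, not necessarily the exact setting referenced above, and separate from the merge-on-commit feature itself, which has its own IndexWriterConfig setting:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.TieredMergePolicy;

    // Sketch only: halving segmentsPerTier roughly halves the steady-state
    // segment count, at the cost of more merge I/O.
    final class FewerSegments {
      static IndexWriterConfig config() {
        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
        TieredMergePolicy tmp = new TieredMergePolicy();
        tmp.setSegmentsPerTier(5.0); // default is 10.0
        iwc.setMergePolicy(tmp);
        return iwc;
      }
    }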

Re: ram estimate for docvalues is incorrect

2020-05-27 Thread John Wang
Thanks Adrien! It is surprising to learn this is an invalid use case and that Lucene is planning to get rid of memory accounting... In our test there are indeed many fields: 1000 numeric doc values fields, with 5 million docs in 1 segment. (We will have many segments in our
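A hypothetical sketch of the index shape described above (1000 numeric doc values fields, 5 million docs, force-merged to one segment); the directory path, field names, and field values are made up for illustration:

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.NumericDocValuesField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    // Hypothetical reproduction of the test shape described in the message above.
    public final class ManyDocValuesFields {
      public static void main(String[] args) throws Exception {
        try (FSDirectory dir = FSDirectory.open(Paths.get("/tmp/dv-test"));
             IndexWriter writer =
                 new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
          for (int doc = 0; doc < 5_000_000; doc++) {
            Document d = new Document();
            for (int f = 0; f < 1000; f++) {
              d.add(new NumericDocValuesField("dv_" + f, doc + f));
            }
            writer.addDocument(d);
          }
          writer.forceMerge(1); // single segment, as in the test above
        }
      }
    }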

Re: ram estimate for docvalues is incorrect

2020-05-27 Thread Adrien Grand
A couple of major versions ago, Lucene required tons of heap memory to keep a reader open, e.g. norms were on heap and so on. To my knowledge, the only thing that is now kept in memory and is a function of maxDoc is live docs; all other codec components require very little memory. I'm actually
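As a back-of-the-envelope figure for the live docs case, assuming they are held as a bitset of maxDoc bits and only loaded for segments that actually have deletions: a 5-million-doc segment like the one in the test mentioned above would need roughly 5,000,000 / 8 ≈ 625,000 bytes, i.e. about 0.6 MB of heap per segment.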

ram estimate for docvalues is incorrect

2020-05-27 Thread John Wang
Hello,

We have a reader cache that depends on the memory usage of each reader. We found that the calculation of reader size for doc values undercounts. See line:
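A minimal sketch of the kind of per-reader accounting described here, assuming Lucene 8.x where the per-segment readers implement Accountable; the class and method names are illustrative, and as the rest of the thread notes, the estimate can undercount, notably for doc values:

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.LeafReader;
    import org.apache.lucene.index.LeafReaderContext;
    import org.apache.lucene.util.Accountable;

    // Illustrative only: sums ramBytesUsed() over the segment readers of a
    // DirectoryReader, e.g. to weigh entries in a reader cache. Treat the
    // result as a rough approximation, not an exact heap figure.
    final class ReaderHeapEstimate {
      static long estimateBytes(DirectoryReader reader) {
        long total = 0;
        for (LeafReaderContext ctx : reader.leaves()) {
          LeafReader leaf = ctx.reader();
          if (leaf instanceof Accountable) {
            total += ((Accountable) leaf).ramBytesUsed();
          }
        }
        return total;
      }
    }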