Re: exposing per-field storage usage

2022-06-13 Thread Nhat Nguyen
> Also, the tool can be much more efficient than checkindex, e.g. for stored fields and vectors it can just retrieve the first and last documents, whereas checkindex should verify all of the documents slowly.
Yes, we implemented a similar heuristic in the DiskUsage API in Elasticsearch. On

Re: exposing per-field storage usage

2022-06-13 Thread Robert Muir
On Mon, Jun 13, 2022 at 3:26 PM Nhat Nguyen wrote: > > Hi Michael, > > We developed a similar functionality in Elasticsearch. The DiskUsage API > estimates the storage of each field by iterating its structures (i.e., > inverted index, doc-values, stored fields, etc.) and tracking the number of

Re: [VOTE] Release Lucene/Solr 8.11.2 RC2

2022-06-13 Thread Tomás Fernández Löbbe
+1 SUCCESS! [1:02:16.559513] On Mon, Jun 13, 2022 at 12:07 PM Mike Drob wrote: > Please vote for release candidate 2 for Lucene/Solr 8.11.2 > > The artifacts can be downloaded from: > > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.2-RC2-rev17dee71932c683e345508113523e764c3e4c8

Re: exposing per-field storage usage

2022-06-13 Thread Atri Sharma
+1 Will really help with visibility. On Tue, 14 Jun 2022, 00:56 Nhat Nguyen, wrote: > Hi Michael, > > We developed a similar functionality in Elasticsearch. The DiskUsage API > estimates the > storage of each field by iterating its structure

Re: exposing per-field storage usage

2022-06-13 Thread Nhat Nguyen
Hi Michael, We developed a similar functionality in Elasticsearch. The DiskUsage API estimates the storage of each field by iterating its structures (i.e., inverted index, doc-values, stored fields, etc.) and tracking the number of read-bytes.
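The read-byte tracking idea described above can be illustrated with a small toy sketch. This is not the Elasticsearch DiskUsage implementation (which is written in Java against Lucene's own readers); the names CountingReader, estimate_usage, and the (offset, length) layout are hypothetical and exist only to show the technique: wrap the reader in a byte counter, iterate each field's structures, and attribute the change in the counter to that field.

```python
import io


class CountingReader:
    """Hypothetical wrapper that counts every byte returned by read()."""

    def __init__(self, data: bytes):
        self._stream = io.BytesIO(data)
        self.bytes_read = 0

    def seek(self, offset: int) -> None:
        self._stream.seek(offset)

    def read(self, size: int) -> bytes:
        chunk = self._stream.read(size)
        self.bytes_read += len(chunk)  # count only bytes actually returned
        return chunk


def estimate_usage(reader: CountingReader, layout: dict) -> dict:
    """Estimate per-field bytes by reading each field's regions.

    `layout` is an assumed mapping of field name -> list of (offset, length)
    regions; a real index would expose its structures through its own APIs.
    """
    usage = {}
    for field, regions in layout.items():
        before = reader.bytes_read
        for offset, length in regions:
            reader.seek(offset)
            reader.read(length)
        # Bytes read while iterating this field are attributed to it.
        usage[field] = reader.bytes_read - before
    return usage


# Toy "index file" of 100 bytes shared by two fields.
data = bytes(100)
layout = {
    "title": [(0, 10), (50, 5)],
    "body": [(10, 40), (55, 45)],
}
usage = estimate_usage(CountingReader(data), layout)
print(usage)
```

Because the counter lives on the shared reader, the same approach works when several fields are interleaved in one file: only the bytes touched while visiting a given field are charged to it.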

[VOTE] Release Lucene/Solr 8.11.2 RC2

2022-06-13 Thread Mike Drob
Please vote for release candidate 2 for Lucene/Solr 8.11.2 The artifacts can be downloaded from: https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.2-RC2-rev17dee71932c683e345508113523e764c3e4c80fa You can run the smoke tester directly with this command: python3 -u dev-tools/scripts/

Re: [VOTE] Release Lucene/Solr 8.11.2 RC1

2022-06-13 Thread Mike Drob
This RC did not receive enough votes to pass. I've fixed the bug pointed out by Houston and will be moving on to RC2. Thanks! On Sun, Jun 12, 2022 at 2:57 PM Mike Drob wrote: > Thanks for finding that, Houston! It was an issue during backporting that > I've corrected. I'll respin and put up a ne

exposing per-field storage usage

2022-06-13 Thread Michael Sokolov
At Amazon, we have a need to produce regular metrics on how much disk storage is consumed by each field. We manage an index with data contributed by many teams and business units and we are often asked to produce reports attributing index storage usage to these customers. The best tool we have for

Re: 30% query performance degradation for documents with small stored fields

2022-06-13 Thread Adrien Grand
> Is my understanding correct that changing only block size and disabling preset dictionaries are the changes that won't likely require re-indexing and could be as easily carried over to the next Lucene versions? I understand there is no guarantee, but curious to know your opinion because it introd

JDK 19: Rampdown Phase 1 + EA builds 26 & JDK 20: EA builds 1

2022-06-13 Thread David Delabassee
Greetings! JDK 19 has now entered Rampdown Phase One (RDP1) [1], which means that the main-line has been forked into a dedicated JDK 19 stabilization repository. At this point, the overall JDK 19 feature set is frozen and no additional JEPs will be targeted to JDK 19. The stabilization reposi