Re: exposing per-field storage usage

2022-06-14 Thread Michael Sokolov
OK sorry, I must have misread the timings in the issue you forwarded! Maybe I confused secs with ms. On Tue, Jun 14, 2022 at 11:43 AM Nhat Nguyen wrote: > > I didn't test with TB indices, but the API took around 100-300ms to analyze a > GB index. > > On Tue, Jun 14, 2022 at 11:15 AM Robert

Re: exposing per-field storage usage

2022-06-14 Thread Nhat Nguyen
I didn't test with TB indices, but the API took around 100-300ms to analyze a GB index. On Tue, Jun 14, 2022 at 11:15 AM Robert Muir wrote: > On Tue, Jun 14, 2022 at 10:37 AM Michael Sokolov > wrote: > > > > Oh, yes that's a clever idea. It seems it would take quite a while > > (tens of

Re: exposing per-field storage usage

2022-06-14 Thread Robert Muir
On Tue, Jun 14, 2022 at 10:37 AM Michael Sokolov wrote: > > Oh, yes that's a clever idea. It seems it would take quite a while > (tens of minutes?) for a larger index though? Much faster than the > force-merge solution for sure. I guess to get faster we would have to > instrument each format. I

Re: exposing per-field storage usage

2022-06-14 Thread Michael Sokolov
Oh, yes that's a clever idea. It seems it would take quite a while (tens of minutes?) for a larger index though? Much faster than the force-merge solution for sure. I guess to get faster we would have to instrument each format. I mean they generally do know how much space each field is occupying,

Re: exposing per-field storage usage

2022-06-13 Thread Nhat Nguyen
> Also, the tool can be much more efficient than checkindex, e.g. for > stored fields and vectors it can just retrieve the first and last > documents, whereas checkindex should verify all of the documents > slowly. Yes, we implemented a similar heuristic in the DiskUsage API in Elasticsearch.

Re: exposing per-field storage usage

2022-06-13 Thread Robert Muir
On Mon, Jun 13, 2022 at 3:26 PM Nhat Nguyen wrote: > > Hi Michael, > > We developed a similar functionality in Elasticsearch. The DiskUsage API > estimates the storage of each field by iterating its structures (i.e., > inverted index, doc-values, stored fields, etc.) and tracking the number of

Re: exposing per-field storage usage

2022-06-13 Thread Atri Sharma
+1 Will really help with visibility. On Tue, 14 Jun 2022, 00:56 Nhat Nguyen, wrote: > Hi Michael, > > We developed a similar functionality in Elasticsearch. The DiskUsage API > estimates the > storage of each field by iterating its

Re: exposing per-field storage usage

2022-06-13 Thread Nhat Nguyen
Hi Michael, We developed similar functionality in Elasticsearch. The DiskUsage API estimates the storage of each field by iterating its structures (e.g., inverted index, doc-values, stored fields) and tracking the number of bytes read.
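
The approach described in this message (walk each field's structures and count the bytes read along the way) can be illustrated with a toy sketch. This is not the Elasticsearch or Lucene implementation. The names `CountingReader`, `estimate_field_usage`, and the `toy_structures` layout are hypothetical, and the sketch assumes each field's data sits in known byte ranges of a single file, which a real index does not guarantee.

```python
import io


class CountingReader:
    """Hypothetical wrapper that counts every byte read from a binary stream."""

    def __init__(self, raw):
        self._raw = raw
        self.bytes_read = 0

    def seek(self, offset):
        self._raw.seek(offset)

    def read(self, n):
        data = self._raw.read(n)
        # Count what was actually returned, not what was requested.
        self.bytes_read += len(data)
        return data


def estimate_field_usage(raw, structures):
    """Read every byte range listed for each field and report bytes read per field.

    `structures` maps a field name to a list of (offset, length) ranges.
    This layout is an assumption made for the sketch only.
    """
    reader = CountingReader(raw)
    usage = {}
    for field, ranges in structures.items():
        before = reader.bytes_read
        for offset, length in ranges:
            reader.seek(offset)
            reader.read(length)
        usage[field] = reader.bytes_read - before
    return usage


# Toy "index": 40 bytes for title, 100 for body, 8 for date.
toy_index = io.BytesIO(b"T" * 40 + b"B" * 100 + b"D" * 8)
toy_structures = {
    "title": [(0, 40)],
    "body": [(40, 100)],
    "date": [(140, 8)],
}
usage = estimate_field_usage(toy_index, toy_structures)
print(usage)
```

Counting at the reader level means the per-field logic does not need to know how each structure is encoded; it only needs a way to visit it. The cost is that the estimate is only as good as the traversal, so bytes that are never touched are never attributed.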

exposing per-field storage usage

2022-06-13 Thread Michael Sokolov
At Amazon, we have a need to produce regular metrics on how much disk storage is consumed by each field. We manage an index with data contributed by many teams and business units and we are often asked to produce reports attributing index storage usage to these customers. The best tool we have for