Sahil Takiar created IMPALA-8544:
------------------------------------

             Summary: Expose additional S3A / S3Guard metrics
                 Key: IMPALA-8544
                 URL: https://issues.apache.org/jira/browse/IMPALA-8544
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Sahil Takiar
            Assignee: Sahil Takiar


S3A / S3Guard internally collect several useful metrics that we should 
consider exposing to Impala users. The full list of statistics can be found in 
{{o.a.h.fs.s3a.Statistic}}. The stats include the number of S3 operations 
performed (put, get, etc.), invocation counts for various {{FileSystem}} 
methods, and stream statistics (bytes read, bytes written, etc.).

Some interesting stats that stand out:
 * "stream_aborted": "Count of times the TCP stream was aborted" - the number 
of TCP connection aborts; a high value would indicate performance issues
 * "stream_read_exceptions": "Number of exceptions invoked on input streams" - 
incremented whenever an {{IOException}} is caught while reading (these 
exceptions don't always get propagated to Impala because they trigger a retry)
 * "store_io_throttled": "Requests throttled and retried" - looks like it 
tracks the number of times the fs retries an operation because the original 
request hit a throttling exception
 * "s3guard_metadatastore_retry": "S3Guard metadata store retry events" - looks 
like it tracks the number of times the fs retries S3Guard operations
 * "s3guard_metadatastore_throttled" : "S3Guard metadata store throttled 
events" - similar to "store_io_throttled" but looks like it is specific to 
S3Guard

We should consider how to expose these metrics via Impala logs / runtime 
profiles.
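
As a rough illustration of the "logs" option, the sketch below formats the counters highlighted above into a single line, skipping any that are zero. It assumes the counters have already been copied into a plain name-to-value map; the {{S3AStatsLogLine}} class, its {{format}} method, and the output layout are hypothetical and not part of Impala or Hadoop.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper; not part of Impala or Hadoop. */
final class S3AStatsLogLine {
  // The counters called out in this issue as worth surfacing.
  static final List<String> INTERESTING = Arrays.asList(
      "stream_aborted",
      "stream_read_exceptions",
      "store_io_throttled",
      "s3guard_metadatastore_retry",
      "s3guard_metadatastore_throttled");

  private S3AStatsLogLine() {}

  /** Builds one log line from the non-zero counters of interest. */
  static String format(Map<String, Long> counters) {
    StringJoiner joiner = new StringJoiner(", ", "S3A stats: ", "");
    // Used only when no counter of interest is non-zero.
    joiner.setEmptyValue("S3A stats: (none)");
    for (String name : INTERESTING) {
      Long value = counters.get(name);
      if (value != null && value != 0L) {
        joiner.add(name + "=" + value);
      }
    }
    return joiner.toString();
  }
}
```

Skipping zero-valued counters keeps the line short in the common case where nothing was throttled or retried.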

There are a few options:
 * {{S3AFileSystem}} exposes {{StorageStatistics}} specific to S3A / S3Guard 
via the {{FileSystem#getStorageStatistics}} method; the 
{{S3AStorageStatistics}} seems to include all the S3A / S3Guard metrics. 
However, I think the stats might be aggregated globally, which would make it 
hard to create per-query metrics
 * {{S3AInstrumentation}} exposes all the metrics as well, and looks like it is 
per-fs instance, so it is not aggregated globally; {{S3AInstrumentation}} 
extends {{o.a.h.metrics2.MetricsSource}} so perhaps it is exposed via some API 
(haven't looked into this yet)
 * {{S3AInputStream#toString}} dumps the statistics from 
{{o.a.h.fs.s3a.S3AInstrumentation.InputStreamStatistics}} and 
{{S3AFileSystem#toString}} dumps them all as well
 * {{S3AFileSystem}} updates the stats in 
{{o.a.h.fs.Statistics.StatisticsData}} as well (e.g. bytesRead, bytesWritten, 
etc.)

Impala has a {{hdfs-fs-cache}} as well, so {{hdfsFs}} objects get shared across 
threads. That might mean that even per-fs-instance stats mix activity from 
multiple queries.
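
Since the counters may be aggregated globally and the fs objects are shared, one possible approach is to snapshot the counters before and after a unit of work and report only the difference. The sketch below shows the idea on plain maps; the {{S3AStatsDelta}} class and its {{delta}} method are hypothetical, and the maps are assumed to have been filled from something like the iterator returned by {{StorageStatistics#getLongStatistics}}. Note that with concurrent queries on the same fs instance the delta would still include other queries' activity, so this is an approximation at best.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper; not part of Impala or Hadoop. */
final class S3AStatsDelta {
  private S3AStatsDelta() {}

  /**
   * Returns (after - before) for every counter in 'after', keeping only the
   * counters that changed. Counters missing from 'before' are treated as 0.
   */
  static Map<String, Long> delta(Map<String, Long> before,
                                 Map<String, Long> after) {
    // TreeMap keeps the output sorted by counter name.
    Map<String, Long> result = new TreeMap<>();
    for (Map.Entry<String, Long> e : after.entrySet()) {
      long diff = e.getValue() - before.getOrDefault(e.getKey(), 0L);
      if (diff != 0) {
        result.put(e.getKey(), diff);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up values, for illustration only.
    Map<String, Long> before = new TreeMap<>();
    before.put("stream_aborted", 2L);
    before.put("store_io_throttled", 5L);

    Map<String, Long> after = new TreeMap<>();
    after.put("stream_aborted", 2L);
    after.put("store_io_throttled", 8L);
    after.put("stream_read_exceptions", 1L);

    // Prints only the counters that changed between the two snapshots.
    System.out.println(delta(before, after));
  }
}
```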



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
