Hi folks,

Thanks for the reply.
We have implemented our own SplitAssigner, FileReaderFormat and
FileReaderFormat.Reader implementations. Hence, we plan to add custom
metrics such as these:
1. Number of splits the SplitAssigner is initialized with, and the number of
splits added back to the SplitAssigner
2. Readers created per unit time
3. Time taken to create a reader
4. Time taken for the Reader to produce a single Row
5. Readers closed per unit time
... and some more
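
A minimal sketch of how items 1, 2, 3 and 5 could be tracked, in plain Java
with no Flink dependency. `ReaderStats` and all of its method names are
hypothetical and only illustrate the bookkeeping. Wiring the values into Flink
(for example as Counter or Gauge metrics on a MetricGroup) is not shown, and
the "per unit time" rates are assumed to be derived from these counters by
whatever reports them.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical helper, not part of Flink: holds the raw counts and timings.
final class ReaderStats {
    private final LongAdder initialSplits = new LongAdder();
    private final LongAdder readdedSplits = new LongAdder();
    private final LongAdder readersCreated = new LongAdder();
    private final LongAdder readersClosed = new LongAdder();
    private final LongAdder readerCreationNanos = new LongAdder();

    // Call once from the assigner's constructor with the initial split count.
    void splitsInitialized(int n) { initialSplits.add(n); }

    // Call whenever splits are handed back to the assigner.
    void splitsReadded(int n) { readdedSplits.add(n); }

    // Call from the reader's close().
    void readerClosed() { readersClosed.increment(); }

    // Runs the factory, records how long it took, and counts the reader
    // only if the factory returned without throwing.
    <R> R timeReaderCreation(Supplier<R> factory) {
        long start = System.nanoTime();
        try {
            R reader = factory.get();
            readersCreated.increment();
            return reader;
        } finally {
            readerCreationNanos.add(System.nanoTime() - start);
        }
    }

    long initialSplits() { return initialSplits.sum(); }
    long readdedSplits() { return readdedSplits.sum(); }
    long readersCreated() { return readersCreated.sum(); }
    long readersClosed() { return readersClosed.sum(); }
    long readerCreationNanos() { return readerCreationNanos.sum(); }
}
```

Average reader creation time would then be readerCreationNanos() divided by
readersCreated(), guarding against a zero count.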

However, since we haven't implemented our own FileSource or
SplitEnumerator, we don't have visibility into the metrics of these
components. We would ideally like to measure these:
1. Number of rows emitted by the source per unit time
2. Time taken by the enumerator to discover the splits
3. Total splits discovered
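
For item 1, one possible approximation from inside a custom Reader is to count
rows at the reader boundary. This is only a sketch: it assumes every row the
reader returns is emitted downstream, which may not hold exactly, and it does
nothing for items 2 and 3, which still need access to the enumerator.
`CountingIterator` is a hypothetical name, and the reader is assumed to expose
its rows through a java.util.Iterator.

```java
import java.util.Iterator;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical decorator, not part of Flink: counts rows and the time spent
// producing them, without changing what the wrapped iterator returns.
final class CountingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final LongAdder rows;
    private final LongAdder nanosInNext;

    CountingIterator(Iterator<T> delegate, LongAdder rows, LongAdder nanosInNext) {
        this.delegate = delegate;
        this.rows = rows;
        this.nanosInNext = nanosInNext;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        long start = System.nanoTime();
        T row = delegate.next();            // a throwing call is not counted
        nanosInNext.add(System.nanoTime() - start);
        rows.increment();
        return row;
    }
}
```

The same two accumulators also give the average time to produce a single row
(nanosInNext divided by rows), which overlaps with item 4 of the first list.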


Regards,
Meghajit


On Fri, Jun 10, 2022 at 10:04 PM Jing Ge <j...@ververica.com> wrote:

> Hi Meghajit,
>
> I think it makes sense to extend the current metrics. Could you list all
> metrics you need? Thanks!
>
> Best regards,
> Jing
>
> On Fri, Jun 10, 2022 at 5:06 PM Lijie Wang <wangdachui9...@gmail.com>
> wrote:
>
>> Hi Meghajit,
>>
>> As far as I know, the FileSource currently does not have the metrics you
>> need. You can implement your own source, and register custom metrics via
>> `SplitEnumeratorContext#metricGroup` and `SourceReaderContext#metricGroup`.
>>
>> Best,
>> Lijie
>>
>> On Fri, Jun 10, 2022 at 16:36, Meghajit Mazumdar <meghajit.mazum...@gojek.com> wrote:
>>
>>> Hello,
>>>
>>> We are working on a Flink project which uses FileSource to discover and
>>> read Parquet files from GCS (using Flink 1.14).
>>>
>>> As part of this, we wanted to implement some health metrics around the
>>> code.
>>> I wanted to know whether Flink itself gathers any metrics around
>>> FileSource, e.g., number of files discovered by the SplitEnumerator, number
>>> of files added back to the SplitAssigner, time taken to process each split, etc.?
>>>
>>> I checked in the official documentation
>>> <https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/filesystem/>
>>> but there don't appear to be any. Is the solution then to implement
>>> custom metrics like this
>>> <https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/metrics/>
>>> ?
>>>
>>>
>>> *Regards,*
>>> *Meghajit*
>>>
>>

-- 
*Regards,*
*Meghajit*
