Thank you Luke and Reuven for helping me. Now I can see my pipeline processing time for each record.
On Wed, Jun 3, 2020 at 9:25 AM Reuven Lax <re...@google.com> wrote: > Note: you need to tag the timestamp parameter to @ProcessElement with > the @Timestamp annotation. > > On Mon, Jun 1, 2020 at 3:31 PM Luke Cwik <lc...@google.com> wrote: > >> You can configure KafkaIO to use some data from the record as the >> elements timestamp. See the KafkaIO javadoc around the TimestampPolicy[1], >> the default is current processing time. >> You can access the timestamp of the element by adding >> "org.joda.time.Instant timestamp" as a parameter to your @ProcessElement, >> see this javadoc for additional details[2]. You could then compute now() - >> timestamp to calculate processing time. >> >> 1: >> https://beam.apache.org/releases/javadoc/2.21.0/org/apache/beam/sdk/io/kafka/TimestampPolicy.html >> <https://urldefense.proofpoint.com/v2/url?u=https-3A__beam.apache.org_releases_javadoc_2.21.0_org_apache_beam_sdk_io_kafka_TimestampPolicy.html&d=DwMFaQ&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=BkW1L6EF7ergAVYDXCo-3Vwkpy6qjsWAz7_GD7pAR8g&m=KuUWakZ-xaVGYfsw7YGz1WBOLIlpBHikvRxgZs9vWn0&s=1Sp349fe5C5l4ttxy9iNBlkzoO-9RX_qrvVllkk-PGg&e=> >> 2: >> https://beam.apache.org/releases/javadoc/2.21.0/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html >> <https://urldefense.proofpoint.com/v2/url?u=https-3A__beam.apache.org_releases_javadoc_2.21.0_org_apache_beam_sdk_transforms_DoFn.ProcessElement.html&d=DwMFaQ&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=BkW1L6EF7ergAVYDXCo-3Vwkpy6qjsWAz7_GD7pAR8g&m=KuUWakZ-xaVGYfsw7YGz1WBOLIlpBHikvRxgZs9vWn0&s=nkJq_weo7lrd-JzTEw5PeCC-dkivOJ6AlRxLFXwnMMM&e=> >> >> On Mon, Jun 1, 2020 at 2:00 PM Talat Uyarer <tuya...@paloaltonetworks.com> >> wrote: >> >>> Sorry for the late response. Where does the beam set that timestamp >>> field on element ? Is it set whenever KafkaIO reads that element ? >>> >> And also I have a windowing function on my pipeline. Does the timestamp >>> field change for any kind of operation ? On pipeline I have the >>> following steps: KafkaIO -> Format Conversion Pardo -> SQL Filter -> >>> Windowing Step -> Custom Sink. If timestamp set in KafkaIO, Can I see >>> process time by now() - timestamp in Custom Sink ? >>> >>> >> Thanks >>> >>> On Thu, May 28, 2020 at 2:07 PM Luke Cwik <lc...@google.com> wrote: >>> >>>> Dataflow provides msec counters for each transform that executes. You >>>> should be able to get them from stackdriver and see them from the Dataflow >>>> UI. >>>> >>>> You need to keep track of the timestamp of the element as it flows >>>> through the system as part of data that goes alongside the element. You can >>>> use the element's timestamp[1] if that makes sense (it might not if you >>>> intend to use a timestamp that is from the kafka record itself and the >>>> record's timestamp isn't the same as the ingestion timestamp). Unless you >>>> are writing your own sink, the sink won't track the processing time at all >>>> so you'll need to add a ParDo that goes right before it that writes the >>>> timing information to wherever you want (a counter, your own metrics >>>> database, logs, ...). >>>> >>>> 1: >>>> https://github.com/apache/beam/blob/018e889829e300ab9f321da7e0010ff0011a73b1/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L257 >>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_beam_blob_018e889829e300ab9f321da7e0010ff0011a73b1_sdks_java_core_src_main_java_org_apache_beam_sdk_transforms_DoFn.java-23L257&d=DwMFaQ&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=BkW1L6EF7ergAVYDXCo-3Vwkpy6qjsWAz7_GD7pAR8g&m=1202mTv7BP1KzcBJECS98dr7u5riw0NHdl8rT8I6Ego&s=cPdnrK4r-tVd0iAO6j7eAAbDPISOdazEYBrPoC9cQOo&e=> >>>> >>>> >>>> On Thu, May 28, 2020 at 1:12 PM Talat Uyarer < >>>> tuya...@paloaltonetworks.com> wrote: >>>> >>>>> Yes I am trying to track how long it takes for a single element to be >>>>> ingested into the pipeline until it is output somewhere. >>>>> >>>>> My pipeline is unbounded. I am using KafkaIO. I did not think about >>>>> CPU time. if there is a way to track it too, it would be useful to improve >>>>> my metrics. >>>>> >>>>> On Thu, May 28, 2020 at 12:52 PM Luke Cwik <lc...@google.com> wrote: >>>>> >>>>>> What do you mean by processing time? >>>>>> >>>>>> Are you trying to track how long it takes for a single element to be >>>>>> ingested into the pipeline until it is output somewhere? >>>>>> Do you have a bounded pipeline and want to know how long all the >>>>>> processing takes? >>>>>> Do you care about how much CPU time is being consumed in aggregate >>>>>> for all the processing that your pipeline is doing? >>>>>> >>>>>> >>>>>> On Thu, May 28, 2020 at 11:01 AM Talat Uyarer < >>>>>> tuya...@paloaltonetworks.com> wrote: >>>>>> >>>>>>> I am using Dataflow Runner. The pipeline read from kafkaIO and send >>>>>>> Http. I could not find any metadata field on the element to set first >>>>>>> read >>>>>>> time. >>>>>>> >>>>>>> On Thu, May 28, 2020 at 10:44 AM Kyle Weaver <kcwea...@google.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Which runner are you using? >>>>>>>> >>>>>>>> On Thu, May 28, 2020 at 1:43 PM Talat Uyarer < >>>>>>>> tuya...@paloaltonetworks.com> wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I have a pipeline which has 5 steps. What is the best way to >>>>>>>>> measure processing time for my pipeline? >>>>>>>>> >>>>>>>>> Thnaks >>>>>>>>> >>>>>>>>