Robert,

Thank you for the information.
I installed Apache Beam with pip3, so the Beam environment should not be a
problem in my case.
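In case it is useful to anyone else, one quick way to confirm that the
Cython-compiled state sampler (which these profile counters rely on) is
actually in use is the check below. This is only an assumption based on the
apache_beam.runners.worker.statesampler module layout in the 2.x SDK:

    # Assumption: apache_beam.runners.worker.statesampler exposes FAST_SAMPLER,
    # which is True when the Cython-compiled state sampler was imported.
    from apache_beam.runners.worker import statesampler

    print('Cython state sampler available:', statesampler.FAST_SAMPLER)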

The solution looks fantastic. I love it.

Below is the output from my sample code.

2019-07-25 10:33:18,722 INFO Test started at 2019-07-25 10:33:18
2019-07-25 10:33:19,780 INFO Filesystem() done. Elapsed time 1.058 ms.
2019-07-25 10:33:31,479 INFO ==================== <function annotate_downstream_side_inputs at 0x7f69909a0ae8> ====================
2019-07-25 10:33:31,479 INFO ==================== <function fix_side_input_pcoll_coders at 0x7f69909a0bf8> ====================
2019-07-25 10:33:31,480 INFO ==================== <function lift_combiners at 0x7f69909a0c80> ====================
2019-07-25 10:33:31,480 INFO ==================== <function expand_sdf at 0x7f69909a0d08> ====================
2019-07-25 10:33:31,481 INFO ==================== <function expand_gbk at 0x7f69909a0d90> ====================
2019-07-25 10:33:31,481 INFO ==================== <function sink_flattens at 0x7f69909a0ea0> ====================
2019-07-25 10:33:31,482 INFO ==================== <function greedily_fuse at 0x7f69909a0f28> ====================
2019-07-25 10:33:31,482 INFO ==================== <function read_to_impulse at 0x7f69909a5048> ====================
2019-07-25 10:33:31,483 INFO ==================== <function impulse_to_input at 0x7f69909a50d0> ====================
2019-07-25 10:33:31,484 INFO ==================== <function inject_timer_pcollections at 0x7f69909a5268> ====================
2019-07-25 10:33:31,484 INFO ==================== <function sort_stages at 0x7f69909a52f0> ====================
2019-07-25 10:33:31,484 INFO ==================== <function window_pcollection_coders at 0x7f69909a5378> ====================
2019-07-25 10:33:31,485 INFO Running ((ref_AppliedPTransform_Create/Read_3)+(ref_AppliedPTransform_FlatMap(<lambda at slow-beam.py:51>)_4))+(ref_AppliedPTransform_ParDo(OpenFn)_5)
2019-07-25 10:33:31,485 INFO Running ((ref_AppliedPTransform_Create/Read_3)+(ref_AppliedPTransform_FlatMap(<lambda at slow-beam.py:51>)_4))+(ref_AppliedPTransform_ParDo(OpenFn)_5)
{'((ref_AppliedPTransform_Create/Read_3)+(ref_AppliedPTransform_FlatMap(<lambda at slow-beam.py:51>)_4))+(ref_AppliedPTransform_ParDo(OpenFn)_5)':
ptransforms {
  key: "Create/Read"
  value {
    processed_elements {
      measured {
        output_element_counts {
          key: "out"
          value: 1
        }
        total_time_spent: 0.330032672
      }
    }
  }
}
<TRIMMED>
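
For reference, a minimal sketch of the pattern that produces a dump like the
one above (this is not the real slow-beam.py; OpenFn below is only a
placeholder DoFn named after the transform in the log, and _metrics_by_stage
is the internal attribute from your example):

    # Minimal sketch, not the real slow-beam.py: the Create / FlatMap / ParDo(OpenFn)
    # steps only mirror the stage names that appear in the log above, and the
    # work inside OpenFn is a placeholder.
    import pprint

    import apache_beam as beam

    class OpenFn(beam.DoFn):
        def process(self, element):
            # Placeholder for the real per-element work.
            yield element

    p = beam.Pipeline()  # no runner given, so this uses the direct runner

    _ = (p
         | beam.Create(['seed'])
         | beam.FlatMap(lambda x: [x])
         | beam.ParDo(OpenFn()))

    results = p.run()
    results.wait_until_finish()

    # Raw per-stage profile counters; _metrics_by_stage is internal,
    # so it may change between releases.
    pprint.pprint(results._metrics_by_stage)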

Thanks,
Yu

On Thu, Jul 25, 2019 at 12:26 AM Robert Bradshaw <rober...@google.com>
wrote:

> Also, take note that these counters will only be available if Beam has
> been compiled with Cython (e.g. installed from a wheel). Of course, if you
> care about performance you'd want that anyway.
>
> On Wed, Jul 24, 2019, 5:15 PM Robert Bradshaw <rober...@google.com> wrote:
>
>> Beam tracks the amount of time spent in each transform in profile
>> counters. There is ongoing work to expose these in a uniform way for all
>> runners (e.g. in Dataflow they're displayed on the UI), but for the direct
>> runner you can see an example at
>> https://github.com/apache/beam/blob/release-2.14.0/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L1046
>>  .
>> For a raw dump you could do something like:
>>
>>     p = beam.Pipeline(...)
>>     p | beam.Read...
>>     results = p.run()
>>     results.wait_until_finish()
>>     import pprint
>>     pprint.pprint(results._metrics_by_stage)
>>
>>
>>
>>
>> On Wed, Jul 24, 2019 at 4:07 PM Yu Watanabe <yu.w.ten...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I have a pipeline built on Apache Beam 2.13.0 using Python 3.7.3.
>>> The pipeline takes about 5 hours to ingest 2 sets of approximately 70,000
>>> JSON objects using the Direct Runner.
>>>
>>> I want to diagnose which transforms are taking the most time and improve
>>> the code for better performance. I looked at the module below for profiling,
>>> but it does not seem to report the speed of each transform.
>>>
>>>
>>> https://beam.apache.org/releases/pydoc/2.13.0/apache_beam.utils.profiler.html
>>>
>>> Is there any module that can be used to monitor the speed of each
>>> transform? If not, I would appreciate some help on how to monitor the
>>> speed of each transform.
>>>
>>> Best Regards,
>>> Yu Watanabe
>>>
>>> --
>>> Yu Watanabe
>>> Weekend freelancer who loves the challenge of building data platforms
>>> yu.w.ten...@gmail.com
>>> LinkedIn: <https://www.linkedin.com/in/yuwatanabe1>
>>> Twitter: <https://twitter.com/yuwtennis>
>>>
>>

-- 
Yu Watanabe
Weekend freelancer who loves the challenge of building data platforms
yu.w.ten...@gmail.com
LinkedIn: <https://www.linkedin.com/in/yuwatanabe1>
Twitter: <https://twitter.com/yuwtennis>
