Hi all,

I wrote a Beam pipeline with the Python SDK that resamples a
timeseries containing one data point per minute into a 5-minute timeseries.

My pipeline looks like:

input_data
    | WindowInto(FixedWindows(size=timedelta(minutes=5).total_seconds()))
    | CombineGlobally(resample_function)
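
For reference, here is a minimal self-contained version of the same
windowing + combine pattern. The in-memory source and the max() stand-in
for my real input and resample_function are just placeholders:

import apache_beam as beam
from apache_beam.transforms.window import FixedWindows, TimestampedValue
from datetime import timedelta

def resample_function(values):
    # Stand-in for my real resampling logic: collapse the 1-minute values
    # falling into one 5-minute window into a single value.
    return max(values)

with beam.Pipeline() as p:
    (p
     # Toy in-memory source: (unix_timestamp, value), one point per minute.
     | 'Create' >> beam.Create([(60 * i, float(i)) for i in range(20)])
     # Attach the event timestamp so FixedWindows uses it.
     | 'AddTimestamp' >> beam.Map(lambda tv: TimestampedValue(tv[1], tv[0]))
     | 'Window5Min' >> beam.WindowInto(
         FixedWindows(timedelta(minutes=5).total_seconds()))
     # without_defaults() because CombineGlobally runs inside non-global windows.
     | 'Resample' >> beam.CombineGlobally(resample_function).without_defaults()
     | 'Print' >> beam.Map(print))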

When I run it with the local runner or the Dataflow runner on a small
dataset, it works and does what I want.

But when I try to run it on the Dataflow runner with a bigger dataset
(1,700,000 data points timestamped over 15 years), it stays stuck for hours
on the GroupByKey step of CombineGlobally.

My question is: did I do something wrong with the design of my pipeline?

PS: Can someone invite me to the Slack channel?
-- 

Tristan Marechaux

Data Scientist | *Walnut Algorithms*

Mobile: +33 627804399

Email: [email protected]

Web: www.walnutalgorithms.com
