Hi all, I wrote a Beam pipeline with the Python SDK that resamples a time series containing one data point every minute into a 5-minute time series.
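To make the intended resampling concrete, here is a minimal pure-Python sketch of the same grouping, without Beam. It assumes timestamps are Unix seconds and that fixed windows start at multiples of the window size. The names `window_start` and `resample` are hypothetical, and the mean is only a stand-in for the actual resample function, which is not shown in this message.

```python
from collections import defaultdict
from statistics import mean

WINDOW_SECONDS = 5 * 60  # 5-minute windows, matching the target resolution


def window_start(timestamp, size=WINDOW_SECONDS):
    """Return the start of the fixed window containing `timestamp` (Unix seconds)."""
    return timestamp - timestamp % size


def resample(points, combine=mean, size=WINDOW_SECONDS):
    """Group (timestamp, value) pairs by fixed window and combine each group.

    `combine` is an assumption standing in for the real resample function.
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        buckets[window_start(timestamp, size)].append(value)
    # One output point per window, ordered by window start.
    return sorted((start, combine(values)) for start, values in buckets.items())


# Ten one-minute points starting at t=0 with values 0.0 .. 9.0.
minute_points = [(60 * i, float(i)) for i in range(10)]
resampled = resample(minute_points)
print(resampled)
```

Each 5-minute window is independent of the others here, so the result is one combined value per window rather than a single global value.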
My pipeline looks like this:

    input_data | WindowInto(FixedWindows(size=timedelta(minutes=5).total_seconds())) | CombineGlobally(resample_function)

When I run it with the local runner or the Dataflow runner on a small dataset, it works and does what I want. But when I run it on the Dataflow runner with a bigger dataset (1,700,000 data points timestamped over 15 years), it stays stuck for hours on the GroupByKey step of CombineGlobally.

My question is: did I do something wrong in the design of my pipeline?

PS: Can someone invite me to the Slack channel?

--
Tristan Marechaux
Data Scientist | *Walnut Algorithms*
Mobile: +33 627804399
Email: [email protected]
Web: www.walnutalgorithms.com
