Hey,
When trying to read a file from S3 with a combine action, the pipeline
seems to be stuck. When replacing it with a GCP source it works fine.
Furthermore - if I comment out the Count.PerElement part it also works.
Anyone has an idea why that is?
lines = p | beam.io.ReadFromText('s3://...')
transformed = (
lines
| 'SplitLine' >>
(beam.ParDo(WordExtractingDoFn()).with_output_types(unicode))
| 'Count' >> beam.combiners.Count.PerElement()
| 'Format' >> beam.MapTuple(lambda w, c: f'{w}: {c}')
)
transformed | 'Write' >> beam.io.WriteToText('s3://...')
Thanks!
Nir