Hello,

I am a developer trying to use Apache Beam, and I am running into an issue
where my WaitOn step is not working as expected. I want my pipeline to read
all the data from an S3 bucket using ParquetIO before moving on to the rest
of the steps in my pipeline. However, I see in my DAG that even though
there is a collect step after all the data is being read in, my pipeline
still reads from S3 in the subsequent steps. It appears that the Wait.on is
not actually happening. Is it even possible to wait on a read step? This is
what my code looks like:

PCollection<GenericRecord> records = pipeline.apply("Read parquet file
in as Generic Records",
ParquetIO.read(finalSchema).from(beamReadPath).withConfiguration(configuration));
PCollection<GenericRecord> recordsWaited = records
        .apply("Waiting on Read Parquet File",
Wait.on(records)).setCoder(AvroCoder.of(GenericRecord.class,
finalSchema));
{Processing of rest of data subsequently}



Any help would be greatly appreciated, thanks!

Sincerely,
Ramya

______________________________________________________________________



The information contained in this e-mail may be confidential and/or proprietary 
to Capital One and/or its affiliates and may only be used solely in performance 
of work or services for Capital One. The information transmitted herewith is 
intended only for use by the individual or entity to which it is addressed. If 
the reader of this message is not the intended recipient, you are hereby 
notified that any review, retransmission, dissemination, distribution, copying 
or other use of, or taking of any action in reliance upon this information is 
strictly prohibited. If you have received this communication in error, please 
contact the sender and delete the material from your computer.



Reply via email to