Hi,

I have an S3 bucket that is continuously written to by millions of devices, 
which upload small compressed archives.

What I want to do is treat the gzipped tar (.tgz) files as a streaming source 
and process each archive. Each archive contains three files, any of which might 
need to be processed.
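
To make the per-archive step concrete, here is a minimal sketch of the unpacking 
I have in mind once I have the bytes of one object. It is not a Flink input 
format. `TgzEntries` and `read` are names I made up for illustration, it uses 
only the JDK (Java 11+), and it assumes small archives with plain ustar headers 
and entry names under 100 bytes (no pax or GNU long-name extensions).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper (not part of Flink): unpacks a small .tgz into memory. */
final class TgzEntries {

    private static final int BLOCK = 512; // tar headers and padding use 512-byte blocks

    /** Returns entry name -> content for every regular file in the archive. */
    static Map<String, byte[]> read(InputStream tgz) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (InputStream in = new GZIPInputStream(tgz)) {
            byte[] header = new byte[BLOCK];
            while (in.readNBytes(header, 0, BLOCK) == BLOCK) {
                if (header[0] == 0) {
                    break; // a header with an empty name marks the end of the archive
                }
                String name = field(header, 0, 100);
                // The size is stored as an octal string at offset 124.
                String sizeField = field(header, 124, 12).trim();
                int size = sizeField.isEmpty() ? 0 : Integer.parseInt(sizeField, 8);
                byte type = header[156]; // '0' or NUL means a regular file

                byte[] data = in.readNBytes(size);
                if (data.length != size) {
                    throw new IOException("Truncated entry: " + name);
                }
                // Entry data is padded with zeros up to the next block boundary.
                int padding = (BLOCK - size % BLOCK) % BLOCK;
                in.readNBytes(padding);

                if (type == '0' || type == 0) {
                    entries.put(name, data);
                }
            }
        }
        return entries;
    }

    /** Reads a NUL-terminated ASCII field from the header. */
    private static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.US_ASCII);
    }
}
```

Calling `TgzEntries.read(stream)` would give me the three files by name, and I 
would then decide per entry whether it needs processing.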

I see that

    env.readFile(f, bucket, FileProcessingMode.PROCESS_CONTINUOUSLY, 10000L).print();

might do what I need, but I am unsure how best to implement 'f', the 
FileInputFormat. Is there a similar example I can reference?

Or is this idea not workable with this method? I need exactly-once processing, 
and I also need to trigger removal of the files after they have been processed.

Thanks,

