A word of caution: streaming from S3 is cost prohibitive at this scale, because
the only way to detect new files is to continuously poll the S3 List API.
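
To put a rough number on it: List returns at most 1,000 keys per call and costs
about $0.005 per 1,000 requests (us-east-1 pricing), so scanning a bucket that
has accumulated 10 million objects takes ~10,000 calls per poll. At a 10-second
poll interval that is ~86 million calls, on the order of $430, per day, before
a single byte of data is read.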

On Tue, Sep 1, 2020 at 4:50 PM Jörn Franke <jornfra...@gmail.com> wrote:

> Why don’t you set up S3 event notifications to SQS and drive the actions from
> there?
>
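
For the SQS route, a plain SourceFunction that long-polls the queue is enough to
get started. A minimal, untested sketch (the class name and queue URL are
placeholders, and deleting each message right after emitting it gives
at-least-once rather than exactly-once semantics):

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Polls an SQS queue for S3 event notifications and emits the raw JSON bodies.
public class SqsNotificationSource implements SourceFunction<String> {

    // Placeholder; point this at the queue wired to the bucket's notifications.
    private static final String QUEUE_URL =
            "https://sqs.eu-west-1.amazonaws.com/123456789012/s3-events";

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        while (running) {
            // Long-poll for up to 10 notifications at a time.
            for (Message m : sqs.receiveMessage(
                    new ReceiveMessageRequest(QUEUE_URL)
                            .withWaitTimeSeconds(20)
                            .withMaxNumberOfMessages(10)).getMessages()) {
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(m.getBody());
                }
                // Deleting here means a crash between collect and the next
                // checkpoint can replay the message: at-least-once.
                sqs.deleteMessage(QUEUE_URL, m.getReceiptHandle());
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}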
> You will probably need to write the content of the files to a NoSQL
> database.
>
> Alternatively, send the S3 notification to Kafka and read it from Flink.
>
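
The Flink side of the Kafka variant is just the stock connector. A short
sketch; the topic name, group id and bootstrap servers below are made up:

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Properties props = new Properties();
props.setProperty("bootstrap.servers", "kafka:9092");      // placeholder
props.setProperty("group.id", "s3-notification-reader");   // placeholder

// Each record is the raw S3 event notification JSON forwarded to Kafka.
DataStream<String> events = env.addSource(
        new FlinkKafkaConsumer<>("s3-events", new SimpleStringSchema(), props));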
>
> https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
>
>
> On 01.09.2020, at 16:46, orionemail <orionem...@protonmail.com> wrote:
>
>
> Hi,
>
> I have an S3 bucket that is continuously written to by millions of
> devices, each uploading small compressed archives.
>
> What I want to do is treat the tar-gzipped (.tgz) files as a streaming
> source and process each archive. Each archive contains three files, each
> of which may need to be processed.
>
> I see that
>
> env.readFile(f, bucket, FileProcessingMode.PROCESS_CONTINUOUSLY, 10000L).print();
>
> might do what I need, but I am unsure how best to implement 'f', the
> FileInputFormat. Is there a similar example I can reference?
>
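
A custom FileInputFormat is the usual way to plug in 'f'. A rough, untested
sketch using Apache Commons Compress: it marks the format unsplittable so a
whole .tgz always goes to one reader, and it emits one "<name>:<content>"
String per file in each archive (the record type is just for illustration):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
import org.apache.commons.compress.utils.IOUtils;
import org.apache.flink.api.common.io.FileInputFormat;
import org.apache.flink.core.fs.FileInputSplit;

public class TgzInputFormat extends FileInputFormat<String> {

    private transient TarArchiveInputStream tar;
    private transient boolean end;

    public TgzInputFormat() {
        // A gzipped tar cannot be read from the middle, so never split it.
        unsplittable = true;
    }

    @Override
    public void open(FileInputSplit split) throws IOException {
        super.open(split); // opens 'stream' on the underlying file
        tar = new TarArchiveInputStream(new GzipCompressorInputStream(stream));
        end = false;
    }

    @Override
    public boolean reachedEnd() {
        return end;
    }

    @Override
    public String nextRecord(String reuse) throws IOException {
        TarArchiveEntry entry;
        // Skip directory entries; null means the archive is exhausted.
        while ((entry = tar.getNextTarEntry()) != null && !entry.isFile()) {
        }
        if (entry == null) {
            end = true;
            return null;
        }
        byte[] content = new byte[(int) entry.getSize()];
        IOUtils.readFully(tar, content);
        return entry.getName() + ":" + new String(content, StandardCharsets.UTF_8);
    }
}

You would then pass it in as 'f', e.g. env.readFile(new TgzInputFormat(),
"s3://bucket/prefix", FileProcessingMode.PROCESS_CONTINUOUSLY, 10000L). One
caveat: as far as I know readFile never deletes the files it has processed, so
removal after processing has to be handled separately.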
> Or is this idea not workable with this method? I need exactly-once
> guarantees, and I also need to trigger removal of the files after
> processing.
>
> Thanks,
>
