Assistance configuring access to Google Cloud Storage for a row-format streaming file sink

2020-11-13 Thread orionemail
Hi, I am running Flink 1.10.1, initially on my local development machine (a MacBook Pro). I'm struggling to understand how to write to Google Cloud Storage using the StreamingFileSink (S3 works fine). The error I am seeing: "org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could n
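
For context, a minimal sketch of the sink side, assuming Flink 1.10's StreamingFileSink API. The gs:// scheme only resolves if a Hadoop-compatible GCS filesystem (for example Google's gcs-connector shaded jar plus a core-site.xml registering it) is visible to Flink, which is the usual cause of the exception above; the bucket path below is a placeholder.

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class GcsRowSink {
    // Row-format sink writing plain strings; the gs:// path is a placeholder.
    static StreamingFileSink<String> build() {
        return StreamingFileSink
                .forRowFormat(new Path("gs://my-bucket/output"),
                              new SimpleStringEncoder<String>("UTF-8"))
                .build();
    }
}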

Re: Using S3 as a streaming file source

2020-09-02 Thread orionemail
>> You will probably need to write the content of the files to a NoSQL >> database. >> >> Alternatively, send the S3 notifications to Kafka and read them into Flink from there. >> >> https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
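
A sketch of the notification route suggested above, assuming the S3 event notifications are forwarded into a Kafka topic; the broker address, topic, and group id are placeholders. Flink reads the event JSON here, and a later operator would fetch and unpack each referenced object.

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class S3EventJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");
        props.setProperty("group.id", "s3-notifications");
        env.addSource(new FlinkKafkaConsumer<>("s3-events", new SimpleStringSchema(), props))
           // each element is an S3 event record: parse out the object key,
           // then download and unpack the archive in a downstream operator
           .print();
        env.execute("s3-notification-reader");
    }
}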

Using S3 as a streaming file source

2020-09-01 Thread orionemail
Hi, I have an S3 bucket that is continuously written to by millions of devices. These upload small compressed archives. What I want to do is treat the tar-gzipped (.tgz) files as a streaming source and process each archive. The archive contains three files, each of which might need to be processed.
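
For completeness, the built-in file-monitoring alternative, sketched on the assumption that the bucket is readable via s3://. PROCESS_CONTINUOUSLY re-lists the whole prefix every interval, which scales poorly to millions of small objects (hence the notification route above), and unpacking .tgz archives would need a custom FileInputFormat rather than the TextInputFormat shown; path and interval are placeholders.

import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class S3FileMonitorJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        String prefix = "s3://my-bucket/incoming/";
        // rescan the prefix every 60 seconds, picking up newly written files
        env.readFile(new TextInputFormat(new Path(prefix)), prefix,
                     FileProcessingMode.PROCESS_CONTINUOUSLY, 60_000L)
           .print();
        env.execute("s3-prefix-monitor");
    }
}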

Re: Please help, I need to bootstrap keyed state into a stream

2020-08-10 Thread orionemail
I was recently in the same situation as Marco. The docs do explain what you need to do, but without experience with Flink it might still not be obvious. What I did initially: set up the job to run in a 'write a save state' mode by implementing a command line switch I could u
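
A minimal sketch of that 'write a savepoint first' step, using the State Processor API as it looked in the 1.10 era (DataSet-based). The operator uid, state name, and seed data are placeholders; the uid must match the one set on the stateful operator in the real job.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;

public class BootstrapJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // placeholder seed data standing in for whatever must be preloaded
        DataSet<Tuple2<String, Long>> seed =
                env.fromElements(Tuple2.of("key-a", 1L), Tuple2.of("key-b", 2L));

        BootstrapTransformation<Tuple2<String, Long>> transformation =
                OperatorTransformation.bootstrapWith(seed)
                        .keyBy(t -> t.f0)
                        .transform(new Seeder());

        Savepoint.create(new MemoryStateBackend(), 128)
                .withOperator("stateful-op-uid", transformation) // must match the job's uid
                .write("file:///tmp/bootstrap-savepoint");
        env.execute("bootstrap");
    }

    static class Seeder extends KeyedStateBootstrapFunction<String, Tuple2<String, Long>> {
        private transient ValueState<Long> state;
        @Override public void open(Configuration cfg) {
            state = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Long.class));
        }
        @Override public void processElement(Tuple2<String, Long> in, Context ctx) throws Exception {
            state.update(in.f1);  // write the seed value into keyed state
        }
    }
}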

Re: Is there a way to access the number of 'keys' that have been used for partitioning?

2020-07-16 Thread orionemail
w many keys there are at any point. > You'd have to calculate it yourself. > On 15/07/2020 17:11, orionemail wrote: >> Hi, >> I need to query the number of keys that a stream has been split by; is there a way to do this? >> Thanks, >> O
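
A sketch of the 'calculate it yourself' approach: flag each key on first sight in keyed state and sum the emitted 1s in a single downstream counter (a deliberate parallelism-1 funnel). The String key type stands in for whatever the stream is actually keyed by.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class KeyCounter {
    public static DataStream<Long> distinctKeyCount(DataStream<String> keys) {
        return keys
            .keyBy(k -> k)
            .process(new FirstSeen())
            .keyBy(one -> 0)          // funnel everything to one running total
            .reduce(Long::sum);
    }

    static class FirstSeen extends KeyedProcessFunction<String, String, Long> {
        private transient ValueState<Boolean> seen;
        @Override public void open(Configuration cfg) {
            seen = getRuntimeContext().getState(new ValueStateDescriptor<>("seen", Boolean.class));
        }
        @Override public void processElement(String key, Context ctx, Collector<Long> out) throws Exception {
            if (seen.value() == null) {   // first time this key appears
                seen.update(true);
                out.collect(1L);
            }
        }
    }
}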

Is there a way to access the number of 'keys' that have been used for partitioning?

2020-07-15 Thread orionemail
Hi, I need to query the number of keys that a stream has been split by. Is there a way to do this? Thanks, O

Internal state and external stores: conditional usage advice sought (DynamoDB asyncIO)

2020-06-08 Thread orionemail
Hi, Following on from an earlier email, my approach has changed but I am still unsure how best to achieve my goal. I have records coming through a Kinesis stream into Flink: { id: var1: ... } 'id' needs to be replaced with a value from a DB store, or if not present in the DB generate in
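
A sketch of the async lookup half, assuming AWS SDK v2 and a hypothetical 'id-map' table keyed by 'oldId'. The write-back for missing ids is only noted in a comment, since doing it naively from asyncInvoke is race-prone (see the conditional-update sketch further down).

import java.util.Collections;
import java.util.Map;
import java.util.UUID;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.GetItemRequest;

public class IdLookup extends RichAsyncFunction<String, String> {
    private transient DynamoDbAsyncClient ddb;

    @Override
    public void open(Configuration cfg) {
        ddb = DynamoDbAsyncClient.create();  // one client per task
    }

    @Override
    public void asyncInvoke(String oldId, ResultFuture<String> result) {
        GetItemRequest req = GetItemRequest.builder()
                .tableName("id-map")  // hypothetical table name
                .key(Collections.singletonMap("oldId",
                        AttributeValue.builder().s(oldId).build()))
                .build();
        ddb.getItem(req).whenComplete((resp, err) -> {
            if (err != null) {
                result.completeExceptionally(err);
                return;
            }
            Map<String, AttributeValue> item = resp.item();
            // mint a fresh id when the mapping is absent; a real job would
            // also have to write it back to the table
            String mapped = (item == null || item.isEmpty())
                    ? UUID.randomUUID().toString()
                    : item.get("newId").s();
            result.complete(Collections.singleton(mapped));
        });
    }
}
// wired in with e.g.:
// AsyncDataStream.unorderedWait(ids, new IdLookup(), 5, TimeUnit.SECONDS, 100);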

Re: Suggestions for using both broadcast sync and conditional async-io

2020-06-04 Thread orionemail
es DynamoDB and all lookups are local. > Cheers, > Gordon > [1] > https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html#broadcast-state-1 > On Wed, Jun 3, 2020 at 10:15 PM orionemail wrote: >> Hi, >> My
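
A sketch of the broadcast alternative Gordon describes: the mapping table is broadcast to every task so per-record lookups stay local, and [1] covers seeding that broadcast state from a savepoint. The stream shapes and state name here are placeholders.

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class LocalLookup {
    static final MapStateDescriptor<String, String> MAPPINGS =
            new MapStateDescriptor<>("id-map", String.class, String.class);

    public static DataStream<String> apply(DataStream<String> records,
                                           DataStream<Tuple2<String, String>> mappingUpdates) {
        BroadcastStream<Tuple2<String, String>> broadcast = mappingUpdates.broadcast(MAPPINGS);
        return records.connect(broadcast).process(
            new BroadcastProcessFunction<String, Tuple2<String, String>, String>() {
                @Override
                public void processElement(String id, ReadOnlyContext ctx,
                                           Collector<String> out) throws Exception {
                    // local read of the broadcast mapping, no remote call
                    String mapped = ctx.getBroadcastState(MAPPINGS).get(id);
                    out.collect(mapped != null ? mapped : id); // pass through when unmapped
                }
                @Override
                public void processBroadcastElement(Tuple2<String, String> kv, Context ctx,
                                                    Collector<String> out) throws Exception {
                    ctx.getBroadcastState(MAPPINGS).put(kv.f0, kv.f1);
                }
            });
    }
}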

Suggestions for using both broadcast sync and conditional async-io

2020-06-03 Thread orionemail
Hi, My current application makes use of a DynamoDB database to map a key to a value. As each record enters the system, the async-io operator calls this DB and requests a value for the key; if that value doesn't exist, a new value is generated and inserted. I have managed to do all this in one update
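
The 'one update' is presumably DynamoDB's if_not_exists update expression, which makes get-or-create atomic in a single UpdateItem call. A sketch with AWS SDK v2, where the table and attribute names are placeholders:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ReturnValue;
import software.amazon.awssdk.services.dynamodb.model.UpdateItemRequest;

public class GetOrCreateId {
    public static String map(DynamoDbClient ddb, String key) {
        Map<String, AttributeValue> values = new HashMap<>();
        values.put(":fresh", AttributeValue.builder().s(UUID.randomUUID().toString()).build());
        UpdateItemRequest req = UpdateItemRequest.builder()
                .tableName("id-map")  // hypothetical table
                .key(Collections.singletonMap("id", AttributeValue.builder().s(key).build()))
                // only writes :fresh when 'mapped' is not already set
                .updateExpression("SET mapped = if_not_exists(mapped, :fresh)")
                .expressionAttributeValues(values)
                .returnValues(ReturnValue.ALL_NEW)
                .build();
        // the existing mapping if present, otherwise the fresh UUID just written
        return ddb.updateItem(req).attributes().get("mapped").s();
    }
}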

Re: How to subscribe to a Kinesis stream using enhanced fan-out?

2020-05-15 Thread orionemail
Hi, We also recently needed this functionality; unfortunately we were unable to implement it ourselves, so we changed our plan accordingly. However, we very much see the benefit of this feature and would be interested in following the JIRA ticket. Thanks ‐‐‐ Original Message ‐‐‐ On Thursd

StreamingFileSink to an S3 bucket on a remote account using STS

2020-04-20 Thread orionemail
Hi, New to both AWS and Flink, but I currently have a need to write incoming data into an S3 bucket managed via AWS temporary credentials. I am unable to get this to work, and I am not entirely sure of the steps needed. I can write to S3 buckets that are not 'remote' and managed by STS temporary cred
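
A hedged configuration sketch for flink-conf.yaml, assuming the flink-s3-fs-hadoop plugin (Flink mirrors s3.* keys onto Hadoop's fs.s3a.* options) and a Hadoop 3.1+ build, whose AssumedRoleCredentialProvider performs the STS AssumeRole call; the role ARN is a placeholder for the remote account's role.

# flink-conf.yaml
s3.aws.credentials.provider: org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider
s3.assumed.role.arn: arn:aws:iam::123456789012:role/remote-bucket-writer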