Watermark not firing to push data

2018-12-14 Thread Vijay Balakrishnan
Hi, Observations on watermarks, after reading this great article: https://data-artisans.com/blog/watermarks-in-apache-flink-made-easy * For any event timestamp, a watermark says when to stop waiting for the arrival of earlier events. * Watermark t means all events with timestamp < t have already arrived. * When t
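
To make the "watermark t" rule above concrete, here is a minimal standalone sketch in plain Python. It is not Flink API code: the function names and the bounded out-of-orderness strategy (watermark = highest timestamp seen minus a fixed allowance) are assumptions chosen for illustration.

```python
def watermark_after(timestamps, max_out_of_orderness):
    """Hypothetical helper: watermark = highest timestamp seen so far
    minus a fixed out-of-orderness allowance."""
    return max(timestamps) - max_out_of_orderness


def is_late(event_ts, watermark):
    # Watermark t asserts that all events with timestamp < t have arrived,
    # so an event that shows up with timestamp < t counts as late.
    return event_ts < watermark


seen = [100, 105, 103, 110]
wm = watermark_after(seen, max_out_of_orderness=5)
late = is_late(104, wm)      # timestamp below the watermark
on_time = is_late(107, wm)   # timestamp at or above the watermark
```

With these inputs the watermark trails the highest timestamp (110) by the allowance of 5, so an event stamped 104 is treated as late while one stamped 107 is not.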

Re: Encountered the following Expired Iterator exception in getRecords using FlinkKinesisConsumer

2018-12-14 Thread Vijay Balakrishnan
I have 2 shards in the Kinesis stream and need to figure out how to check from the logs whether records are being written to both shards. Not sure if this is what you are looking for in terms of the number of shards read; it seems like 1 from the logs below: DEBUG org.apache.http.wire

Re: StreamingFileSink causing AmazonS3Exception

2018-12-14 Thread Kostas Kloudas
Hi Steffen, Thanks for reporting this. Internally, Flink does not keep any open connections to S3. It only buffers data internally until it reaches a minimum size limit (by default 5MB) and then uploads it as a part of an MPU (multipart upload) in one go. Given this, I will have to dig a bit deeper

TimerService Troubleshooting/Metrics

2018-12-14 Thread sayat
Dear Flink Community, Is there a way to troubleshoot the timer service? The docs say that the service might degrade a job's performance significantly. Is there a way to expose and see timer service metrics, e.g. the length of the priority queue, how many times the service fires, etc.?
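
As an illustration of the two metrics the question names (priority queue length and number of firings), here is a minimal standalone sketch in plain Python. It does not use or describe Flink's timer service API; `InstrumentedTimerQueue` and its methods are hypothetical names.

```python
import heapq


class InstrumentedTimerQueue:
    """Hypothetical timer queue that tracks its own length and fire count."""

    def __init__(self):
        self._heap = []        # min-heap of (timestamp, key) pairs
        self.fired_count = 0   # total timers fired so far

    def register(self, timestamp, key):
        heapq.heappush(self._heap, (timestamp, key))

    def queue_length(self):
        return len(self._heap)

    def advance_to(self, time):
        """Fire, in timestamp order, every timer due at or before `time`."""
        fired = []
        while self._heap and self._heap[0][0] <= time:
            fired.append(heapq.heappop(self._heap))
            self.fired_count += 1
        return fired


timers = InstrumentedTimerQueue()
for ts, key in [(30, "a"), (10, "b"), (20, "c")]:
    timers.register(ts, key)
fired = timers.advance_to(20)
```

After advancing to time 20, two timers have fired and one remains queued; `queue_length()` and `fired_count` are the values one would report as gauges or counters.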

Re: SANSA 0.5 (Scalable Semantic Analytics Stack) Released

2018-12-14 Thread Timo Walther
Hi, this looks like a very useful extension to Flink. Thanks for letting us know! You can also use the commun...@flink.apache.org mailing list to spread the news, because the user@ list is more for user support questions and help. Regards, Timo On 14.12.18 at 09:23, GezimSejdiu wrote: Dear all,

Re: Generating processing time watermarks in idle event time kafka streams?

2018-12-14 Thread William Saar
Thanks, works great! This should be very useful for real-time dashboards that want to compute in event time, especially for multi-tenant systems or other specialized Kafka topics that can have gaps in the traffic. - Original Message - From: "Aljoscha Krettek" To:"William Saar" Cc: Sent:Fr

Re: Connection leak with flink elastic Sink

2018-12-14 Thread Vijay Bhaskar
Sure, let me try it out with more debug logs and get back to you. Regards, Bhaskar On Fri, Dec 14, 2018 at 4:41 PM Tzu-Li (Gordon) Tai wrote: > Hi, > > (Removed dev@ from the mail thread) > > I took a look at the logs you provided, and it seems like the sink > operators should have been properly tea

Re: Connection leak with flink elastic Sink

2018-12-14 Thread Tzu-Li (Gordon) Tai
Hi, (Removed dev@ from the mail thread) I took a look at the logs you provided, and it seems like the sink operators should have been properly torn down, thereby closing the RestHighLevelClient used internally. At this point I’m not really sure what else could have caused this besides a

Re: problem submitting job, it hangs there

2018-12-14 Thread Tzu-Li Chen
Hi Chang, I think there is a JIRA issue [1] aimed at hardening this case. In fact, Flink creates this directory on startup, and without other warnings we can assume that it has been created. So it might have been deleted by some cleanup process (by Flink or by the fs). Best, tison. [1] https://issues.apache.org

Re: Generating processing time watermarks in idle event time kafka streams?

2018-12-14 Thread Aljoscha Krettek
Hi William, there is currently no official way of doing this but the Flink community will be working on this as part of an upcoming source refactoring. For now, there is this watermark extractor that I once did: https://github.com/aljoscha/flink/commit/6e4419e550caa0e5b162bc0d2ccc43f6b0b3860f
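
The general idea discussed in this thread can be sketched in plain Python as follows. This is not the code from the linked commit and not a Flink API: the class name, the `idle_timeout_ms` parameter, and the rule "once the source has been idle longer than the timeout, advance the watermark by the extra processing time elapsed" are all assumptions made for illustration.

```python
class IdleAwareWatermarkGenerator:
    """Hypothetical generator: event-time watermarks while data flows,
    processing-time-driven advancement once the source goes idle."""

    def __init__(self, idle_timeout_ms):
        self.idle_timeout_ms = idle_timeout_ms
        self.max_event_ts = None          # highest event timestamp seen
        self.last_event_proc_time = None  # wall-clock time of the last event
        self._emitted = None              # last watermark handed out

    def on_event(self, event_ts, processing_time):
        if self.max_event_ts is None or event_ts > self.max_event_ts:
            self.max_event_ts = event_ts
        self.last_event_proc_time = processing_time

    def current_watermark(self, processing_time):
        if self.max_event_ts is None:
            return None  # nothing seen yet, no watermark to emit
        idle_for = processing_time - self.last_event_proc_time
        candidate = self.max_event_ts
        if idle_for > self.idle_timeout_ms:
            # Source is idle: push event time forward by the idle excess.
            candidate += idle_for - self.idle_timeout_ms
        # Watermarks must never move backwards.
        if self._emitted is None or candidate > self._emitted:
            self._emitted = candidate
        return self._emitted


gen = IdleAwareWatermarkGenerator(idle_timeout_ms=1000)
gen.on_event(event_ts=5000, processing_time=0)
active = gen.current_watermark(processing_time=500)
idle = gen.current_watermark(processing_time=3000)
gen.on_event(event_ts=5500, processing_time=3100)
resumed = gen.current_watermark(processing_time=3100)
```

One consequence of this design is that an event arriving after an idle period may already be behind the advanced watermark (as the event stamped 5500 is here), so downstream operators would treat it as late. Whether that trade-off is acceptable depends on the job.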

Re: problem submitting job, it hangs there

2018-12-14 Thread Chang Liu
My question is: whatever the Flink user is doing, as long as he/she performs all actions through the Flink-provided means (Flink CLI or Flink APIs in code), he/she should not be able to touch this directory, right? Because this directory is for the JobManager and managed by Flink. Best regards/祝好,

Re: problem submitting job, it hangs there

2018-12-14 Thread Chang Liu
Hi Chesnay, What do you mean by "...we can make a small adjustment to the code…"? Do you mean that I, as a Flink application developer, can do this in my code, or does it have to be a code change in Flink itself? And more importantly, I would like to pinpoint the root cause of this because I canno

Generating processing time watermarks in idle event time kafka streams?

2018-12-14 Thread William Saar
Are there any standardized components for generating watermarks based on processing time in an event time stream when there is no data from a source? The docs for event time [1] indicate that people are doing this, but the only suggestion on Stack Overflow [2] is to make every window operator in the stream have a

Re: Encountered the following Expired Iterator exception in getRecords using FlinkKinesisConsumer

2018-12-14 Thread Tzu-Li (Gordon) Tai
Hi, I suspect that this is the issue: https://issues.apache.org/jira/browse/FLINK-11164. One more thing to clarify to be sure of this: Do you have multiple shards in the Kinesis stream, and if yes, are some of them actually empty? Meaning that, even though you mentioned some records were w

SANSA 0.5 (Scalable Semantic Analytics Stack) Released

2018-12-14 Thread GezimSejdiu
Dear all, The Smart Data Analytics group (http://sda.tech) is happy to announce SANSA 0.5 - the fifth release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Apache Flink in order to allow scalable machine learning, inference and querying capabili