Re: Stream job failed after increasing the number of retained checkpoints

2018-01-09 Thread Piotr Nowojski
Hi, increasing Akka's timeouts is rarely a solution to any problem - it either does not help or just masks the issue, making it less visible. But yes, it is possible to bump the limits: https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#distributed-coordination-via-akka
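For reference, the Akka timeouts live in flink-conf.yaml; a minimal sketch of the relevant keys (the values below are only illustrative, not recommendations):

    akka.ask.timeout: 100 s
    akka.lookup.timeout: 100 s
    akka.client.timeout: 120 s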

is it possible to convert "retract" datastream to table

2018-01-09 Thread Yan Zhou [FDS Science]
Hi, There are APIs to convert a dynamic table to a retract stream. Is there any way to construct a "retract" data stream and convert it into a table? I want to read the change log of a relational database from Kafka, "apply" the changes within Flink (by creating a CRow DataStream), register/create a t
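For context, the direction that does exist in the Scala Table API looks roughly like the sketch below (assuming Flink 1.4 with the flink-table dependency; table and field names are made up). The reverse direction asked about here is not covered by fromDataStream, which interprets every record as an insert.

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.table.api.TableEnvironment
    import org.apache.flink.table.api.scala._
    import org.apache.flink.types.Row

    object RetractSketch {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        val tEnv = TableEnvironment.getTableEnvironment(env)

        // an insert-only stream registered as a table
        val input: DataStream[(String, Int)] =
          env.fromElements(("a", 1), ("a", 2), ("b", 3))
        val table = tEnv.fromDataStream(input, 'id, 'v)

        // the aggregation produces updates, so the result can only be emitted as a
        // retract stream: each element is (true = add / false = retract, row)
        val counts = table.groupBy('id).select('id, 'v.count as 'cnt)
        val retracts: DataStream[(Boolean, Row)] = tEnv.toRetractStream[Row](counts)

        retracts.print()
        env.execute("table-to-retract-stream")
      }
    }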

Re: Gelly: akka.ask.timeout

2018-01-09 Thread Greg Hogan
Hi Alieh, Are you able to run the example WordCount application without losing TaskManagers? Greg > On Jan 8, 2018, at 7:48 AM, Alieh Saeedi wrote: > > Hey all, > I have an iterative algorithm implemented in Gelly. Since I upgraded everything to flink-1.3.1 from 1.1.2, the runtime ha

hadoop-free hdfs config

2018-01-09 Thread Oleksandr Baliev
Hello guys, I want to clarify something for myself: since Flink 1.4.0 allows using a hadoop-free distribution and dynamic loading of Hadoop dependencies, I suppose that if I download the hadoop-free distribution, start a cluster without any Hadoop, and then load a job jar which has some Hadoop dependencies (I used
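As a side note on the hadoop-free setup (an assumption to verify against the 1.4 deployment docs): when a Hadoop installation is already present on the machines, its classes can be put on Flink's classpath via the environment instead of being bundled into the job jar, e.g.

    export HADOOP_CLASSPATH=`hadoop classpath`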

Custom Partitioning for Keyed Streams

2018-01-09 Thread Martin, Nick
I have a set of stateful operators that rely on keyed state. There is substantial skew between keys (i.e. there will be 100 messages on keys A and B, and 10 messages each on keys C-J), and key assignment is dictated by the needs of my application, such that I can't choose keys in a way th
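There is no hook for a custom partitioner that still gives access to keyed state, so a common workaround is to spread hot keys over several sub-keys ("salting") and merge the partial results afterwards. A rough Scala sketch under those assumptions (all names, the salt count and the window size are made up):

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time
    import scala.util.Random

    object SaltedKeysSketch {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        // toy input; with a real unbounded source the processing-time windows below will fire
        val stream: DataStream[(String, Long)] =
          env.fromElements(("A", 1L), ("A", 1L), ("B", 1L), ("C", 1L))

        val salts = 8  // sub-keys per hot key; tune to the observed skew

        // phase 1: pre-aggregate per (key, salt) so one hot key is spread over several subtasks
        val partial = stream
          .map(t => (t._1, Random.nextInt(salts), t._2))
          .keyBy(t => (t._1, t._2))
          .timeWindow(Time.seconds(10))
          .reduce((a, b) => (a._1, a._2, a._3 + b._3))

        // phase 2: merge the per-salt partials of each window back per original key
        val result = partial
          .map(t => (t._1, t._3))
          .keyBy(_._1)
          .timeWindow(Time.seconds(10))
          .reduce((a, b) => (a._1, a._2 + b._2))

        result.print()
        env.execute("salted-key-sketch")
      }
    }

With the finite toy input above the job may finish before any processing-time window fires; with a real source the two phases emit per-window partials and merged totals.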

Re: Testing Flink 1.4 unable to write to s3 locally

2018-01-09 Thread Kyle Hamlin
Which dependencies do I need to put in the lib directory? I have flink-shaded-hadoop2.jar and parquet-hadoop.jar marked as "provided", both are in the lib/ directory, and I still get the error. I've also set classloader.resolve-order: parent-first in my flink-conf.yaml. val flinkDependencies
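For what it's worth, the usual split is: Flink and Hadoop artifacts marked "provided" so they stay out of the fat jar, and the corresponding jars placed once in Flink's lib/ directory. A hypothetical sbt fragment (artifact names and versions are assumptions to check against the actual setup):

    // build.sbt (sketch)
    val flinkVersion = "1.4.0"

    libraryDependencies ++= Seq(
      "org.apache.flink" %% "flink-scala"           % flinkVersion % "provided",
      "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
      // the shaded Hadoop jar belongs in Flink's lib/, not in the job jar
      "org.apache.flink" %  "flink-shaded-hadoop2"  % flinkVersion % "provided"
    )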

Re: Testing Flink 1.4 unable to write to s3 locally

2018-01-09 Thread Stephan Ewen
Hi! For the exception in the "local file path" case, we already found the reason: you had Hadoop code in your user code jar. Either not putting the Hadoop dependency into your jar (but rather into Flink's lib directory) or setting classloading to "parent-first" should do the trick there. For t

Re: Flink 1.4.0 Mapr libs issues

2018-01-09 Thread ani.desh1512
Hi Fabian, Thanks a lot for the reply. Setting the classloader.resolve-order configuration seems to have done the trick. For anybody else having the same problem, this is the config that I set: classloader.resolve-order: parent-first classloader.parent-first-patterns: java.;org.apache.f

What's the meaning of "Registered `TaskManager` at akka://flink/deadLetters"?

2018-01-09 Thread Reza Samee
I'm running a flink cluster (a mini one with just one node), but the problem is that my TaskManager can't reach my JobManager! Here are logs from the TaskManager ... Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 20, timeout: 30 seconds) Trying to register at
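A registration that ends up at akka://flink/deadLetters usually means the reply could not be routed back, so a first check is that both sides agree on the JobManager address and that it resolves from the TaskManager host. The keys involved (values here are placeholders):

    # flink-conf.yaml (placeholder values)
    jobmanager.rpc.address: jobmanager-host   # must resolve and be reachable from every TaskManager
    jobmanager.rpc.port: 6123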

Re: event ordering

2018-01-09 Thread Fabian Hueske
Hi Christophe, Flink exposes event time and watermarks not only in windows. The easiest solution would be to use a ProcessFunction [1], which can access the timestamp of a record. I would apply a ProcessFunction on a keyed stream (keyBy(id) [2]) and store the max timestamp per key in a ValueState [
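A minimal Scala sketch of that suggestion (assuming Flink 1.4, an Event case class shaped like the records in the original question, and that out-of-order events should simply be dropped; sending them to a side output would work just as well):

    import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
    import org.apache.flink.configuration.Configuration
    import org.apache.flink.streaming.api.functions.ProcessFunction
    import org.apache.flink.util.Collector

    case class Event(id: String, value: Int, timestamp: Long)

    // Emits an event only if its timestamp is not older than the newest one already
    // seen for the same key; older events are silently dropped in this sketch.
    class FilterOutOfOrder extends ProcessFunction[Event, Event] {

      private var maxTs: ValueState[java.lang.Long] = _

      override def open(parameters: Configuration): Unit = {
        maxTs = getRuntimeContext.getState(
          new ValueStateDescriptor[java.lang.Long]("maxTs", classOf[java.lang.Long]))
      }

      override def processElement(event: Event,
                                  ctx: ProcessFunction[Event, Event]#Context,
                                  out: Collector[Event]): Unit = {
        val seen = maxTs.value()
        if (seen == null || event.timestamp >= seen) {
          maxTs.update(event.timestamp)
          out.collect(event)
        }
        // else: the event is older than one already emitted for this key;
        // it could also go to a side output instead of being dropped
      }
    }

    // usage: stream.keyBy(_.id).process(new FilterOutOfOrder)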

Re: Flink 1.4.0 Mapr libs issues

2018-01-09 Thread Fabian Hueske
Hi Aniket, you could try to restore the previous behavior by configuring classloader.resolve-order: parent-first in the Flink configuration. Best, Fabian 2018-01-08 23:09 GMT+01:00 ani.desh1512 : > *Background:* We have a setup of Flink 1.3.2 along with a secure MAPR > (v5.2.1) cluster (Flink

Re: akka.remote.ShutDownAssociation: Shut down address: akka.tcp://flink@fps-flink-jobmanager:45652

2018-01-09 Thread Fabian Hueske
Hi, Till (in CC) might be able to help with Akka related questions. Best, Fabian 2018-01-08 6:46 GMT+01:00 Hao Sun : > I am running Flink 1.3.2 in my local docker environment. > > I see this error, not sure how to find the root cause. > I am confused by this error message, why JM is trying to c

event ordering

2018-01-09 Thread Christophe Jolif
Hi everyone, Let's imagine I have a stream of events coming in a bit like this: { id: "1", value: 1, timestamp: 1 } { id: "2", value: 2, timestamp: 1 } { id: "1", value: 4, timestamp: 3 } { id: "1", value: 5, timestamp: 2 } { id: "2", value: 5, timestamp: 3 } ... As you can see, with the non monoto
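If the goal is to let Flink's event-time machinery reorder things downstream (windows, CEP, ...), the timestamps from the question can also be turned into watermarks with a bounded-out-of-orderness extractor; a sketch assuming Flink 1.4 and an arbitrary bound of 10 seconds:

    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
    import org.apache.flink.streaming.api.windowing.time.Time

    case class Event(id: String, value: Int, timestamp: Long)

    // extracts the event timestamp and emits watermarks that trail it by 10 seconds
    class EventTsExtractor
        extends BoundedOutOfOrdernessTimestampExtractor[Event](Time.seconds(10)) {
      override def extractTimestamp(e: Event): Long = e.timestamp
    }

    // usage: stream.assignTimestampsAndWatermarks(new EventTsExtractor)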

Re: Dynamically get schema from element to pass to AvroParquetWriter

2018-01-09 Thread Fabian Hueske
Hi Kyle, I'm not sure I understand the problem. I assume you have one sink for each Avro type (Kafka topic). If you have multiple sinks, why is it not possible to configure each one with the correct Avro schema? Best, Fabian 2018-01-05 22:11 GMT+01:00 Kyle Hamlin : > I implemented an Avro to Pa

Re: Queryable State in Flink 1.4

2018-01-09 Thread Fabian Hueske
Thanks Boris. I've filed FLINK-8391 [1] to extend the documentation. Best, Fabian [1] https://issues.apache.org/jira/browse/FLINK-8391 2018-01-05 19:52 GMT+01:00 Boris Lublinsky : > Thanks > This was it. > It would help to have this in documentation along with > `flink-queryable-state-client` >
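For readers hitting the same wall: in 1.4 the client needs the queryable-state client artifact on its classpath, and the cluster only answers queries once the queryable-state runtime jar has been moved from opt/ to lib/ before startup. The artifact names below are what 1.4.0 appears to ship with and should be double-checked against the distribution:

    // sbt, client side (sketch)
    libraryDependencies +=
      "org.apache.flink" % "flink-queryable-state-client-java_2.11" % "1.4.0"

    // server side: copy opt/flink-queryable-state-runtime_2.11-1.4.0.jar into lib/
    // before starting the JobManager and TaskManagers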

Two issues when deploying Flink on DC/OS

2018-01-09 Thread Dongwon Kim
Hi, I've launched JobManager and TaskManager on DC/OS successfully. Now I have two new issues: 1) All TaskManagers are scheduled on a single node. - Is it intended to maximize data locality and minimize network communication cost? - Is there an option in Flink to adjust the behavior of JobManag

Re: Queryable State - Count within Time Window

2018-01-09 Thread Fabian Hueske
Hi, you can implement that with a ProcessFunction [1]. The ProcessFunction would have to compute counts, at some granularity (either event time or processing time), over the records it processes in processElement(). I would do this with a MapState that has a timestamp as key
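A rough Scala sketch of that idea (assuming Flink 1.4, records keyed by a String id carrying an epoch-millis timestamp, one-minute buckets, and an arbitrarily chosen queryable-state name "bucket-counts"):

    import org.apache.flink.api.common.state.{MapState, MapStateDescriptor}
    import org.apache.flink.configuration.Configuration
    import org.apache.flink.streaming.api.functions.ProcessFunction
    import org.apache.flink.util.Collector

    // Counts records per key in one-minute buckets and exposes the buckets as queryable state.
    class BucketCounts extends ProcessFunction[(String, Long), (String, Long)] {

      private var counts: MapState[java.lang.Long, java.lang.Long] = _

      override def open(parameters: Configuration): Unit = {
        val desc = new MapStateDescriptor[java.lang.Long, java.lang.Long](
          "bucketCounts", classOf[java.lang.Long], classOf[java.lang.Long])
        desc.setQueryable("bucket-counts")          // name used by the QueryableStateClient
        counts = getRuntimeContext.getMapState(desc)
      }

      override def processElement(value: (String, Long),
                                  ctx: ProcessFunction[(String, Long), (String, Long)]#Context,
                                  out: Collector[(String, Long)]): Unit = {
        val bucket = (value._2 / 60000L) * 60000L   // record timestamp rounded down to the minute
        val soFar = Option[java.lang.Long](counts.get(bucket)).map(_.longValue).getOrElse(0L)
        counts.put(bucket, soFar + 1L)
        out.collect(value)
      }
    }

    // usage: stream.keyBy(_._1).process(new BucketCounts)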