Re: Migration to Flip6 Kubernetes

2018-05-02 Thread Derek VerLee
Is anyone actively working on direct Kubernetes support? I'd be excited to see this get in sooner rather than later, I'd be happy to start a PR. On 3/22/18 10:37 AM, Till Rohrmann wrote: Hi Edward and Eron, you're

Re: Insert data into Cassandra without Flink Cassandra connection

2018-05-02 Thread Shuyi Chen
Maybe you can share a bit more about why you need only one connection to Cassandra across all TaskManagers, so we can better help? On Wed, May 2, 2018 at 4:08 AM, Piotr Nowojski wrote: > Hi, > > The only way that I can think of is if you keep your flatMap operator with

Re: KafkaProducer with generic (Avro) serialization schema

2018-05-02 Thread Wouter Zorgdrager
Hi, Thanks for the suggestions. Unfortunately I cannot make FromRecord/ForRecord/SchemaFor serializable, since those classes are out of my control. I use those from the avro4s library (https://github.com/sksamuel/avro4s). The problem here, especially with the deserializer is that I need to

Re: Cannot submit jobs on a HA Standalone JobManager

2018-05-02 Thread Julio Biason
Hey guys and gals, So, after a bit more digging, I found out that once HA is enabled, `jobmanager.rpc.port` is also ignore (along with `jobmanager.rpc.address`, but I was expecting this). Because I set the `high-availability.jobmanager.port` to `50010-50015`, my RPC port also changed (the docs

Re: KafkaProducer with generic (Avro) serialization schema

2018-05-02 Thread Fabian Hueske
Hi Wouter, you can try to make the SerializationSchema serializable by overriding Java's serialization methods writeObject() and readObject() similar as Flink's AvroRowSerializationSchema [1] does. Best, Fabian [1]

Re: KafkaProducer with generic (Avro) serialization schema

2018-05-02 Thread Aljoscha Krettek
Hi, Piotr is right, the SerializationSchema has to be serializable, which means that the implicit values passed on for SchemaFor[IN], FromRecord[IN], and ToRecord[IN] need to be serializable. Is there no way of making those serializable? As a workaround you could think about having a factory

Re: Setting the parallelism in a cluster of machines properly

2018-05-02 Thread m@xi
Hey Fabian! Sorry for being unaware regarding Flink configurations, but for me I have followed every step but still setting a simple cluster of 2 nodes proved to be a pain in the as@@#. So, to which value you think I should set the akka timeout? Also, in my head the process is the following :

Re: KafkaProducer with generic (Avro) serialization schema

2018-05-02 Thread Piotr Nowojski
Hi, My Scala knowledge is very limited (and my Scala's serialization knowledge is non existent), but one way or another you have to make your SerializationSchema serialisable. If indeed this is the problem, maybe a better place to ask this question is on Stack Overflow or some scala specific

intentional back-pressure (or a poor man's side-input)

2018-05-02 Thread Derek VerLee
I was just thinking about about letting a coprocessfunction "block" or cause back pressure on one of it's streams? Has this been discussed as an option? Does anyone know a way to effectively accomplish this? I think I could get a lot of mileage out of something

Re: ConnectedIterativeStreams and processing state 1.4.2

2018-05-02 Thread Lasse Nedergaard
Hi. Because the data that I will cache come from a downstream operator and iterations was the only way to look data back to a prev. Operator as I know Med venlig hilsen / Best regards Lasse Nedergaard > Den 2. maj 2018 kl. 15.35 skrev Piotr Nowojski : > > Hi, > >

Re: ConnectedIterativeStreams and processing state 1.4.2

2018-05-02 Thread Piotr Nowojski
Hi, Why can not you use simple CoProcessFunction and handle cache updates within it’s processElement1 or processElement2 method? Piotrek > On 1 May 2018, at 10:20, Lasse Nedergaard wrote: > > Hi. > > I have a case where I have a input stream that I want to enrich

Re: Setting the parallelism in a cluster of machines properly

2018-05-02 Thread Fabian Hueske
It's not a requirement but the exception reads "org.apache.flink.runtime. client.JobClientActorConnectionTimeoutException: Lost connection to the JobManager.". So increasing the timeout might help. Best, Fabian 2018-05-02 12:20 GMT+02:00 m@xi : > Hello Fabian! > > Thanks

Re: Windows aligned with EDT ( with day light saving )

2018-05-02 Thread Vishal Santoshi
True that. Thanks. Wanted to be sure before I go down that path. On Wed, May 2, 2018 at 9:19 AM, Fabian Hueske wrote: > Hi Vishal, > > AFAIK it is not possible with Flink's default time windows. > However, it should be possible to implement a custom WindowAssigner for > your

Re: Windows aligned with EDT ( with day light saving )

2018-05-02 Thread Fabian Hueske
Hi Vishal, AFAIK it is not possible with Flink's default time windows. However, it should be possible to implement a custom WindowAssigner for your use case. I'd have a look at the TumblingEventTimeWindows class and copy/modify it to your needs. Best, Fabian 2018-05-02 15:12 GMT+02:00 Vishal

Re: Windows aligned with EDT ( with day light saving )

2018-05-02 Thread Vishal Santoshi
This does not seem possible but need some confirmation. Anyone ? On Tue, May 1, 2018 at 12:00 PM, Vishal Santoshi wrote: > How do I align a Window with EDT with day light saving correction ? The > offset takes a hardcoded value. I need 6 hour windows aligned to 00,

Cannot submit jobs on a HA Standalone JobManager

2018-05-02 Thread Julio Biason
Hello all, I'm building a standalone cluster with HA JobManager. So far, everything seems to work, but when i try to `flink run` my job, it fails with the following error: Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway. So

Re: Fat jar fails deployment (streaming job too large)

2018-05-02 Thread Piotr Nowojski
Short answer: could be that your job is simply too big to be serialised, distributed and deserialised in the given time and you would have to increase timeouts even more. Long answer: Do you have the same problem when you try to submit smaller job? Does your cluster work for simpler jobs?

Re: Apache Flink - Flink Forward SF 2018 - Scaling stream data pipelines (source code)

2018-05-02 Thread Piotr Nowojski
Hi, Till, do have this code somewhere? M Singh: Till is out of the office and will be back on next week, so he will probably not be able to respond for couple of days. Piotrek > On 30 Apr 2018, at 13:51, M Singh wrote: > > Hi: > > I was looking at the flink-forward sf

Re: Odd job failure

2018-05-02 Thread Piotr Nowojski
Hi, It might be some Kafka issue. From what you described your reasoning seems sound. For some reason TM3 fails and is unable to restart and process any data, thus forcing spilling on checkpoint barriers on TM1 and TM2. I don’t know the reason behind java.lang.NoClassDefFoundError:

Wiring batch and stream together

2018-05-02 Thread Peter Zende
Hi, We have a Flink streaming pipeline (1.4.2) which reads from Kafka, uses mapWithState with RocksDB and writes the updated states to Cassandra. We also would like to reprocess the ingested records from HDFS. For this we consider computing the latest state of the records over the whole dataset

Re: Insert data into Cassandra without Flink Cassandra connection

2018-05-02 Thread Piotr Nowojski
Hi, The only way that I can think of is if you keep your flatMap operator with parallelism 1, but that might defeat the purpose. Otherwise there is no way to open one single connection and share it across multiple TaskManagers (which can be running on different physical machines). Please

Re: Batch job stuck in Canceled state in Flink 1.5

2018-05-02 Thread Amit Jain
Thanks! Fabian I will try using the current release-1.5 branch and update this thread. -- Thanks, Amit On Wed, May 2, 2018 at 3:42 PM, Fabian Hueske wrote: > Hi Amit, > > We recently fixed a bug in the network stack that affected batch jobs > (FLINK-9144). > The fix was

Re: Setting the parallelism in a cluster of machines properly

2018-05-02 Thread m@xi
Hello Fabian! Thanks for the answer. No I did not. Is this a requirement? Best, Max -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Setting the parallelism in a cluster of machines properly

2018-05-02 Thread Fabian Hueske
Hi, did you try to increase the Akka timeout [1]? Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/config.html#distributed-coordination-via-akka 2018-04-29 19:44 GMT+02:00 m@xi : > Guys seriously I have done the process as described in the

Re: Batch job stuck in Canceled state in Flink 1.5

2018-05-02 Thread Fabian Hueske
Hi Amit, We recently fixed a bug in the network stack that affected batch jobs (FLINK-9144). The fix was added after your commit. Do you have a chance to build the current release-1.5 branch and check if the fix also resolves your problem? Otherwise it would be great if you could open a blocker

Re: Application logs missing from jobmanager log

2018-05-02 Thread Fabian Hueske
Hi Juho, I assume that these logs are generated from a different process, i.e., the client process and not the JM or TM process. Hence, they end up in a different log file and are not covered by the log collection of the UI. The reason is that this process might also be run on a machine outside

Re: coordinate watermarks between jobs?

2018-05-02 Thread Fabian Hueske
Hi Tao, The watermarks of operators that consume from two (or more) streams are always synced to the lowest watermark. This behavior guarantees that data won't be late (unless it was late when watermarks were assigned). However, the operator will most likely need to buffer more events from the