Re: Incremental checkpoint branch

2017-03-03 Thread Shaoxuan Wang
Vishnu, you can find the latest design discussion for incremental checkpointing at http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-Incremental-Checkpointing-in-Flink-td15931.html @Stefan Richter

Re: Incremental checkpoint branch

2017-03-03 Thread Chen Qin
Hi Vishnu, My best guess is that there is a lot of customized "incremental checkpointing" done via patches around the RocksDB state backend and RocksDB checkpoints. http://rocksdb.org/blog/2015/11/10/use-checkpoints-for-efficient-snapshots.html Thanks, Chen On Fri, Mar 3, 2017 at 1:16 PM, Ted Yu
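
For reference, a minimal sketch of the RocksDB checkpoint API from the linked post (paths and keys here are made up):

    import org.rocksdb.Checkpoint;
    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    public class RocksDbCheckpointExample {
        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary();
            try (Options options = new Options().setCreateIfMissing(true);
                 RocksDB db = RocksDB.open(options, "/tmp/rocksdb-data")) {
                db.put("key".getBytes(), "value".getBytes());
                // createCheckpoint() produces a consistent on-disk snapshot; on the
                // same filesystem, unchanged SST files are hard-linked rather than
                // copied, so successive checkpoints only pay for new data.
                // The target directory must not exist yet.
                try (Checkpoint checkpoint = Checkpoint.create(db)) {
                    checkpoint.createCheckpoint("/tmp/rocksdb-checkpoint-1");
                }
            }
        }
    }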

Incremental checkpoint branch

2017-03-03 Thread Vishnu Viswanath
Hi, Can someone point me to the branch where the ongoing work on incremental checkpointing is happening? I would like to try it out even if the work is not complete. I have a use case where the state size increases by roughly 1 GB every 5 minutes. Thanks, Vishnu

RE: Cross operation on two huge datasets

2017-03-03 Thread Gwenhael Pasquiers
Maybe I won’t try to broadcast my dataset after all: I finally found again what made me implement it with my own cloning flatmap + partitioning. Quoted from https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/batch/index.html#broadcast-variables Note: As the content of broadcast

Any good ideas for online/offline detection of devices that send events?

2017-03-03 Thread Bruno Aranda
Hi all, We are trying to write an online/offline detector for devices that keep streaming data through Flink. We know roughly how often to expect events from those devices, and we want to be able to detect when any of them stops sending events (goes offline) or starts again (comes back online)
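
A rough sketch of one way to do this with the ProcessFunction API that is new in 1.2 (Event, deviceId, and the 60-second timeout are made-up placeholders; in 1.2 the variant with state access is called RichProcessFunction):

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    // Hypothetical device event.
    class Event {
        public String deviceId;
    }

    // Apply after keyBy(deviceId). Emits ONLINE on the first event of a device
    // and OFFLINE once no event has arrived for TIMEOUT_MS.
    public class OnlineOfflineDetector extends ProcessFunction<Event, String> {
        private static final long TIMEOUT_MS = 60_000; // expected max gap, made up
        private transient ValueState<Long> lastSeen;
        private transient ValueState<String> deviceId;

        @Override
        public void open(Configuration parameters) {
            lastSeen = getRuntimeContext().getState(
                new ValueStateDescriptor<>("lastSeen", Long.class));
            deviceId = getRuntimeContext().getState(
                new ValueStateDescriptor<>("deviceId", String.class));
        }

        @Override
        public void processElement(Event event, Context ctx, Collector<String> out) throws Exception {
            if (lastSeen.value() == null) {
                out.collect("ONLINE: " + event.deviceId); // first event, or back after a timeout
            }
            long now = ctx.timerService().currentProcessingTime();
            lastSeen.update(now);
            deviceId.update(event.deviceId);
            // One timer per event; stale timers are filtered out in onTimer().
            ctx.timerService().registerProcessingTimeTimer(now + TIMEOUT_MS);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
            Long last = lastSeen.value();
            if (last != null && timestamp >= last + TIMEOUT_MS) {
                out.collect("OFFLINE: " + deviceId.value());
                lastSeen.clear(); // so the next event emits ONLINE again
            }
        }
    }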

RE: Cross operation on two huge datasets

2017-03-03 Thread Gwenhael Pasquiers
To answer Ankit, it is a batch application. Yes, I admit I did the broadcasting by hand. I did it that way because the only other way I found to “broadcast” a DataSet was to use “withBroadcast”, and I was afraid that “withBroadcast” would make Flink load the whole dataset in memory before
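
For completeness, the withBroadcastSet variant looks roughly like this (a toy sketch with Integer stand-ins for the shape and point sets):

    import java.util.List;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.configuration.Configuration;

    public class BroadcastSetExample {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            DataSet<Integer> shapes = env.fromElements(1, 2, 3);     // stands in for the shape set
            DataSet<Integer> points = env.fromElements(10, 20, 30);  // stands in for the point set

            points.map(new RichMapFunction<Integer, String>() {
                private List<Integer> broadcastShapes;

                @Override
                public void open(Configuration parameters) {
                    // The broadcast set is fully materialized in memory on every
                    // node before the first map() call -- the caveat the Flink
                    // docs warn about.
                    broadcastShapes = getRuntimeContext().getBroadcastVariable("shapes");
                }

                @Override
                public String map(Integer point) {
                    return point + " checked against " + broadcastShapes.size() + " shapes";
                }
            }).withBroadcastSet(shapes, "shapes")
              .print();
        }
    }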

RE: Cross operation on two huge datasets

2017-03-03 Thread Gwenhael Pasquiers
I tried putting my structure in a dataset, but when serializing it Kryo went into an infinite recursion (it crashed with a StackOverflowError). So I’m staying with the static reference. As for the partitioning, there is always the case of shapes overlapping both the right and left sections, I

Memory Limits: MiniCluster vs. Local Mode

2017-03-03 Thread dominik
Hey, for our CI/CD cycle I'd like to try out our Flink jobs in a development environment without running them against a huge EMR cluster (which is what we do for production), so something like a standalone mode. Until now, for this standalone run, I just started the job jar. As the
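
One option, assuming the job can be built against a StreamExecutionEnvironment: run it on an embedded local environment from a plain main()/JUnit entry point (parallelism and memory values below are guesses, and the Configuration overload may vary between versions):

    import org.apache.flink.configuration.ConfigConstants;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class LocalSmokeTest {
        public static void main(String[] args) throws Exception {
            Configuration config = new Configuration();
            // Managed memory for the embedded task manager, in MB (a guess).
            config.setLong(ConfigConstants.TASK_MANAGER_MEMORY_SIZE_KEY, 512);

            // Spins up an in-JVM mini cluster -- no EMR or standalone cluster needed.
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(2, config);

            env.fromElements(1, 2, 3).print();
            env.execute("ci-smoke-test");
        }
    }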

OutOfMemory error (Direct buffer memory) while allocating the TaskManager off-heap memory

2017-03-03 Thread Yassine MARZOUGUI
Hi all, I tried starting a local Flink 1.2.0 cluster using start-local.sh, with the following settings for the taskmanager memory: taskmanager.heap.mb: 16384 taskmanager.memory.off-heap: true taskmanager.memory.preallocate: true That throws an OOM error: Caused by: java.lang.Exception:

Re: Connecting workflows in batch

2017-03-03 Thread Aljoscha Krettek
Yes, right now that call never returns for a long-running streaming job. We will (in the future) provide a way for that call to return so that the result can be used for checking aggregators and other things. On Thu, Mar 2, 2017, at 19:14, Mohit Anchlia wrote: > Does it mean that for
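
In code terms, the call in question is the one below; with an unbounded source it currently blocks for the lifetime of the job (a sketch):

    import org.apache.flink.api.common.JobExecutionResult;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ExecuteBlocks {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.fromElements("a", "b").print(); // bounded source, so execute() returns

            // With an unbounded source this call blocks until cancellation or
            // failure, so the JobExecutionResult (accumulators etc.) is only
            // reachable at that point.
            JobExecutionResult result = env.execute("example");
            System.out.println("net runtime: " + result.getNetRuntime() + " ms");
        }
    }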

Re: A Link Sink that writes to OAuth2.0 protected resource

2017-03-03 Thread Stefan Richter
Hi Hussein, for your information, today we added a description of task and operator lifecycles to the documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/task_lifecycle.html

Re: A Link Sink that writes to OAuth2.0 protected resource

2017-03-03 Thread Tzu-Li (Gordon) Tai
Hi Hussein! Your approach seems reasonable to me. The open() method is called once for the UDF each time the job starts (including when the job is restored from a failure). Cheers, Gordon On March 3, 2017 at 7:03:22 PM, Hussein Baghdadi (hussein.baghd...@zalando.de) wrote:

Re: Flink Error/Exception Handling

2017-03-03 Thread Tzu-Li (Gordon) Tai
Hi Sunil, There has recently been some effort to allow `DeserializationSchema#deserialize()` to return `null` in cases like yours, so that the invalid record can simply be skipped instead of an exception being thrown from the deserialization schema. Here are the related links that you may be interested
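
For illustration, a sketch of a schema written in that style (the import path is the 1.2 location and moved in later versions; whether a null return is actually skipped by the connector is exactly what the linked tickets are about):

    import java.nio.charset.StandardCharsets;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.streaming.util.serialization.DeserializationSchema;

    public class SkippingIntSchema implements DeserializationSchema<Integer> {
        @Override
        public Integer deserialize(byte[] message) {
            try {
                return Integer.parseInt(new String(message, StandardCharsets.UTF_8).trim());
            } catch (NumberFormatException e) {
                return null; // malformed record: signal "skip" instead of failing the job
            }
        }

        @Override
        public boolean isEndOfStream(Integer nextElement) {
            return false;
        }

        @Override
        public TypeInformation<Integer> getProducedType() {
            return TypeInformation.of(Integer.class);
        }
    }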

Re: Serialization performance

2017-03-03 Thread Aljoscha Krettek
Hi Billy, on the Beam side you have probably looked into writing your own Coder (the equivalent of a TypeSerializer in Flink). If yes, did that not work out for you? And if yes, why? Best, Aljoscha On Thu, Mar 2, 2017, at 22:02, Stephan Ewen wrote: > Hi! > > I can write some more

A Link Sink that writes to OAuth2.0 protected resource

2017-03-03 Thread Hussein Baghdadi
Hello, In our Sink we are dealing with a system that uses OAuth 2.0. So in the open() method of the Sink we get the token and then initialise the client that we use to write from Flink to that API. Is there a better approach to handle that? open() is a lifecycle
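
A minimal sketch of that approach (ApiClient and fetchToken() are hypothetical placeholders, not a real client API):

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    public class OAuthSink extends RichSinkFunction<String> {

        // Hypothetical stand-in for an OAuth2-protected HTTP client.
        static class ApiClient {
            ApiClient(String token) { /* set the Authorization header */ }
            void write(String value) { /* POST value to the protected resource */ }
            void close() { }
        }

        private transient ApiClient client;

        private String fetchToken() {
            return "token"; // placeholder: call the token endpoint here
        }

        @Override
        public void open(Configuration parameters) {
            // open() runs once per parallel sink instance at (re)start -- including
            // restores from failure -- so the token is re-acquired on every restart.
            client = new ApiClient(fetchToken());
        }

        @Override
        public void invoke(String value) throws Exception {
            client.write(value);
        }

        @Override
        public void close() {
            if (client != null) {
                client.close();
            }
        }
    }

One thing the lifecycle alone does not cover is token expiry during a long-running job; refreshing would have to be handled inside invoke().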

RE: Cross operation on two huge datasets

2017-03-03 Thread Gwenhael Pasquiers
I managed to avoid the class reloads by controlling the order of operations using “.withBroadcast”. My first task (shapes parsing) now outputs an empty “DataSet synchro”. Then whenever I need to wait for that synchro dataset to be ready (and mainly for the operations prior to that dataset to be
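
The trick, sketched (names are made up; the empty dataset exists only so the scheduler must finish phase 1 before starting phase 2):

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.util.Collector;

    public class BroadcastBarrierExample {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Phase 1: "parse the shapes" (here just a literal dataset); its output
            // is shrunk to an empty dataset that serves purely as a scheduling signal.
            DataSet<String> shapes = env.fromElements("circle", "square");
            DataSet<Integer> synchro = shapes.flatMap(new FlatMapFunction<String, Integer>() {
                @Override
                public void flatMap(String shape, Collector<Integer> out) {
                    // emit nothing: the dataset only signals "phase 1 is done"
                }
            });

            // Phase 2 cannot be scheduled before "synchro" (and thus phase 1) is complete.
            env.fromElements(1, 2, 3)
                .map(new MapFunction<Integer, Integer>() {
                    @Override
                    public Integer map(Integer p) {
                        return p; // would read the static structure built in phase 1
                    }
                })
                .withBroadcastSet(synchro, "synchro")
                .print();
        }
    }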