Flink Redshift table lookup and updates

2016-08-18 Thread Harshith Chennamaneni
Hi, I've very recently come upon Flink and I'm trying to use it to solve a problem that I have. I have a stream of user settings updates coming through a Kafka queue. I need to store the most recent settings, along with a history of settings for each user, in Redshift, which then feeds into analytics

Re: checkpoint state keeps on increasing

2016-08-18 Thread Janardhan Reddy
I also thought that the checkpointing state size was growing due to a growing key space, so I registered a processing-time timer in 'onEvent' and wrote a custom trigger, and still the checkpointing state size was growing. Our code is linked against the Flink 1.0.0 jar but was running on Flink 1.1.1 (yarn session
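
For reference, a minimal sketch of that kind of trigger, assuming the Trigger API of the Flink 1.1 line: a processing-time timer is (re)registered whenever an element arrives, and the window is fired and purged when it goes off. All class and parameter names are illustrative, not taken from the actual job.

    import org.apache.flink.streaming.api.windowing.triggers.Trigger;
    import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
    import org.apache.flink.streaming.api.windowing.windows.Window;

    // Illustrative trigger: whenever an element arrives, register a processing-time
    // timer a fixed interval in the future; when it fires, emit the window contents
    // and purge them, so windows of keys that go quiet do not stay in the
    // checkpointed state forever.
    public class ProcessingTimeTimeoutTrigger<W extends Window> extends Trigger<Object, W> {

        private final long timeoutMillis;

        public ProcessingTimeTimeoutTrigger(long timeoutMillis) {
            this.timeoutMillis = timeoutMillis;
        }

        @Override
        public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws Exception {
            ctx.registerProcessingTimeTimer(System.currentTimeMillis() + timeoutMillis);
            return TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception {
            // Timeout reached: emit whatever has accumulated and drop the window contents.
            return TriggerResult.FIRE_AND_PURGE;
        }

        @Override
        public TriggerResult onEventTime(long time, W window, TriggerContext ctx) throws Exception {
            return TriggerResult.CONTINUE;
        }

        public void clear(W window, TriggerContext ctx) throws Exception {
            // This trigger keeps no state of its own, so there is nothing to clean up here.
        }
    }

Such a trigger would be attached with something like .window(GlobalWindows.create()).trigger(new ProcessingTimeTimeoutTrigger<>(60_000)); whether that matches the job discussed in this thread is an assumption.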

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

2016-08-18 Thread Miroslav Gajdoš
I tried building it from source as well as using the prebuilt binary release (v1.1.1); the latter produced this log output: http://pastebin.com/3L5Yhs9x The application in YARN still fails on "Fatal error in AM: The ContainerLaunchContext was not set". Mira Miroslav Gajdoš wrote on Thu 18. 08. 2016 at 10:36 +

Checking for existence of output directory/files before running a batch job

2016-08-18 Thread Niels Basjes
Hi, I have a batch job that I run on YARN that creates files in HDFS. I want to avoid running this job at all if the output already exists. So in my code (before submitting the job to the yarn-session) I do this: String directoryName = "foo"; Path directory = new Path(directoryName); FileS
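
A minimal sketch of that check, assuming Flink's own Path/FileSystem classes (the Hadoop FileSystem API would look very similar) and a hypothetical output location:

    import org.apache.flink.core.fs.FileSystem;
    import org.apache.flink.core.fs.Path;

    public class OutputExistenceCheck {

        public static void main(String[] args) throws Exception {
            // Hypothetical output location; in the real job this would be the HDFS
            // directory the sinks write to.
            String directoryName = "hdfs:///tmp/flink-output/foo";
            Path directory = new Path(directoryName);

            // Resolve the file system behind the path and test for existence before
            // the job is ever submitted to the yarn-session.
            FileSystem fs = directory.getFileSystem();
            if (fs.exists(directory)) {
                System.err.println("Output " + directoryName + " already exists, not submitting the job.");
                return;
            }

            // ... build the ExecutionEnvironment and run the batch job here ...
        }
    }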

Batch jobs with a very large number of input splits

2016-08-18 Thread Niels Basjes
Hi, I'm working on a batch process using Flink and I ran into an interesting problem. The number of input splits in my job is really, really large. I currently have an HBase input (with more than 1000 regions), and in the past I have worked with MapReduce jobs handling 2000+ files. The problem I have

Re: counting elements in datastream

2016-08-18 Thread Sameer W
Use count windows and keep emitting results, say every 1000 elements, and do a sum. Or do it without windows, something like this, which has the disadvantage that it emits a new updated result for each new element (not a good thing if your volume is high): https://github.com/sameeraxiomine/flinkinaction/
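
A rough sketch of the count-window variant, assuming a plain String stream and a batch size of 1000; all names here are illustrative:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CountWithCountWindows {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Stand-in source; the real stream would be whatever produces the elements to count.
            DataStream<String> stream = env.fromElements("a", "b", "c");

            // Map every element to ("all", 1), key on that constant so all elements meet
            // at one operator, then sum the ones per 1000-element count window. This emits
            // one updated count per 1000 elements instead of one per element.
            stream.map(new MapFunction<String, Tuple2<String, Long>>() {
                      @Override
                      public Tuple2<String, Long> map(String value) {
                          return new Tuple2<>("all", 1L);
                      }
                  })
                  .keyBy(0)
                  .countWindow(1000)
                  .sum(1)
                  .print();

            env.execute("count elements with count windows");
        }
    }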

counting elements in datastream

2016-08-18 Thread subash basnet
Hello all, if anyone has an idea: what could be a probable way to count the elements of the current instance of a DataStream? Is it possible? DataStream<...> pointsWithGridCoordinates; Regards, Subash Basnet

Re: checkpoint state keeps on increasing

2016-08-18 Thread Stephan Ewen
Hi! Count windows are a little tricky. If you have a growing key space, the state keeps growing. Let's say you have a bunch of values for key "A". You will fire the count windows for every two elements and keep one element. If "A" never comes again after that, the one element will still be kept ar
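
As an illustration of that scenario (element values and names invented for the example), a count window of two per key: the trigger only fires once a key has seen two elements, so a single element for a key that never returns stays buffered and is carried along in every checkpoint:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CountWindowStateExample {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Key "A" only ever sees one element, so the count-of-two trigger never fires
            // for it: that element waits in the window buffer indefinitely.
            env.fromElements(
                    Tuple2.of("A", 1L),
                    Tuple2.of("B", 1L),
                    Tuple2.of("B", 1L))
               .keyBy(0)
               .countWindow(2)   // fires once a key has received two elements, then purges them
               .sum(1)
               .print();

            env.execute("count window state example");
        }
    }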

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

2016-08-18 Thread Miroslav Gajdoš
Hi Max, we are building it from source and packaging it for Debian. I can try to use the binary release for Hadoop 2.6.0. Regarding ZooKeeper, we do not share instances between dev and production. Thanks, Miroslav Maximilian Michels wrote on Thu 18. 08. 2016 at 10:17 +0200: > Hi Miroslav, > > From

Re: off heap memory deallocation

2016-08-18 Thread Maximilian Michels
Hi, off-heap memory currently only gets deallocated once MaxDirectMemory has been reached. We can't manually clear the memory because some of the code assumes that it can still access old memory after it has been released. In the case of off-heap memory, that would give us a segmentation fault. We cur
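
As a plain-JVM illustration of that behaviour (not Flink code): the native memory behind direct buffers is only returned when the buffer objects are garbage collected, and the JVM normally forces that only once the -XX:MaxDirectMemorySize budget is exhausted.

    import java.nio.ByteBuffer;

    public class DirectMemoryDemo {

        public static void main(String[] args) {
            // Run with e.g. -XX:MaxDirectMemorySize=256m to see the effect. Each allocation
            // takes native memory outside the heap; that memory is only released once the
            // ByteBuffer object itself is collected. When the direct-memory budget is
            // exhausted, the JVM triggers a collection to free old buffers before it would
            // fail with an OutOfMemoryError ("Direct buffer memory").
            for (int i = 0; i < 1000; i++) {
                ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024); // 16 MB off-heap
                buffer.put(0, (byte) 1);
                // No explicit free: the native memory goes away only after "buffer"
                // becomes unreachable and the collector runs.
            }
        }
    }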

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

2016-08-18 Thread Maximilian Michels
Hi Miroslav, From the logs it looks like you're using Flink version 1.0.x. The ContainerLaunchContext is always set by Flink, so I'm wondering why this error can still occur. Are you using the default Hadoop version that comes with Flink (2.3.0)? You could try the Hadoop 2.6.0 build of Flink. Does