Re: Submit Flink Jobs to YARN running on AWS

2016-04-29 Thread Bajaj, Abhinav
Hi Robert, Thanks for your reply. I am using the Public DNS for the EC2 machines in the yarn and hdfs configuration files. It looks like "ec2-203-0-113-25.compute-1.amazonaws.com” You should be able to connect then. I have hadoop installed locally and the YARN_CONF_DIR is pointing to it. The

Flink on Azure HDInsight

2016-04-29 Thread Brig Lamoreaux
Hi All, Are there any issues with Flink on Azure HDInsight? Thanks, Brig Lamoreaux Data Solution Architect US Desert/Mountain Tempe [MSFT_logo_Gray DE sized SIG1.png]

Re: join performance

2016-04-29 Thread Henry Cai
So is the window defined as hour-window or second-window? If I am using hour-window, I guess I need to modify the trigger to fire early (e.g. every minute)? But I don't want to repeatedly emit the same joined records for every minute (i.e. on 2nd minute, I only want to emit the changes

Creating a custom operator

2016-04-29 Thread Simone Robutti
Hello, I'm trying to create a custom operator to explore the internals of Flink. Actually the one I'm working on is rather similar to Union and I'm trying to mimick it for now. When I run my job though, this error arise: Exception in thread "main" java.lang.IllegalArgumentException: Unknown

Re: Checking actual config values used by TaskManager

2016-04-29 Thread Ken Krugler
Hi Timur, > On Apr 28, 2016, at 10:40pm, Timur Fayruzov wrote: > > If you're talking about parameters that were set on JVM startup then `ps > aux|grep flink` on an EMR slave node should do the trick, that'll give you > the full command line. No, I’m talking about

Re: Anyone going to ApacheCon Big Data in Vancouver?

2016-04-29 Thread Trevor Grant
Hey Ken, I'll be there doing a talk on Monday afternoon on using Zeppelin for a Data Science environment with a couple of Flink and Spark Examples. I'm doing a tutorial Wednesday morning (I think for ApacheCon) that is about setting up Zeppelin with Flink and Spark in cluster mode. Would love

Re: Requesting the next InputSplit failed

2016-04-29 Thread Stefano Bortoli
We could successfully run the job without issues. Thanks a lot everyone for the support. FYI: with Flink we completed in 3h28m the job that was planned to run for 15 days 24/7 relying on our legacy customer approach. :-) saluti, Stefano 2016-04-28 14:50 GMT+02:00 Fabian Hueske

Re: Count on grouped keys

2016-04-29 Thread Punit Naik
Yeah no problem. Its not an optimised solution but I think it gives enough understanding of how reduceGroup works. On 29-Apr-2016 5:17 PM, "Stefano Baghino" wrote: > Thanks for sharing the solution, Punit. > > On Fri, Apr 29, 2016 at 1:40 PM, Punit Naik

Re: Count on grouped keys

2016-04-29 Thread Stefano Baghino
Thanks for sharing the solution, Punit. On Fri, Apr 29, 2016 at 1:40 PM, Punit Naik wrote: > Anyways, I managed to do it. you should attach the following code block to > your groupBy > .reduceGroup { > (in, out: org.apache.flink.util.Collector[(Map[String,String],

Re: join performance

2016-04-29 Thread Aljoscha Krettek
Hi, you are right, everything will be emitted in a huge burst at the end of the hour. If you want to experiment a bit you can write a custom Trigger based on EventTimeTrigger that will delay firing of windows. You would change onEventTime() to not fire but instead register a processing-time timer

Re: Discarding header from CSV file

2016-04-29 Thread nsengupta
Hello Chiwan, Sorry for the late reply. I have been into other things for some time. Yes, you are right. I have been assuming that field to be Integer, wrongly. I will fix it and give it a go again. Many thanks again. -- Nirmalya -- View this message in context:

Count on grouped keys

2016-04-29 Thread Punit Naik
I have a dataset which has maps. I have performed a groupBy on a key and I want to count all the elements in a particular group. How do I do this? -- Thank You Regards Punit Naik

Fwd: TypeVariable problems

2016-04-29 Thread Martin Neumann
Hej, I have a construct of different generic classes stacked on each other to create a library (so the type variables get handed on). And I have some trouble getting it to work. The current offender is a Class with 3 type variables internally it calls: .fold(new Tuple3<>(keyInit ,new

Re: join performance

2016-04-29 Thread Henry Cai
But the join requirement is to match the records from two streams occurring within one hour (besides the normal join key condition), if I use the second join window, those records wouldn't be in the same window any more. On Thu, Apr 28, 2016 at 11:47 PM, Ashutosh Kumar

Re: join performance

2016-04-29 Thread Ashutosh Kumar
Time unit can be in seconds as well. Is there specific need to get bursts hourly? On Fri, Apr 29, 2016 at 11:48 AM, Henry Cai wrote: > For the below standard stream/stream join, does flink store the results of > stream 1 and stream 2 into state store for the current hour and

join performance

2016-04-29 Thread Henry Cai
For the below standard stream/stream join, does flink store the results of stream 1 and stream 2 into state store for the current hour and at the end of the hour window it will fire the window by iterating through all stored elements in the state store to find join matches? My concern is during