Re: Submit Flink Jobs to YARN running on AWS

2016-03-09 Thread Bajaj, Abhinav
Thanks for the quick reply. Let me describe in more detail here. I am trying to submit a single Flink Job to YARN using the client - ./bin/flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096 ./examples/batch/WordCount.jar In my understanding, YARN allocates a container for the Jobmanager. Jobma

Re: Checkpoint

2016-03-09 Thread Vijay Srinivasaraghavan
Hi Ufuk, I have increased the sampling size to 1000 and decreased the refresh interval by half. Into my Kafka topic I have pumped a million messages, which are read by the KafkaConsumer pipeline and then passed to a transformation step where I have introduced a sleep (3 sec) for every single message receive
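
For readers who want to reproduce this kind of throttling, a minimal Java sketch of such an artificial slowdown (the class name SlowMap is made up; the 3-second sleep matches the description above):

    import org.apache.flink.api.common.functions.MapFunction;

    // Throttling step for checkpoint testing: sleep 3 seconds per message.
    public class SlowMap implements MapFunction<String, String> {
        @Override
        public String map(String value) throws Exception {
            Thread.sleep(3000); // artificial per-message delay, as described above
            return value;
        }
    }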

Re: asm IllegalArgumentException with 1.0.0

2016-03-09 Thread Zach Cox
I also noticed when I try to run this application in a local environment, I get the same IllegalArgumentException. When I assemble this application into a fat jar and run it on a Flink cluster using the CLI tools, it seems to run fine. Maybe my local classpath is missing something that is provide

asm IllegalArgumentException with 1.0.0

2016-03-09 Thread Zach Cox
Hi - after upgrading to 1.0.0, I'm getting this exception now in a unit test: IllegalArgumentException: (null:-1) org.apache.flink.shaded.org.objectweb.asm.ClassVisitor.<init>(Unknown Source) org.apache.flink.shaded.org.objectweb.asm.ClassVisitor.<init>(Unknown Source) org.apache.flink.api.scala.InnerClo

ExecutionConfig.enableTimestamps() in 1.0.0

2016-03-09 Thread Zach Cox
Hi - I'm upgrading our app to 1.0.0 and noticed ExecutionConfig no longer has an enableTimestamps() method. Do we just not need to call that at all now? The docs still say to call it [1] - do they just need to be updated? Thanks, Zach [1] https://ci.apache.org/projects/flink/flink-docs-release-1
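
For context, in 1.0.0 automatic timestamps are governed by the time characteristic rather than a separate switch; a minimal sketch of the likely replacement (assuming event time is what you want):

    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // In 1.0.0, enabling event time implies timestamp handling:
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);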

Re: protobuf messages from Kafka to elasticsearch using flink

2016-03-09 Thread Madhukar Thota
Hi Fabian, We are already using Flink to read JSON messages from Kafka and index them into Elasticsearch. Now we have a requirement to read protobuf messages from Kafka. I am new to protobuf and am looking for help on how to deserialize protobuf messages from the Kafka consumer using Flink. -Madhu On Wed, Mar 9, 2016
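
A minimal sketch of what such a protobuf deserializer could look like with the Kafka consumer's DeserializationSchema (MyProto stands in for a protobuf-generated message class; its parseFrom(byte[]) is generated by protoc):

    import java.io.IOException;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.typeutils.TypeExtractor;
    import org.apache.flink.streaming.util.serialization.DeserializationSchema;

    // MyProto is a hypothetical protobuf-generated message class.
    public class ProtoSchema implements DeserializationSchema<MyProto> {
        @Override
        public MyProto deserialize(byte[] message) throws IOException {
            return MyProto.parseFrom(message); // protoc-generated parser
        }

        @Override
        public boolean isEndOfStream(MyProto nextElement) {
            return false; // the stream never ends on its own
        }

        @Override
        public TypeInformation<MyProto> getProducedType() {
            return TypeExtractor.getForClass(MyProto.class);
        }
    }

The schema would then be handed to the consumer, e.g. new FlinkKafkaConsumer09<>("topic", new ProtoSchema(), props).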

Re: streaming job reading from kafka stuck while cancelling

2016-03-09 Thread Stephan Ewen
The reason the consumer thread is not interrupted (which is also why there is a separate consumer thread in the first place) is that Kafka has a bug (or design issue) where interrupting the thread may lead to a deadlock. Interrupting the thread would need to make sure that int

Re: operators

2016-03-09 Thread Stephan Ewen
Hi! You cannot specify that on the higher API levels. The lower API levels have something called "CoLocationConstraint". At this point it is not exposed, because we thought that it would lead to not very scalable and robust designs in many cases. The best thing usually is location transparency and l

operators

2016-03-09 Thread Radu Tudoran
Hi, Is there any way in which you can ensure that 2 distinct operators will be executed on the same machine? More precisely, what I am trying to do is have a window that computes some metrics and dumps them locally (from the operator, not from an output sink), and I would like to create ind

Re: streaming job reading from kafka stuck while cancelling

2016-03-09 Thread Maciek Próchniak
Thanks, that makes sense... Guess I'll try some dirty workaround for now by interrupting the consumer thread if it doesn't finish after some time... maciek On 09/03/2016 14:42, Stephan Ewen wrote: Here is the Jira issue: https://issues.apache.org/jira/browse/FLINK-3595 On Wed, Mar 9, 2016 at
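
A generic Java sketch of such a timeout-then-interrupt workaround (plain JDK, nothing Flink-specific; note the caveat above that interrupting Kafka's consumer thread can itself deadlock):

    // Interrupt a worker thread if it does not finish within a timeout.
    public final class Watchdog {
        public static void interruptAfter(final Thread target, final long timeoutMillis) {
            new Thread(() -> {
                try {
                    target.join(timeoutMillis); // give it a chance to finish gracefully
                    if (target.isAlive()) {
                        target.interrupt();     // last resort
                    }
                } catch (InterruptedException ignored) {
                    // watchdog itself interrupted; nothing to do
                }
            }, "consumer-watchdog").start();
        }
    }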

Re: time spent for iteration

2016-03-09 Thread Gábor Gévay
Yes, I also think that it would be a nice feature. It would make the advantage of delta iterations (that later iterations take less time) more visible to the users. Best, Gábor 2016-03-09 15:25 GMT+01:00 Vasiliki Kalavri : > I think it would be useful to allow for easier retrieval of this > info

Re: time spent for iteration

2016-03-09 Thread Vasiliki Kalavri
I think it would be useful to allow for easier retrieval of this information. Wouldn't it make sense to expose this to the web UI for example? We actually had a discussion about this some time ago [1]. -Vasia. [1]: https://issues.apache.org/jira/browse/FLINK-1759 On 9 March 2016 at 14:37, Gábor

Re: Submit Flink Jobs to YARN running on AWS

2016-03-09 Thread Stephan Ewen
Hi Abhi! You pretty much described it correctly: Flink binds its ports to the internal IP addresses, so you cannot send a message through the external IP addresses. Can you see if you can explicitly configure the external IP address as the JobManager hostname, so the JobManager will bind to that
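
If that route works, the relevant flink-conf.yaml entry would presumably be the JobManager RPC address (the hostname below is a placeholder):

    # flink-conf.yaml
    jobmanager.rpc.address: ec2-xx-xx-xx-xx.compute.amazonaws.com  # externally reachable address (placeholder)
    jobmanager.rpc.port: 6123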

Re: streaming job reading from kafka stuck while cancelling

2016-03-09 Thread Stephan Ewen
Here is the Jira issue: https://issues.apache.org/jira/browse/FLINK-3595 On Wed, Mar 9, 2016 at 2:06 PM, Stephan Ewen wrote: > Hi! > > Thanks for debugging this, I think there is in fact an issue in the > 0.9 consumer. > > I'll open a ticket for it, will try to fix that as soon as possible..

Re: time spent for iteration

2016-03-09 Thread Gábor Gévay
Hello, If you increase the log level, you can see each step of the iteration separately in the log, with timestamps. Best, Gábor 2016-03-09 14:04 GMT+01:00 Riccardo Diomedi : > Is it possible to add a timer for the time spent per iteration when the iterate > operator or the delta iterate operator
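
For example, in log4j.properties (scoping DEBUG to the iterative runtime package is an assumption; DEBUG on the root Flink logger would also work, just more noisily):

    log4j.logger.org.apache.flink.runtime.iterative=DEBUG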

Re: streaming job reading from kafka stuck while cancelling

2016-03-09 Thread Stephan Ewen
Hi! Thanks for debugging this, I think there is in fact an issue in the 0.9 consumer. I'll open a ticket for it, will try to fix that as soon as possible... Stephan On Wed, Mar 9, 2016 at 1:59 PM, Maciek Próchniak wrote: > Hi, > > from time to time when we cancel streaming jobs (or they

time spent for iteration

2016-03-09 Thread Riccardo Diomedi
Is it possible to add a timer for the time spent per iteration when the iterate operator or the delta iterate operator is performed? thanks Riccardo

streaming job reading from kafka stuck while cancelling

2016-03-09 Thread Maciek Próchniak
Hi, from time to time when we cancel streaming jobs (or they are failing for some reason) we encounter: 2016-03-09 10:25:29,799 [Canceler for Source: read objects from topic: (...) ' did not react to cancelling signal, but is stuck in method: java.lang.Object.wait(Native Method) java.lang.T

Re: JobManager Dashboard and YARN execution

2016-03-09 Thread Stephan Ewen
Hi! Yes, the dashboard is available in both cases. It is proxied through the YARN UI; you can find the link from there... You can always access the JobManager logs through the UI. Stephan On Wed, Mar 9, 2016 at 12:47 PM, Andrea Sella wrote: > Hi, > I am experimenting with the integration between Apac

Re: JobManager Dashboard and YARN execution

2016-03-09 Thread Maximilian Michels
Hi Andrea, The dashboard is available in both cases. It only shows the JobManager logs. For the TaskManager logs, you will have to use the YARN commands. Cheers, Max On Wed, Mar 9, 2016 at 12:47 PM, Andrea Sella wrote: > Hi, > I am experimenting with the integration between Apache Flink and YARN. > > W
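
For reference, the standard YARN command for fetching those logs (the application id is a placeholder):

    yarn logs -applicationId application_1457519445242_0001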

Re: keyBy using custom partitioner

2016-03-09 Thread madhu phatak
Hi, Thank you. On Mar 9, 2016 5:27 PM, "Stephan Ewen" wrote: > Hi! > > You can currently not override the hash function used by "keyBy()". The > reason is that this function is used in multiple places, for the stream > partitioning, and also for the partitioning of state. Both have to be > aligne

Re: JDBCInputFormat preparation with Flink 1.1-SNAPSHOT and Scala 2.11

2016-03-09 Thread Prez Cannady
Thanks. I need to dive in a bit deeper, but I did clarify some things in my mind which bear mentioning. 1. Sourcing JDBC data is not a streaming operation, but a batching one. Which makes sense, since you generally slurp rather than stream relational data, so within the constraints provided you

Re: JDBCInputFormat preparation with Flink 1.1-SNAPSHOT and Scala 2.11

2016-03-09 Thread Chesnay Schepler
Now that I look back at my mail, I may have given you the wrong idea about the prototype; to make sure we are on the same page: the only thing it enables is using the JDBCInputFormat without providing a separate TypeInformation. It still works with tuples, not POJOs. You can find the prototype her

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-09 Thread Aljoscha Krettek
Hi Maximilian, I’m currently running some tests again on a cluster to try and pinpoint the problem. Just to make sure, you are using Hadoop 2.4.1 with Yarn and Kafka 0.8, correct? In the meantime, could you maybe run a test where you completely bypass Kafka, just so we can see whether the probl

Re: keyBy using custom partitioner

2016-03-09 Thread Stephan Ewen
Hi! You can currently not override the hash function used by "keyBy()". The reason is that this function is used in multiple places, for the stream partitioning, and also for the partitioning of state. Both have to be aligned. What you can do is use "partitionCustom(...)" to use an arbitrary part
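
A minimal sketch of partitionCustom on a tuple stream (the input stream and the modulo placement rule are made up for illustration):

    import org.apache.flink.api.common.functions.Partitioner;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;

    DataStream<Tuple2<String, Integer>> partitioned = input.partitionCustom(
        new Partitioner<String>() {
            @Override
            public int partition(String key, int numPartitions) {
                return Math.abs(key.hashCode()) % numPartitions; // arbitrary placement rule
            }
        },
        0); // key is field 0 of the tuple

As described above, this only affects the stream partitioning; keyed state still relies on keyBy's own hashing.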

JobManager Dashboard and YARN execution

2016-03-09 Thread Andrea Sella
Hi, I am experimenting with the integration between Apache Flink and YARN. When I am running a Flink job using yarn-cluster, is the Dashboard available to monitor the execution? Is it also available for a long-running session? Is it possible to retrieve logs from the dashboard or do I have to pass via yarn a

Re: JDBCInputFormat preparation with Flink 1.1-SNAPSHOT and Scala 2.11

2016-03-09 Thread Prez Cannady
I suspected as much (the tuple size limitation). Creating my own InputFormat seems to be the best solution, but before I go down that rabbit hole I wanted to see at least a semi-trivial working example of JDBCInputFormat with Scala 2.11. I’d appreciate a look at that prototype if it’s publicly

Re: Attempting to refactor Kafka Example using Flink 0.10.2 with Scala 2.11

2016-03-09 Thread Prez Cannady
I’ll give that a shot, but I should also report that as of yesterday I’ve been able to get it to work with this configuration: https://github.com/OCExercise/kafka-example/tree/example-2.11-0.9.0.1 My primary blocking issue

Re: [ANNOUNCE] Flink 1.0.0 has been released

2016-03-09 Thread Stephan Ewen
@Igor: State from the CEP library (like what was the last observed event for a certain ID) is kept in Flink's state mechanisms (the key/value state). Have a look at these docs for details: - Working with State: https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/state

Re: Attempting to refactor Kafka Example using Flink 0.10.2 with Scala 2.11

2016-03-09 Thread Stephan Ewen
Hi! Can you use version 1.0.0 for everything? That should make all dependencies consistent. Greetings, Stephan On Wed, Mar 9, 2016 at 11:39 AM, Maximilian Michels wrote: > Hi Prez, > > It appears Spring's Classloader is not set up correctly. > Unfortunately, I'm not familiar with the way Spri

Re: Attempting to refactor Kafka Example using Flink 0.10.2 with Scala 2.11

2016-03-09 Thread Maximilian Michels
Hi Prez, It appears Spring's Classloader is not set up correctly. Unfortunately, I'm not familiar with the way Spring Boot works. You added flink-connector-kafka-0.9_2.10 but also mentioned you're using Scala 2.11. That is bound to cause trouble :) Cheers, Max On Thu, Mar 3, 2016 at 8:02 PM, Pr
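
Following the advice above, the Maven coordinates would all agree on one version and one Scala suffix, along these lines (artifact names assume the 1.0.0 naming scheme):

    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-scala_2.11</artifactId>
      <version>1.0.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-kafka-0.9_2.11</artifactId>
      <version>1.0.0</version>
    </dependency>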

Re: protobuf messages from Kafka to elasticsearch using flink

2016-03-09 Thread Fabian Hueske
Hi, I haven't used protobuf to serialize Kafka events but this blog post (+ the linked repository) shows how to write data from Flink into Elasticsearch: --> https://www.elastic.co/blog/building-real-time-dashboard-applications-with-apache-flink-elasticsearch-and-kibana Hope this helps, Fabian

Re: [ANNOUNCE] Flink 1.0.0 has been released

2016-03-09 Thread Maximilian Michels
Great to have this out! @Radu: You may use this LinkedIn post: https://www.linkedin.com/groups/7414853/7414853-6113008761373290497 @Igor: Having a one month window should work fine. The CEP library only keeps track of the current state of the events which enables you to process large amounts of d

Re: Type of TypeVariable could not be determined

2016-03-09 Thread Timo Walther
I think your problem is that you declared "TupleEvent2" as a TypeVariable in your code, but I think you want to use a class that you defined, right? If so, this is the correct declaration: MySourceFunction implements SourceFunction<TupleEvent2> On 09.03.2016 09:28, Wang Yangjun wrote: Hello, I think in th

keyBy using custom partitioner

2016-03-09 Thread madhu phatak
Hi, How to use a custom partitioner in the keyBy operation? As of now it uses a hash partitioner to load balance across parallel tasks. I tried custom partitioning the stream before calling the keyBy operation. It doesn't seem to preserve that partitioning. -- Regards, Madhukara Phatak http://datamantra.io

Re: Flink and Directory Monitors

2016-03-09 Thread Fabian Hueske
Hi Philippe, I am not aware of anybody using Directory Monitor with Flink. However, the application you described sounds reasonable and I think it should be possible to implement that with Flink. You would need to implement a SourceFunction that forwards events from DM to Flink or you push the DM
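
A rough single-task sketch of such a bridge (DirectoryEventSource and its offer() hook are hypothetical; a real directory-monitor callback would hand events to the queue):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;
    import org.apache.flink.streaming.api.functions.source.SourceFunction;

    public class DirectoryEventSource implements SourceFunction<String> {
        private volatile boolean running = true;
        private final BlockingQueue<String> events = new LinkedBlockingQueue<>();

        // Called by the directory-monitor callback (hypothetical wiring).
        public void offer(String event) {
            events.offer(event);
        }

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                String event = events.poll(100, TimeUnit.MILLISECONDS);
                if (event != null) {
                    ctx.collect(event); // forward into the stream
                }
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }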

Re: Submit Flink Jobs to YARN running on AWS

2016-03-09 Thread Fabian Hueske
Hi Abhi, I have used Flink on EMR via YARN a couple of times without problems. I started a Flink YARN session like this: ./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096 This will start five YARN containers (1 JobManager with 1024MB, 4 TaskManagers with 4096MB each). See more config options in the docume

Re: Type of TypeVariable could not be determined

2016-03-09 Thread Wang Yangjun
Hello, I think in this case you could just have your "MySourceFunction" implement both the SourceFunction and ResultTypeQueryable interfaces: MySourceFunction implements SourceFunction<TupleEvent2>, ResultTypeQueryable<TupleEvent2> { … } Best, Jun From: Radu Tudoran Reply-To: "u
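
Spelled out as a compilable sketch (the run() body is left empty since event creation depends on TupleEvent2, which is the user's own class):

    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
    import org.apache.flink.api.java.typeutils.TypeExtractor;
    import org.apache.flink.streaming.api.functions.source.SourceFunction;

    public class MySourceFunction
            implements SourceFunction<TupleEvent2>, ResultTypeQueryable<TupleEvent2> {

        private volatile boolean running = true;

        @Override
        public void run(SourceContext<TupleEvent2> ctx) throws Exception {
            while (running) {
                // create and emit TupleEvent2 instances here, e.g. ctx.collect(event);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }

        @Override
        public TypeInformation<TupleEvent2> getProducedType() {
            // gives Flink the concrete type instead of an unresolved TypeVariable
            return TypeExtractor.getForClass(TupleEvent2.class);
        }
    }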

Re: JDBCInputFormat preparation with Flink 1.1-SNAPSHOT and Scala 2.11

2016-03-09 Thread Chesnay Schepler
You can always create your own InputFormat, but there is no AbstractJDBCInputFormat, if that's what you were looking for. When you say arbitrary tuple size, do you mean a) a size greater than 25, or b) tuples of different sizes? If a), unless you are fine with using nested tuples, you won't get ar
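
For comparison, current usage with an explicitly provided TypeInformation looks roughly like this (driver, URL, and query are placeholders):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.typeutils.TupleTypeInfo;
    import static org.apache.flink.api.common.typeinfo.BasicTypeInfo.*;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<Tuple2<String, Integer>> db = env.createInput(
        JDBCInputFormat.buildJDBCInputFormat()
            .setDrivername("org.apache.derby.jdbc.EmbeddedDriver") // placeholder driver
            .setDBUrl("jdbc:derby:memory:mydb")                    // placeholder URL
            .setQuery("select name, grade from books")             // placeholder query
            .finish(),
        new TupleTypeInfo(Tuple2.class, STRING_TYPE_INFO, INT_TYPE_INFO));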