Re: design question

2016-04-25 Thread Aljoscha Krettek
Hi, in the Flink doc there is this: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/state_backends.html#the-rocksdbstatebackend and this: RocksDBStateBackend

Re: Thanks everyone

2016-04-25 Thread Ufuk Celebi
Thanks for sharing Prez! :-) On Sat, Apr 23, 2016 at 7:08 AM, Márton Balassi wrote: > Hi Prez, > > Thanks for sharing, the community is always glad to welcome new Flink users. > > Best, > > Marton > > On Sat, Apr 23, 2016 at 6:01 AM, Prez Cannady > wrote: >> >> We’ve completed our first full swe

Re: Flink first() operator

2016-04-25 Thread Ufuk Celebi
Hey Biplob, Yes, the file source will read all input. The first operator will add a combiner to the source for pre-aggregation and then shuffle everything to a single reduce instance, which emits the N first elements. Keep in mind that there is no strict order in which the records will be emitted.
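The two-stage behaviour described in this reply can be sketched in plain Java, without any Flink dependency. The names below (FirstNSketch, combine, firstN) are made up for illustration and are not Flink API; in-memory lists stand in for the parallel partitions of the source.

```java
import java.util.ArrayList;
import java.util.List;

final class FirstNSketch {
    // Pre-aggregation step: each parallel partition keeps at most n records.
    static <T> List<T> combine(List<T> partition, int n) {
        return new ArrayList<>(partition.subList(0, Math.min(n, partition.size())));
    }

    // Final step: a single reducer sees the combined partitions and emits at most n records.
    static <T> List<T> firstN(List<List<T>> partitions, int n) {
        List<T> merged = new ArrayList<>();
        for (List<T> partition : partitions) {
            merged.addAll(combine(partition, n));
        }
        return combine(merged, n);
    }
}
```

Which n records survive depends on the order in which partitions reach the final step, which matches the remark that there is no strict emission order.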

Custom Trigger Implementation

2016-04-25 Thread Piyush Shrivastava
Hi all, I want to implement a custom Trigger which fires a GlobalWindow after 1 minute the first time and every 20 seconds after that. I believe I cannot get this logic right in the implementation of my custom Trigger. Please help me with this. Here is the code of my custom Trigger: public class

Re: Clear irrelevant state values

2016-04-25 Thread Gyula Fóra
Hi, (a) I think your understanding is correct, one consideration might be that if you are always sending the state to the sink, it might make sense to build it there directly using a RichSinkFunction. (b) There is no built-in support for this at the moment. What you can do yourself is to generate

Re: Getting java.lang.Exception when try to fetch data from Kafka

2016-04-25 Thread Robert Metzger
Hi Prateek, were the messages written to the Kafka topic by Flink, using the TypeInformationKeyValueSerializationSchema ? If not, maybe the Flink deserializers expect a different data format of the messages in the topic. How are the messages written into the topic? On Fri, Apr 22, 2016 at 10:21

Re: Custom Trigger Implementation

2016-04-25 Thread Kostas Kloudas
Hi Piyush, In the onElement function, you register a timer every time you receive an element. When the next watermark arrives, in the flag==false case, this will lead to every element adding a timer for its timestamp+6ms. The same for flag==true case, with 2ms interval. What you can

Re: Flink first() operator

2016-04-25 Thread Fabian Hueske
Hi Biplob, you can also implement a generic IF that wraps another IF (such as a CsvInputFormat). The wrapping IF forwards all calls to the wrapped IF and in addition counts how many records were emitted (how often InputFormat.nextRecord() was called). Once the count arrives at the threshold, it re
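A minimal sketch of the wrapping idea, in plain Java. SimpleSource is a hypothetical, simplified stand-in for an input format and is not the Flink InputFormat interface. The sketch assumes that the wrapper reports end-of-input once the threshold is reached; the message above is cut off before it says what happens at that point.

```java
import java.util.Iterator;

/** Hypothetical, simplified stand-in for an input format; not the Flink interface. */
interface SimpleSource<T> {
    boolean reachedEnd();
    T nextRecord();
}

/** Demo source that reads records from an in-memory collection. */
final class IteratorSource<T> implements SimpleSource<T> {
    private final Iterator<T> it;

    IteratorSource(Iterable<T> records) {
        this.it = records.iterator();
    }

    @Override
    public boolean reachedEnd() {
        return !it.hasNext();
    }

    @Override
    public T nextRecord() {
        return it.next();
    }
}

/** Forwards all calls to the wrapped source and counts how many records were emitted. */
final class LimitingSource<T> implements SimpleSource<T> {
    private final SimpleSource<T> wrapped;
    private final long limit;
    private long emitted;

    LimitingSource(SimpleSource<T> wrapped, long limit) {
        this.wrapped = wrapped;
        this.limit = limit;
    }

    @Override
    public boolean reachedEnd() {
        // Assumption: stop as soon as the threshold is hit, even if more input exists.
        return emitted >= limit || wrapped.reachedEnd();
    }

    @Override
    public T nextRecord() {
        emitted++;
        return wrapped.nextRecord();
    }
}
```

Because the wrapper stops pulling records at the limit, the wrapped source does not have to be read to the end.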

Re: Custom Trigger Implementation

2016-04-25 Thread Kostas Kloudas
Hi, Let me also add that you should also override the clear() method in order to clear your state and delete the pending timers. Kostas > On Apr 25, 2016, at 11:52 AM, Kostas Kloudas > wrote: > > Hi Piyush, > > In the onElement function, you register a timer every time you receive an > ele
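The two points raised in this thread (do not register a new timer for every element, and reset everything in clear()) can be illustrated with a small plain-Java model. This is not a Flink Trigger: PeriodicFireSketch, its fields, and the registered list are hypothetical, and a real trigger would register timers and keep state through the trigger context instead. The delays are the 1 minute and 20 seconds from the original question.

```java
import java.util.ArrayList;
import java.util.List;

final class PeriodicFireSketch {
    static final long FIRST_DELAY_MS = 60_000L;   // first firing after 1 minute
    static final long REPEAT_DELAY_MS = 20_000L;  // then every 20 seconds

    private boolean firedOnce = false;
    private long pendingTimer = -1L;              // -1 means "no timer registered"

    // Demo-only log of every timer that was registered.
    final List<Long> registered = new ArrayList<>();

    // Register a timer only if none is pending, instead of one per element.
    void onElement(long timestamp) {
        if (pendingTimer < 0) {
            register(timestamp + (firedOnce ? REPEAT_DELAY_MS : FIRST_DELAY_MS));
        }
    }

    // Returns true if the window should fire; schedules the next periodic timer.
    boolean onTimer(long time) {
        if (time != pendingTimer) {
            return false; // stale or unknown timer
        }
        firedOnce = true;
        register(time + REPEAT_DELAY_MS);
        return true;
    }

    // Counterpart of clear(): forget the state and the pending timer.
    void clear() {
        firedOnce = false;
        pendingTimer = -1L;
    }

    boolean hasPendingTimer() {
        return pendingTimer >= 0;
    }

    private void register(long time) {
        pendingTimer = time;
        registered.add(time);
    }
}
```

With this structure three elements arriving at 0, 5 and 10 ms register a single timer at 60,000 ms, and the firing at 60,000 ms schedules the next one at 80,000 ms.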

Re: YARN terminating TaskNode

2016-04-25 Thread Robert Metzger
Hi Timur, The reason why we only allocate 570mb for the heap is because you are allocating most of the memory as off heap (direct byte buffers). In theory, the memory footprint of the JVM is limited to 570 (heap) + 1900 (direct mem) = 2470 MB (which is below 2500). But in practice the JVM is all
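The arithmetic in this reply, written out as a tiny Java check. MemoryBudgetSketch and its method names are hypothetical; the numbers are the ones quoted in the message (570 MB heap, 1900 MB direct memory, 2500 MB container).

```java
final class MemoryBudgetSketch {
    // Theoretical JVM footprint: heap plus direct (off-heap) memory, in MB.
    static long theoreticalFootprintMb(long heapMb, long directMb) {
        return heapMb + directMb;
    }

    // Margin left in the container after heap and direct memory, in MB.
    static long headroomMb(long containerMb, long heapMb, long directMb) {
        return containerMb - theoreticalFootprintMb(heapMb, directMb);
    }

    public static void main(String[] args) {
        System.out.println(theoreticalFootprintMb(570, 1900)); // 2470
        System.out.println(headroomMb(2500, 570, 1900));       // 30
    }
}
```

The margin is only 30 MB, so anything the JVM allocates beyond heap and direct buffers has to fit into that space.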

Re: Custom Trigger Implementation

2016-04-25 Thread Piyush Shrivastava
Thanks a lot Kostas. This solved my problem. Thanks and Regards, Piyush Shrivastava http://webograffiti.com On Monday, 25 April 2016 3:27 PM, Kostas Kloudas wrote: Hi, Let me also add that you should also override the clear() method in order to clear your state and delete the pending t

Re: Access to a shared resource within a mapper

2016-04-25 Thread Fabian Hueske
Hi Timur, a TaskManager may run as many subtasks of a Map operator as it has slots. Each subtask of an operator runs in a different thread. Each parallel subtask of a Map operator has its own MapFunction object, so it should be possible to use a lazy val. However, you should not use static variab
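A plain-Java sketch of the per-instance lazy initialization described in this reply (the Java counterpart of a Scala lazy val inside the function object). LazyMapperSketch and Resource are hypothetical names and the class does not extend any Flink type; the counter exists only to make the number of initializations observable. The sketch relies on the statement above that each parallel subtask has its own function object, so the unsynchronized null check is only touched by one thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LazyMapperSketch {
    /** Hypothetical expensive resource, e.g. a client or a lookup table. */
    static final class Resource {
        String decorate(String in) {
            return "[" + in + "]";
        }
    }

    private final AtomicInteger creations; // demo-only counter, passed in by the caller
    private Resource resource;             // created on first use, one per mapper instance

    LazyMapperSketch(AtomicInteger creations) {
        this.creations = creations;
    }

    // Instance-level lazy initialization; no static field is shared between instances.
    private Resource resource() {
        if (resource == null) {
            resource = new Resource();
            creations.incrementAndGet();
        }
        return resource;
    }

    String map(String in) {
        return resource().decorate(in);
    }
}
```

Two mapper instances therefore create two resources in total, no matter how many records each one processes.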

Re: YARN terminating TaskNode

2016-04-25 Thread Maximilian Michels
Hi Timur, Which version of Flink are you using? Could you share the entire logs? Thanks, Max On Mon, Apr 25, 2016 at 12:05 PM, Robert Metzger wrote: > Hi Timur, > > The reason why we only allocate 570mb for the heap is because you are > allocating most of the memory as off heap (direct byte buf

Re: Clear irrelevant state values

2016-04-25 Thread Sowmya Vallabhajosyula
Hi Gyula, Thank you so much. 1. Can you point me to any documentation on removal markers? 2. My understanding is this implementation of custom state maintenance does not impact scalability. Is that right? Thanks, Sowmya On Mon, Apr 25, 2016 at 3:06 PM, Gyula Fóra wrote: > Hi, > > (a) I think y

Re: Flink on Yarn - ApplicationMaster command

2016-04-25 Thread Maximilian Michels
Great to hear! :) On Sun, Apr 24, 2016 at 3:51 PM, Theofilos Kakantousis wrote: > Hi, > > The issue was a mismatch of jar versions on my client. Seems to be working > fine now. > Thanks again for your help! > > Cheers, > Theofilos > > > On 2016-04-22 18:22, Theofilos Kakantousis wrote: > > Hi Max

Re: Clear irrelevant state values

2016-04-25 Thread Gyula Fóra
Hi, The removal markers are just something I made up :) What I meant is that you can generate events in a custom source for instance that will trigger the removal of the state. This might be easy or hard to do depending on your use-case. What do you mean by custom state maintenance? As long as yo

Re: Control triggering on empty window

2016-04-25 Thread Aljoscha Krettek
Hi, I'm sorry but I still think this is not possible. Windows are usually associated with a key, so if there is no element to which we can assign a window then there is also no key to which the window would belong. Cheers, Aljoscha On Thu, 21 Apr 2016 at 22:35 Maxim wrote: > I think the best wa

Re: Clear irrelevant state values

2016-04-25 Thread Sowmya Vallabhajosyula
Thanks Gyula. Yes, I am using state only in RichFlatMapFunction. Will try to evaluate generating events for removal of state. Regards, Sowmya On Mon, Apr 25, 2016 at 5:44 PM, Gyula Fóra wrote: > Hi, > > The removal markers are just something I made up :) What I meant is that > you can generate

Re: AvroWriter for Rolling sink

2016-04-25 Thread Aljoscha Krettek
Hi, the code looks very good! Do you think it can be adapted to the slightly modified interface introduced here: https://issues.apache.org/jira/browse/FLINK-3637 It basically requires the writer to know the write position, so that we can truncate to a valid position in case of failure. Cheers, Al

Re: Count windows missing last elements?

2016-04-25 Thread Aljoscha Krettek
Yes, this looks correct for a Counting Trigger that flushes when the sources finish. Could you also solve your filtering problem with this or is this still an open issue? Cheers, Aljoscha On Sat, 23 Apr 2016 at 16:57 Konstantin Kulagin wrote: > I finally was able to do that. Kinda ugly, but wor

Re: Custom Trigger Implementation

2016-04-25 Thread Kostas Kloudas
Good to hear that! Kostas > On Apr 25, 2016, at 12:24 PM, Piyush Shrivastava > wrote: > > Thanks a lot Kostas. This solved my problem. > > Thanks and Regards, > Piyush Shrivastava > > http://webograffiti.com > > > On Monday, 25 A

Re: General Data questions - streams vs batch

2016-04-25 Thread Aljoscha Krettek
Hi, I'll try and answer your questions separately. First, a general remark: although Flink has the DataSet API for batch processing and the DataStream API for stream processing, we only have one underlying streaming execution engine that is used for both. Now, regarding the questions: 1) What do yo

Re: YARN terminating TaskNode

2016-04-25 Thread Timur Fayruzov
Hello Maximilian, I'm using 1.0.0 compiled with Scala 2.11 and Hadoop 2.7. I'm running this on EMR. I didn't see any exceptions in other logs. What are the logs you are interested in? Thanks, Timur On Mon, Apr 25, 2016 at 3:44 AM, Maximilian Michels wrote: > Hi Timur, > > Which version of Flin

Re: Access to a shared resource within a mapper

2016-04-25 Thread Timur Fayruzov
Hi Fabian, I didn't realize you meant that lazy val should be inside RichMapFunction implementation, it makes sense. That's what I ended up doing already. Thanks! Timur On Mon, Apr 25, 2016 at 3:34 AM, Fabian Hueske wrote: > Hi Timur, > > a TaskManager may run as many subtasks of a Map operato

Re: "No more bytes left" at deserialization

2016-04-25 Thread Timur Fayruzov
Still trying to resolve this serialization issue. I was able to hack it by 'serializing' `Record` to String and then 'deserializing' it in coGroup, but boy, it's so ugly. So the bug is that it can't deserialize the case class that has the structure (slightly different and more detailed than I stated

Re: General Data questions - streams vs batch

2016-04-25 Thread Konstantin Kulagin
As usual, thanks for the answers, Aljoscha! I think I understood what I wanted to know. 1) To add a few comments: about streams, I was thinking about something similar to Storm, where you can have one Source and 'duplicate' the same entry going through different 'path's. Something like this: https://docs.

Re: Count windows missing last elements?

2016-04-25 Thread Konstantin Kulagin
Thanks! Now I can call myself a super flink developer :) As for the issue - I am still trying to figure out ways to do that. I've raised a question in this thread: http://mail-archives.apache.org/mod_mbox/flink-user/201604.mbox/%3CCAJ746X69L%2ByARu3pq74peov1TxyfUPhtQWg3ffLJ5SQk4OmTAg%40mail.gmai

Re: YARN terminating TaskNode

2016-04-25 Thread Maximilian Michels
Hi Timur, Shedding some light on the memory calculation: You have a total memory size of 2500 MB for each TaskManager. The default for 'taskmanager.memory.fraction' is 0.7. This is the fraction of the memory used by the memory manager. When you have turned on off-heap memory, this memory is alloc

Re: Getting java.lang.Exception when try to fetch data from Kafka

2016-04-25 Thread prateekarora
Hi, I have a Java program that sends data into a Kafka topic using the Kafka client API (0.8.2). Here is a sample of the code used to send data to the Kafka topic: import org.apache.kafka.clients.producer.Producer; import org.apache.kafka.clients.producer.ProducerConfig; import org.apache.kafka.clients.producer

Re: YARN terminating TaskNode

2016-04-25 Thread Timur Fayruzov
Great answer, thank you, Max, for a very detailed explanation! Illuminating how the off-heap parameter affects the memory allocation. I read this post: https://blogs.oracle.com/jrockit/entry/why_is_my_jvm_process_larger_t and the thing that jumped out at me is the allocation of memory for JNI libs. I do u

Re: AvroWriter for Rolling sink

2016-04-25 Thread Igor Berman
Hi, it's not a problem, I'll find time to change it (I understand the refactoring is in master and not released yet). I wanted to ask if it's acceptable to add the following dependency to Flink? I mean, my code reuses code in this jar (note that it's not currently present in the Flink classpath) org.apache

Re: Getting java.lang.Exception when try to fetch data from Kafka

2016-04-25 Thread prateekarora
Hi, I have a Java program to send data into a Kafka topic. Below is the code for this: private Producer producer = null; Serializer keySerializer = new StringSerializer(); Serializer valueSerializer = new ByteArraySerializer(); producer = new KafkaProducer(props, keySerializer, valueSerializer); Produce

Re: Submit Flink Jobs to YARN running on AWS

2016-04-25 Thread Bajaj, Abhinav
Hi Fabian, Thanks for your reply and the pointers to documentation. In these steps, I think the Flink client is installed on the master node, referring to steps mentioned in Flink docs here. However, the scenario I have

Re: Join DataStream with dimension tables?

2016-04-25 Thread Srikanth
Aljoscha, Looks like a potential solution. Feels a bit hacky though. Didn't quite understand why a list-backed store is used for the static input buffer? Join(inner) should emit only one record if there is a key match. Is it a property of the system to emit Long.MAX_VALUE watermark when a finite

Job hangs

2016-04-25 Thread Timur Fayruzov
Hello, Now I'm at the stage where my job seems to completely hang. Source code is attached (it won't compile but I think gives a very good idea of what happens). Unfortunately I can't provide the datasets. Most of them are about 100-500MM records, I try to process on EMR cluster with 40 tasks 6GB m

Re: Flink program without a line of code

2016-04-25 Thread Alexander Smirnov
thank you so much for the responses, guys! On Sat, Apr 23, 2016 at 12:09 AM Flavio Pompermaier wrote: > Hi Alexander, > since I was looking for something similar some days ago here is what I > know about this argument: > during the Stratosphere project there was Meteor and Supremo allowing that

convert Json to Tuple

2016-04-25 Thread Alexander Smirnov
Hello everybody! my RMQSource function receives string with JSONs in it. Because many operations in Flink rely on Tuple operations, I think it is a good idea to convert JSON to Tuple. I believe this task has been solved already :) what's the common approach for this conversion? Thank you, Alex

Re: convert Json to Tuple

2016-04-25 Thread Timur Fayruzov
Normally, Json4s or Jackson+scala plugin work well for json to scala data structure conversions. However, I would not expect they support a special case for tuples, since JSON key-value fields would normally convert to case classes and JSON arrays are converted to, well, arrays. That being said,

Re: convert Json to Tuple

2016-04-25 Thread Alexander Smirnov
Thanks Timur! I should have mentioned, I need it for Java On Mon, Apr 25, 2016 at 10:13 PM Timur Fayruzov wrote: > Normally, Json4s or Jackson+scala plugin work well for json to scala data > structure conversions. However, I would not expect they support a special > case for tuples, since JSON