Re: Using Kafka 0.10.x timestamps as a record value in Flink Streaming

2017-05-16 Thread Jia Teoh
Hi Gordon, The timestamps are required for application logic. Thank you for clarifying the custom operators - seems I mistakenly thought of the functions that are passed to the operators rather than the operators themselves. AbstractStreamOperator and the other classes you mentioned seem like
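
For reference, a minimal sketch of what such a custom operator could look like (a 1.2-era API is assumed; the class and its output shape are hypothetical). Unlike the functions passed to operators, an operator receives the StreamRecord wrapper and therefore sees the timestamp that FlinkKafkaConsumer010 attached:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
    import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
    import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

    // Hypothetical operator that copies the record timestamp into the value.
    public class TimestampTaggingOperator<T> extends AbstractStreamOperator<Tuple2<T, Long>>
            implements OneInputStreamOperator<T, Tuple2<T, Long>> {

        @Override
        public void processElement(StreamRecord<T> element) throws Exception {
            long ts = element.getTimestamp(); // the timestamp travels with the StreamRecord
            output.collect(new StreamRecord<>(Tuple2.of(element.getValue(), ts), ts));
        }
    }

It would be wired in with DataStream#transform(...), which is also why this has to live at the operator level rather than the function level.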

Re: Kafka 0.10.x event time with windowing

2017-05-16 Thread Tzu-Li (Gordon) Tai
Ah, my apologies, some misunderstanding on my side here. FlinkKafkaConsumer010 attaches the Kafka timestamp with the records, hence a timestamp extractor is not required, BUT you’ll still need a watermark generator to produce the watermarks. That should explain why the windows aren’t firing.
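
A minimal sketch of such a watermark generator (assuming the 1.2/1.3-era periodic-watermark API and a bounded out-of-orderness; returning previousElementTimestamp keeps the timestamp the consumer attached):

    import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
    import org.apache.flink.streaming.api.watermark.Watermark;

    public class KafkaTimestampWatermarks<T> implements AssignerWithPeriodicWatermarks<T> {
        private final long maxOutOfOrdernessMs; // assumed slack, e.g. 1000 ms
        private long currentMax;

        public KafkaTimestampWatermarks(long maxOutOfOrdernessMs) {
            this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
            this.currentMax = Long.MIN_VALUE + maxOutOfOrdernessMs;
        }

        @Override
        public long extractTimestamp(T element, long previousElementTimestamp) {
            // previousElementTimestamp is the Kafka timestamp already attached
            currentMax = Math.max(currentMax, previousElementTimestamp);
            return previousElementTimestamp;
        }

        @Override
        public Watermark getCurrentWatermark() {
            return new Watermark(currentMax - maxOutOfOrdernessMs);
        }
    }

Calling stream.assignTimestampsAndWatermarks(new KafkaTimestampWatermarks<>(1000L)) before the window operator should then make the event-time windows fire.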

Re: Kafka 0.10.x event time with windowing

2017-05-16 Thread Jia Teoh
Hi Gordon, Thanks for confirming my understanding that the extractor should not have to be defined for 0.10. However, I'm still experiencing the case where not using an extractor results in zero window triggers. I've verified the timestamps in the Kafka records with the following command:

Re: Using Kafka 0.10.x timestamps as a record value in Flink Streaming

2017-05-16 Thread Tzu-Li (Gordon) Tai
Hi Jia, How exactly do you want to use the Kafka timestamps? Do you want to access them and alter them with new values as the record timestamp? Or do you want to use them for some application logic in your functions? If it's the former, you should be able to do that by using timestamp /

Re: Kafka 0.10.x event time with windowing

2017-05-16 Thread Tzu-Li (Gordon) Tai
Hi Jia! This sounds a bit fishy. The docs mention that there is no need for a timestamp / watermark extractor because with 0.10, the timestamps that come with Kafka records can be used directly to produce watermarks for event time. One quick clarification: did you also check whether the

Re: Group already contains a Metric with the name 'latency'

2017-05-16 Thread rizhashmi
thanks Chesnay Schepler

State in Custom Tumble Window Class

2017-05-16 Thread rizhashmi
I have a requirement not to reject events, even if they are late (after the maximum allowed lateness). The way I achieve this is by overriding the tumbling window class and updating the event time to the last event time if the event is late; for identification I attached an additional column in the db as
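
A rough sketch of what that override might look like (a 1.2-era WindowAssigner API is assumed; the clamping idea is the poster's, the class name is made up, and this is not a tested implementation; note the assigner instance is per-task, not per-key, and its field is not checkpointed):

    import java.util.Collection;
    import java.util.Collections;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

    // Clamp late timestamps forward to the last event time seen, so no event
    // lands in an already-closed window.
    public class LateTolerantTumblingWindows extends TumblingEventTimeWindows {
        private final long size;
        private long lastTimestamp = Long.MIN_VALUE;

        private LateTolerantTumblingWindows(long size) {
            super(size, 0);
            this.size = size;
        }

        public static LateTolerantTumblingWindows of(Time size) {
            return new LateTolerantTumblingWindows(size.toMilliseconds());
        }

        @Override
        public Collection<TimeWindow> assignWindows(Object element, long timestamp,
                                                    WindowAssignerContext context) {
            long ts = Math.max(timestamp, lastTimestamp); // treat late events as on-time
            lastTimestamp = ts;
            long start = ts - (ts % size);
            return Collections.singletonList(new TimeWindow(start, start + size));
        }
    }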

Kafka 0.10.x event time with windowing

2017-05-16 Thread Jia Teoh
Hi, I'm trying to use FlinkKafkaConsumer010 as a source for a windowing job on event time, as provided by Kafka. According to the Kafka connector doc (link),

Re: Using Kafka 0.10.x timestamps as a record value in Flink Streaming

2017-05-16 Thread Jia Teoh
Hi Robert, Thanks for the reply. I ended up implementing an extension of the Kafka fetcher and consumer so that the deserialization API can include the timestamp field, which is sufficient for my specific use case. I can share the code if desired but it seems like it's an intentional design

Re: Sink - Cassandra

2017-05-16 Thread Nick Dimiduk
Yes the approach works fine for my use-case; has been in "production" for quite some time. My implementation has some untested scenarios around job restarting and failures, so of course your mileage may vary. On Mon, May 15, 2017 at 5:58 AM, nragon wrote: >

Re: Stateful streaming question

2017-05-16 Thread Jain, Ankit
Hi Flavio, While you wait on an update from Kostas, I wanted to understand the use-case better and share my thoughts: 1) Why is the current batch mode expensive? Where are you persisting the data after updates? The way I see it, by moving to Flink you get to use RocksDB (a key-value store) that

Re: questions about Flink's HashJoin performance

2017-05-16 Thread Stephan Ewen
Hi! Be aware that the "Row" and "Record" types are not very high performance data types. You might be measuring the data type overhead, rather than the hash table performance. Also, the build measurements include the data generation, which influences the results. If you want to purely benchmark

Re: Stateful streaming question

2017-05-16 Thread Flavio Pompermaier
Hi Kostas, thanks for your quick response. I also thought about using Async IO, I just need to figure out how to correctly handle parallelism and the number of async requests. However, that's probably the way to go... Is it possible also to set a number of retry attempts / a backoff when the async request
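
For context, the shape of the 1.2-era Async I/O call: the last argument caps the number of in-flight requests, while retries/backoff are not built in and would have to be implemented inside asyncInvoke. The external client, its lookup method, and the Row/Enriched types here are assumptions:

    // Sketch: at most 100 in-flight lookups, 1 s timeout per request.
    DataStream<Enriched> enriched = AsyncDataStream.unorderedWait(
        input,
        new AsyncFunction<Row, Enriched>() {
            @Override
            public void asyncInvoke(Row row, AsyncCollector<Enriched> collector) {
                // hypothetical non-blocking client; complete the collector in the callback
                externalClient.lookup(row).thenAccept(result ->
                    collector.collect(Collections.singletonList(new Enriched(row, result))));
            }
        },
        1000, TimeUnit.MILLISECONDS,
        100);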

FlinkCEP latency/throughput

2017-05-16 Thread Sonex
Hello everyone, I am testing some patterns with FlinkCEP and I want to measure latency and throughput when using 1 or more processing cores. How can I do that? What I have done so far: Latency: each time an event arrives I store the system time (System.currentTimeMillis). When Flink calls the
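
One rough way to wire up the latency part (a sketch; the Event type and the downstream matches stream are placeholders):

    // Record the arrival time next to each event.
    DataStream<Tuple2<Event, Long>> timed = events.map(
        new MapFunction<Event, Tuple2<Event, Long>>() {
            @Override
            public Tuple2<Event, Long> map(Event e) {
                return Tuple2.of(e, System.currentTimeMillis());
            }
        });

    // ... run the CEP pattern on `timed` so matches carry the arrival time ...

    // Latency = emission time minus arrival time of the matched event.
    DataStream<Long> latencyMs = matches.map(
        new MapFunction<Tuple2<Event, Long>, Long>() {
            @Override
            public Long map(Tuple2<Event, Long> m) {
                return System.currentTimeMillis() - m.f1;
            }
        });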

Re: Stateful streaming question

2017-05-16 Thread Kostas Kloudas
Hi Flavio, From what I understand, for the first part you are correct. You can use Flink’s internal state to keep your enriched data. In fact, if you are also querying an external system to enrich your data, it is worth looking at the AsyncIO feature:
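
A minimal sketch of the keyed-state idea (the Tuple4 layout and the Enriched type are assumptions taken from this thread):

    public class EnrichFunction
            extends RichFlatMapFunction<Tuple4<String, String, String, String>, Enriched> {
        private transient ValueState<Enriched> cache;

        @Override
        public void open(Configuration parameters) {
            cache = getRuntimeContext().getState(
                new ValueStateDescriptor<>("enriched", Enriched.class));
        }

        @Override
        public void flatMap(Tuple4<String, String, String, String> value,
                            Collector<Enriched> out) throws Exception {
            Enriched current = cache.value();   // per-key state, e.g. backed by RocksDB
            // ... merge `value` into `current`, consulting the external system if absent ...
            cache.update(current);
            out.collect(current);
        }
    }
    // usage: stream.keyBy(0).flatMap(new EnrichFunction());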

Stateful streaming question

2017-05-16 Thread Flavio Pompermaier
Hi to all, we're still playing with the Flink streaming part in order to see whether it can improve our current batch pipeline. At the moment, we have a job that translates incoming data (as Row) into Tuple4, groups them together by the first field and persists the result to disk (using a thrift

Re: Problem with Kafka Consumer

2017-05-16 Thread Kostas Kloudas
Hi Simone, Glad I could help ;) Actually it would be great if you could also try out the upcoming (not yet released) 1.3 version and let us know if you find something that does not work as expected. We are currently in the phase of testing it, as you may have noticed, and every contribution

Re: Problem with Kafka Consumer

2017-05-16 Thread simone
Hi Kostas, thanks for your suggestion. Indeed, replacing my custom sink with a simpler one brought out that the cause of the problem was RowToQuery, as you suggested. The sink was blocking the reads, making the Kafka pipeline stall, due to a misconfiguration of an internal client that is

Re: Group already contains a Metric with the name 'latency'

2017-05-16 Thread Chesnay Schepler
No; you can name operators like this: stream.map().name("MyUniqueMapFunctionName") On 16.05.2017 14:50, rizhashmi wrote: thanks for your reply *latency* metrics appear to be pushed by AbstractStreamOperator.java latencyGauge = this.metrics.gauge("latency", new LatencyGauge(historySize));
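
In other words, each operator instance gets its own name so the metric groups stay distinct (the filter class here is made up):

    DataStream<Event> utc = input.filter(new TimezoneFilter("UTC")).name("Filter-UTC");
    DataStream<Event> pst = input.filter(new TimezoneFilter("PST")).name("Filter-PST");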

Re: Group already contains a Metric with the name 'latency'

2017-05-16 Thread rizhashmi
thanks for your reply. The *latency* metric appears to be pushed by AbstractStreamOperator.java: latencyGauge = this.metrics.gauge("latency", new LatencyGauge(historySize)); does it mean this method needs to be overridden?

Re: Group already contains a Metric with the name 'latency'

2017-05-16 Thread Chesnay Schepler
Generally this isn't an issue; it only means that for some operators the latency metrics will not be available. The underlying issue is that the metric system has no way to differentiate operators except by their name; if the names are identical you end up with a collision. If you're not

Re: UnknownKvStateKeyGroupLocation

2017-05-16 Thread Ufuk Celebi
Hey Joe! This sounds odd... are there any failures (JobManager or TaskManager) or leader elections being reported? You should see such events in the JobManager/TaskManager logs. On Tue, May 16, 2017 at 2:28 PM, Joe Olson wrote: > When running Flink in high availability mode,

Re: Group already contains a Metric with the name 'latency'

2017-05-16 Thread rizhashmi
I think I do. In my implementation I am generating rollups for multiple timezones, so I took the path of creating windows per timezone; for each window a separate trigger instance created the window with a new AssignerWithPunctuatedWatermarks instance, and each time I applied the same filter. Does this

UnknownKvStateKeyGroupLocation

2017-05-16 Thread Joe Olson
When running Flink in high availability mode, I've been seeing a high number of UnknownKvStateKeyGroupLocation errors being returned when using queryable state calls. If I put a simple getKvState call into a loop executing every second, and call it repeatedly, sometimes I will get the

Re: Group already contains a Metric with the name 'latency'

2017-05-16 Thread Chesnay Schepler
Does your job include multiple operators called "Filter"? On 16.05.2017 13:35, rizhashmi wrote: I am getting bunch of warning in log files. Anyone help me sort out this problem. 2017-04-28 00:20:57,947 WARN org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a

Re: Problem with Kafka Consumer

2017-05-16 Thread Kostas Kloudas
Hi Simone, I suppose that you use messageStream.keyBy(…).window(…) right? .windowAll() is not applicable to keyedStreams. Some follow up questions are: In your logs, do you see any error messages? What does your RowToQuery() sink do? Can it be that it blocks and the back pressure makes all
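
For clarity, the two shapes being contrasted (processing-time windows and the window function names are assumed here):

    // Keyed: windows are evaluated per key, in parallel.
    messageStream
        .keyBy(0)
        .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
        .apply(new MyWindowFunction());

    // Non-keyed: .windowAll() runs with parallelism 1 and goes on the
    // un-keyed stream; it is not applicable after keyBy().
    messageStream
        .windowAll(TumblingProcessingTimeWindows.of(Time.seconds(10)))
        .apply(new MyAllWindowFunction());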

Group already contains a Metric with the name 'latency'

2017-05-16 Thread rizhashmi
I am getting a bunch of warnings in the log files. Can anyone help me sort out this problem? 2017-04-28 00:20:57,947 WARN org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a Metric with the name 'latency'. Metric will not be

Problem with Kafka Consumer

2017-05-16 Thread simone
Hi to all, I have a problem with Flink and Kafka queues. I have a Producer that puts some Rows into a data sink represented by a Kafka queue and a Consumer that reads from this sink and processes Rows in buckets of *N* elements using a custom trigger function: messageStream.keyBy(0)
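
For reference, Flink already ships a shorthand for fixed-size buckets of N elements per key, which is equivalent to a GlobalWindows assigner with a purging CountTrigger (N and the window function are assumed):

    // Shorthand:
    messageStream.keyBy(0).countWindow(N).apply(new MyWindowFunction());

    // Explicit form, the custom-trigger style:
    messageStream
        .keyBy(0)
        .window(GlobalWindows.create())
        .trigger(PurgingTrigger.of(CountTrigger.of(N)))
        .apply(new MyWindowFunction());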

Re: questions about Flink's HashJoin performance

2017-05-16 Thread Fabian Hueske
Hi, Flink's HashJoin implementation was designed to gracefully handle inputs that exceed the main memory. It is not explicitly optimized for in-memory processing and does not play fancy tricks like optimizing cache accesses or batching. I assume your benchmark is about in-memory joins only. This

Re: Thrift object serialization

2017-05-16 Thread Flavio Pompermaier
Ok thanks Gordon! It would be nice to have a benchmark also on this ;) Thanks a lot for the support, Flavio On Tue, May 16, 2017 at 9:41 AM, Tzu-Li (Gordon) Tai wrote: > If you don’t register the TBaseSerializer for your MyThriftObj (or in > general don’t register any

Re: Thrift object serialization

2017-05-16 Thread Tzu-Li (Gordon) Tai
If you don’t register the TBaseSerializer for your MyThriftObj (or in general don’t register any serializer for the Thrift class), I think Kryo’s default FieldSerializer will be used for it. The TBaseSerializer basically just uses TBase for de-/serialization as you normally would for the
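
For reference, the registration Gordon describes looks like this, using chill-thrift's TBaseSerializer (the MyThriftObj class name comes from this thread; the com.twitter:chill-thrift dependency is required):

    import com.twitter.chill.thrift.TBaseSerializer;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // Without this, Kryo falls back to its generic FieldSerializer for MyThriftObj.
    env.getConfig().addDefaultKryoSerializer(MyThriftObj.class, TBaseSerializer.class);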

Re: Thrift object serialization

2017-05-16 Thread Flavio Pompermaier
Hi Gordon, thanks for the link. Will the usage of TBaseSerializer wrt Kryo lead to a performance gain? On Tue, May 16, 2017 at 7:32 AM, Tzu-Li (Gordon) Tai wrote: > Hi Flavio! > > I believe [1] has what you are looking for. Have you taken a look at that? > > Cheers, > Gordon

Re: High Availability on Yarn

2017-05-16 Thread Jain, Ankit
Bringing it back to the list's focus. Got the answer on #2, looks like that will work, still