Re: Using Kafka 0.10.x timestamps as a record value in Flink Streaming

2017-05-16 Thread Jia Teoh
Hi Gordon, The timestamps are required for application logic. Thank you for clarifying the custom operators - seems I mistakenly thought of the functions that are passed to the operators rather than the operators themselves. AbstractStreamOperator and the other classes you mentioned seem like
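
For reference, a minimal sketch of what such a custom operator could look like (a 1.2-era API is assumed; the class and its output shape are hypothetical). Unlike the functions passed to operators, an operator receives the StreamRecord wrapper and therefore sees the timestamp that FlinkKafkaConsumer010 attached:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
    import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
    import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

    // Hypothetical operator that copies the record timestamp into the value.
    public class TimestampTaggingOperator<T> extends AbstractStreamOperator<Tuple2<T, Long>>
            implements OneInputStreamOperator<T, Tuple2<T, Long>> {

        @Override
        public void processElement(StreamRecord<T> element) throws Exception {
            long ts = element.getTimestamp(); // the timestamp travels with the StreamRecord
            output.collect(new StreamRecord<>(Tuple2.of(element.getValue(), ts), ts));
        }
    }

It would be wired in with DataStream#transform(...), which is also why this has to live at the operator level rather than the function level.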

Re: Kafka 0.10.x event time with windowing

2017-05-16 Thread Tzu-Li (Gordon) Tai
Ah, my apologies, some misunderstanding on my side here. FlinkKafkaConsumer010 attaches the Kafka timestamp with the records, hence a timestamp extractor is not required, BUT you’ll still need a watermark generator to produce the watermarks. That should explain why the windows aren’t firing.
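
A minimal sketch of such a watermark generator (assuming the 1.2/1.3-era periodic-watermark API and a bounded out-of-orderness; returning previousElementTimestamp keeps the timestamp the consumer attached):

    import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
    import org.apache.flink.streaming.api.watermark.Watermark;

    public class KafkaTimestampWatermarks<T> implements AssignerWithPeriodicWatermarks<T> {
        private final long maxOutOfOrdernessMs; // assumed slack, e.g. 1000 ms
        private long currentMax;

        public KafkaTimestampWatermarks(long maxOutOfOrdernessMs) {
            this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
            this.currentMax = Long.MIN_VALUE + maxOutOfOrdernessMs;
        }

        @Override
        public long extractTimestamp(T element, long previousElementTimestamp) {
            // previousElementTimestamp is the Kafka timestamp already attached
            currentMax = Math.max(currentMax, previousElementTimestamp);
            return previousElementTimestamp;
        }

        @Override
        public Watermark getCurrentWatermark() {
            return new Watermark(currentMax - maxOutOfOrdernessMs);
        }
    }

Calling stream.assignTimestampsAndWatermarks(new KafkaTimestampWatermarks<>(1000L)) before the window operator should then make the event-time windows fire.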

Re: Kafka 0.10.x event time with windowing

2017-05-16 Thread Jia Teoh
Hi Gordon, Thanks for confirming my understanding that the extractor should not have to be defined for 0.10. However, I'm still experiencing the case where not using an extractor results in zero window triggers. I've verified the timestamps in the Kafka records with the following command:

Re: Using Kafka 0.10.x timestamps as a record value in Flink Streaming

2017-05-16 Thread Tzu-Li (Gordon) Tai
Hi Jia, How exactly do you want to use the Kafka timestamps? Do you want to access them and alter them with new values as the record timestamp? Or do you want to use them for some application logic in your functions? If it's the former, you should be able to do that by using timestamp /

Re: Kafka 0.10.x event time with windowing

2017-05-16 Thread Tzu-Li (Gordon) Tai
Hi Jia! This sounds a bit fishy. The docs mention that there is no need for a timestamp / watermark extractor because with 0.10, the timestamps that come with Kafka records can be used directly to produce watermarks for event time. One quick clarification: did you also check whether the

Re: Group already contains a Metric with the name 'latency'

2017-05-16 Thread rizhashmi
thanks Chesnay Schepler

State in Custom Tumble Window Class

2017-05-16 Thread rizhashmi
I have a requirement not to reject events, even if they are late (after the maximum allowed lateness). The way I achieve this is by overriding the tumbling window class and updating the event time to the last event time if the event is late; for identification I attached an additional column in the db as
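
A rough sketch of what that override might look like (a 1.2-era WindowAssigner API is assumed; the clamping idea is the poster's, the class name is made up, and this is not a tested implementation; note the assigner instance is per-task, not per-key, and its field is not checkpointed):

    import java.util.Collection;
    import java.util.Collections;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

    // Clamp late timestamps forward to the last event time seen, so no event
    // lands in an already-closed window.
    public class LateTolerantTumblingWindows extends TumblingEventTimeWindows {
        private final long size;
        private long lastTimestamp = Long.MIN_VALUE;

        private LateTolerantTumblingWindows(long size) {
            super(size, 0);
            this.size = size;
        }

        public static LateTolerantTumblingWindows of(Time size) {
            return new LateTolerantTumblingWindows(size.toMilliseconds());
        }

        @Override
        public Collection<TimeWindow> assignWindows(Object element, long timestamp,
                                                    WindowAssignerContext context) {
            long ts = Math.max(timestamp, lastTimestamp); // treat late events as on-time
            lastTimestamp = ts;
            long start = ts - (ts % size);
            return Collections.singletonList(new TimeWindow(start, start + size));
        }
    }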

Kafka 0.10.x event time with windowing

2017-05-16 Thread Jia Teoh
Hi, I'm trying to use FlinkKafkaConsumer010 as a source for a windowing job on event time, as provided by Kafka. According to the Kafka connector doc (link),

Re: Using Kafka 0.10.x timestamps as a record value in Flink Streaming

2017-05-16 Thread Jia Teoh
Hi Robert, Thanks for the reply. I ended up implementing an extension of the Kafka fetcher and consumer so that the deserialization API can include the timestamp field, which is sufficient for my specific use case. I can share the code if desired but it seems like it's an intentional design

Re: Sink - Cassandra

2017-05-16 Thread Nick Dimiduk
Yes the approach works fine for my use-case; has been in "production" for quite some time. My implementation has some untested scenarios around job restarting and failures, so of course your mileage may vary. On Mon, May 15, 2017 at 5:58 AM, nragon wrote: >

Re: Stateful streaming question

2017-05-16 Thread Jain, Ankit
Hi Flavio, While you wait on an update from Kostas, I wanted to understand the use-case better and share my thoughts: 1) Why is the current batch mode expensive? Where are you persisting the data after updates? The way I see it, by moving to Flink you get to use RocksDB (a key-value store) that

Re: questions about Flink's HashJoin performance

2017-05-16 Thread Stephan Ewen
Hi! Be aware that the "Row" and "Record" types are not very high performance data types. You might be measuring the data type overhead, rather than the hash table performance. Also, the build measurements include the data generation, which influences the results. If you want to purely benchmark

Re: Stateful streaming question

2017-05-16 Thread Flavio Pompermaier
Hi Kostas, thanks for your quick response. I also thought about using Async IO, I just need to figure out how to correctly handle parallelism and the number of async requests. However, that's probably the way to go... Is it possible also to set a number of retry attempts / a backoff when the async request
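
For context, the shape of the 1.2-era Async I/O call: the last argument caps the number of in-flight requests, while retries/backoff are not built in and would have to be implemented inside asyncInvoke. The external client, its lookup method, and the Row/Enriched types here are assumptions:

    // Sketch: at most 100 in-flight lookups, 1 s timeout per request.
    DataStream<Enriched> enriched = AsyncDataStream.unorderedWait(
        input,
        new AsyncFunction<Row, Enriched>() {
            @Override
            public void asyncInvoke(Row row, AsyncCollector<Enriched> collector) {
                // hypothetical non-blocking client; complete the collector in the callback
                externalClient.lookup(row).thenAccept(result ->
                    collector.collect(Collections.singletonList(new Enriched(row, result))));
            }
        },
        1000, TimeUnit.MILLISECONDS,
        100);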

FlinkCEP latency/throughput

2017-05-16 Thread Sonex
Hello everyone, I am testing some patterns with FlinkCEP and I want to measure latency and throughput when using 1 or more processing cores. How can I do that? What I have done so far: Latency: each time an event arrives I store the system time (System.currentTimeMillis). When Flink calls the
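
One rough way to wire up the latency part (a sketch; the Event type and the downstream matches stream are placeholders):

    // Record the arrival time next to each event.
    DataStream<Tuple2<Event, Long>> timed = events.map(
        new MapFunction<Event, Tuple2<Event, Long>>() {
            @Override
            public Tuple2<Event, Long> map(Event e) {
                return Tuple2.of(e, System.currentTimeMillis());
            }
        });

    // ... run the CEP pattern on `timed` so matches carry the arrival time ...

    // Latency = emission time minus arrival time of the matched event.
    DataStream<Long> latencyMs = matches.map(
        new MapFunction<Tuple2<Event, Long>, Long>() {
            @Override
            public Long map(Tuple2<Event, Long> m) {
                return System.currentTimeMillis() - m.f1;
            }
        });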

Re: Stateful streaming question

2017-05-16 Thread Kostas Kloudas
Hi Flavio, From what I understand, for the first part you are correct. You can use Flink’s internal state to keep your enriched data. In fact, if you are also querying an external system to enrich your data, it is worth looking at the AsyncIO feature:
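
A minimal sketch of the keyed-state idea (the Tuple4 layout and the Enriched type are assumptions taken from this thread):

    public class EnrichFunction
            extends RichFlatMapFunction<Tuple4<String, String, String, String>, Enriched> {
        private transient ValueState<Enriched> cache;

        @Override
        public void open(Configuration parameters) {
            cache = getRuntimeContext().getState(
                new ValueStateDescriptor<>("enriched", Enriched.class));
        }

        @Override
        public void flatMap(Tuple4<String, String, String, String> value,
                            Collector<Enriched> out) throws Exception {
            Enriched current = cache.value();   // per-key state, e.g. backed by RocksDB
            // ... merge `value` into `current`, consulting the external system if absent ...
            cache.update(current);
            out.collect(current);
        }
    }
    // usage: stream.keyBy(0).flatMap(new EnrichFunction());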

Stateful streaming question

2017-05-16 Thread Flavio Pompermaier
Hi to all, we're still playing with the Flink streaming part in order to see whether it can improve our current batch pipeline. At the moment, we have a job that translates incoming data (as Row) into Tuple4, groups them together by the first field and persists the result to disk (using a thrift

Re: Problem with Kafka Consumer

2017-05-16 Thread Kostas Kloudas
Hi Simone, Glad I could help ;) Actually it would be great if you could also try out the upcoming (not yet released) 1.3 version and let us know if you find something that does not work as expected. We are currently in the phase of testing it, as you may have noticed, and every contribution

Re: Problem with Kafka Consumer

2017-05-16 Thread simone
Hi Kostas, thanks for your suggestion. Indeed, replacing my custom sink with a simpler one brought out that the cause of the problem was RowToQuery, as you suggested. The sink was blocking the reads, making the Kafka pipeline stall, due to a misconfiguration of an internal client that is

Re: Group already contains a Metric with the name 'latency'

2017-05-16 Thread Chesnay Schepler
No; you can name operators like this: stream.map().name("MyUniqueMapFunctionName") On 16.05.2017 14:50, rizhashmi wrote: thanks for your reply *latency* metrics appear to be pushed by AbstractStreamOperator.java latencyGauge = this.metrics.gauge("latency", new LatencyGauge(historySize));
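
In other words, each operator instance gets its own name so the metric groups stay distinct (the filter class here is made up):

    DataStream<Event> utc = input.filter(new TimezoneFilter("UTC")).name("Filter-UTC");
    DataStream<Event> pst = input.filter(new TimezoneFilter("PST")).name("Filter-PST");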

Re: Group already contains a Metric with the name 'latency'

2017-05-16 Thread rizhashmi
thanks for your reply. The *latency* metric appears to be pushed by AbstractStreamOperator.java: latencyGauge = this.metrics.gauge("latency", new LatencyGauge(historySize)); does it mean this method needs to be overridden?

Re: Group already contains a Metric with the name 'latency'

2017-05-16 Thread Chesnay Schepler
Generally this isn't an issue; it only means that for some operators the latency metrics will not be available. The underlying issue is that the metric system has no way to differentiate operators except by their name; if the names are identical you end up with a collision. If you're not

Re: UnknownKvStateKeyGroupLocation

2017-05-16 Thread Ufuk Celebi
Hey Joe! This sounds odd... are there any failures (JobManager or TaskManager) or leader elections being reported? You should see such events in the JobManager/TaskManager logs. On Tue, May 16, 2017 at 2:28 PM, Joe Olson wrote: > When running Flink in high availability mode,

Re: Group already contains a Metric with the name 'latency'

2017-05-16 Thread rizhashmi
I think I do. In my implementation I am generating rollups for multiple timezones, so I took the path of creating windows per timezone; for each window a separate trigger instance created the window with a new AssignerWithPunctuatedWatermarks instance, and each time I applied the same filter. Does this

UnknownKvStateKeyGroupLocation

2017-05-16 Thread Joe Olson
When running Flink in high availability mode, I've been seeing a high number of UnknownKvStateKeyGroupLocation errors being returned when using queryable state calls. If I put a simple getKvState call into a loop executing every second, and call it repeatedly, sometimes I will get the

Re: Group already contains a Metric with the name 'latency'

2017-05-16 Thread Chesnay Schepler
Does your job include multiple operators called "Filter"? On 16.05.2017 13:35, rizhashmi wrote: I am getting bunch of warning in log files. Anyone help me sort out this problem. 2017-04-28 00:20:57,947 WARN org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a

Re: Problem with Kafka Consumer

2017-05-16 Thread Kostas Kloudas
Hi Simone, I suppose that you use messageStream.keyBy(…).window(…) right? .windowAll() is not applicable to keyedStreams. Some follow up questions are: In your logs, do you see any error messages? What does your RowToQuery() sink do? Can it be that it blocks and the back pressure makes all
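
For clarity, the two shapes being contrasted (processing-time windows and the window function names are assumed here):

    // Keyed: windows are evaluated per key, in parallel.
    messageStream
        .keyBy(0)
        .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
        .apply(new MyWindowFunction());

    // Non-keyed: .windowAll() runs with parallelism 1 and goes on the
    // un-keyed stream; it is not applicable after keyBy().
    messageStream
        .windowAll(TumblingProcessingTimeWindows.of(Time.seconds(10)))
        .apply(new MyAllWindowFunction());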

Group already contains a Metric with the name 'latency'

2017-05-16 Thread rizhashmi
I am getting a bunch of warnings in the log files. Can anyone help me sort out this problem? 2017-04-28 00:20:57,947 WARN org.apache.flink.metrics.MetricGroup - Name collision: Group already contains a Metric with the name 'latency'. Metric will not be

Problem with Kafka Consumer

2017-05-16 Thread simone
Hi to all, I have a problem with Flink and Kafka queues. I have a Producer that puts some Rows into a data sink represented by a Kafka queue and a Consumer that reads from this sink and processes Rows in buckets of *N* elements using a custom trigger function: messageStream.keyBy(0)
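
For reference, Flink already ships a shorthand for fixed-size buckets of N elements per key, which is equivalent to a GlobalWindows assigner with a purging CountTrigger (N and the window function are assumed):

    // Shorthand:
    messageStream.keyBy(0).countWindow(N).apply(new MyWindowFunction());

    // Explicit form, the custom-trigger style:
    messageStream
        .keyBy(0)
        .window(GlobalWindows.create())
        .trigger(PurgingTrigger.of(CountTrigger.of(N)))
        .apply(new MyWindowFunction());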

Re: questions about Flink's HashJoin performance

2017-05-16 Thread Fabian Hueske
Hi, Flink's HashJoin implementation was designed to gracefully handle inputs that exceed the main memory. It is not explicitly optimized for in-memory processing and does not play fancy tricks like optimizing cache accesses or batching. I assume your benchmark is about in-memory joins only. This

Re: Thrift object serialization

2017-05-16 Thread Flavio Pompermaier
Ok thanks Gordon! It would be nice to have a benchmark also on this ;) Thanks a lot for the support, Flavio On Tue, May 16, 2017 at 9:41 AM, Tzu-Li (Gordon) Tai wrote: > If you don’t register the TBaseSerializer for your MyThriftObj (or in > general don’t register any

Re: Thrift object serialization

2017-05-16 Thread Tzu-Li (Gordon) Tai
If you don’t register the TBaseSerializer for your MyThriftObj (or in general don’t register any serializer for the Thrift class), I think Kryo’s default FieldSerializer will be used for it. The TBaseSerializer basically just uses TBase for de-/serialization as you normally would for the
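
For reference, the registration Gordon describes looks like this, using chill-thrift's TBaseSerializer (the MyThriftObj class name comes from this thread; the com.twitter:chill-thrift dependency is required):

    import com.twitter.chill.thrift.TBaseSerializer;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // Without this, Kryo falls back to its generic FieldSerializer for MyThriftObj.
    env.getConfig().addDefaultKryoSerializer(MyThriftObj.class, TBaseSerializer.class);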

Re: Thrift object serialization

2017-05-16 Thread Flavio Pompermaier
Hi Gordon, thanks for the link. Will the usage of TBaseSerializer wrt Kryo lead to a performance gain? On Tue, May 16, 2017 at 7:32 AM, Tzu-Li (Gordon) Tai wrote: > Hi Flavio! > > I believe [1] has what you are looking for. Have you taken a look at that? > > Cheers, > Gordon

Re: High Availability on Yarn

2017-05-16 Thread Jain, Ankit
Bringing it back to the list's focus. Got the answer on #2, looks like that will work, still