Re: Memory Issue

2017-08-24 Thread Govindarajan Srinivasaraghavan
Thanks Stephan, any pointers on how managed memory is used in a streaming application would really help. Regards, Govind > On Aug 24, 2017, at 1:53 AM, Stephan Ewen wrote: > Hi! > RocksDB will be used when it is selected as the state backend, independent of the

Even out the number of generated windows

2017-08-24 Thread Bowen Li
Hi guys, I have a question about how Flink generates windows. We are using a 1-day sliding window with a 1-hour slide to count some features of items based on event time. We have about 20 million items. We observed that Flink only emits results at a fixed time each hour (e.g. 1am, 2am, 3am,
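A minimal sketch of the window setup described above, assuming a keyed stream of (itemId, count) pairs; the source elements and the placeholder timestamp extractor are illustrative only:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class SlidingItemCounts {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

            env.fromElements(Tuple2.of("item-1", 1L), Tuple2.of("item-2", 1L))
               .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple2<String, Long>>() {
                   @Override
                   public long extractAscendingTimestamp(Tuple2<String, Long> e) {
                       return 0L; // placeholder: a real job extracts the event time here
                   }
               })
               .keyBy(0)
               // 1-day windows sliding by 1 hour: each event falls into 24 windows,
               // and window boundaries are aligned to the epoch, so every window
               // ends exactly on the hour and fires when the watermark passes it.
               .window(SlidingEventTimeWindows.of(Time.days(1), Time.hours(1)))
               .sum(1)
               .print();

            env.execute("sliding-item-counts");
        }
    }

That epoch alignment of sliding windows would explain the fixed hourly emission times observed: every open window shares the same on-the-hour end boundary.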

Re: Flink doesn't free YARN slots after restarting

2017-08-24 Thread Bowen Li
Hi Till, Thank you very much for looking into it! According to our investigation, this is indeed a Kinesis issue. Flink (FlinkKinesisProducer) uses the KPL (Kinesis Producer Library), but hasn't tuned it yet. I have identified a bunch of issues, opened the following Flink tickets, and am working on

Re: custom writer fail to recover

2017-08-24 Thread Biswajit Das
Hi Stefan, my bad, I'm really sorry. I copied the wrong exception stack; during recovery after the error I'm seeing the exception below: Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.HadoopIllegalArgumentException): Cannot truncate to a larger file size. Current size:

Security Control of running Flink Jobs on Flink UI

2017-08-24 Thread Raja . Aravapalli
Hi, I have started a Flink session/cluster on an existing Hadoop YARN cluster using Flink Yarn-Session, and am submitting Flink streaming jobs to it… and everything works fine. But one problem I see with this approach is: the Flink Yarn-Session is running with a YARN application id. And this

Re: Database connection from job

2017-08-24 Thread Stefan Richter
Hi, the lifecycle is described here: https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/task_lifecycle.html Best, Stefan > On 24.08.2017 at 14:12, Bart Kastermans wrote

Re: Aggregating metrics from different boxes

2017-08-24 Thread Chesnay Schepler
If you want to change the identifier under which metrics are exported, please have a look at scope formats in the Flink documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/metrics.html#scope If I understood you correctly, the goal would be to remove the host and
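For reference, scope formats are plain entries in flink-conf.yaml; a sketch of the kind of change being discussed, dropping the host from the taskmanager scope (see the linked page for the full list of patterns and variables):

    # Default taskmanager scope, which embeds the host in every identifier:
    # metrics.scope.tm: <host>.taskmanager.<tm_id>
    # Without the host, metrics from all boxes share the same prefix:
    metrics.scope.tm: taskmanager.<tm_id>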

Flink-HTM integration

2017-08-24 Thread AndreaKinn
Hi, Is there anyone here who has used the Flink-HTM library https://github.com/htm-community/flink-htm ? I'm trying to use it in my project but I have some fundamental questions to complete my thesis work. Regards, Andrea

Re: [Error]TaskManager -RECEIVED SIGNAL 1: SIGHUP. Shutting down as requested

2017-08-24 Thread Ted Yu
Can you provide more information? OS version, Flink version, anything interesting in dmesg output around this time? On Thu, Aug 24, 2017 at 4:53 AM, Samim Ahmed wrote: > Hi All, > For the last two days I have been getting the error below and the worker servers are > killed with below

Question about watermark and window

2017-08-24 Thread Tony Wei
Hi, Recently I have been studying watermarks from the Flink documentation and blogs. I have some questions about the scenario below. Suppose there are five clients sending events with different timestamps to a topic on Kafka. The topic has two partitions and the five events' timestamps are (ts=1), (ts=2), (ts=3),
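For context, a minimal sketch of a periodic watermark assigner of the kind involved in this scenario; the MyEvent type and the 10-second out-of-orderness bound are assumptions, and note that with multiple Kafka partitions the effective watermark is the minimum across partitions:

    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // Hypothetical event carrying its own event-time timestamp (milliseconds).
    class MyEvent {
        public long ts;
    }

    // The watermark trails the largest timestamp seen by 10 seconds, so an
    // event only counts as late once a sufficiently newer event has arrived.
    public class MyTimestampAssigner extends BoundedOutOfOrdernessTimestampExtractor<MyEvent> {
        public MyTimestampAssigner() {
            super(Time.seconds(10));
        }

        @Override
        public long extractTimestamp(MyEvent event) {
            return event.ts;
        }
    }

The assigner is attached with stream.assignTimestampsAndWatermarks(new MyTimestampAssigner()); when reading from Kafka it can instead be set on the consumer so watermarks are tracked per partition.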

RE: Flink parquet read.write performance

2017-08-24 Thread Newport, Billy
We saw the function wrapping when we were debugging it, and that's what surprised us when it suddenly serialized rather than calling the writer and physically writing the records in a separate JVM. From: Aljoscha Krettek [mailto:aljos...@apache.org] Sent: Wednesday, August 23, 2017 11:51 AM To:

RE: Flink parquet read.write performance

2017-08-24 Thread Newport, Billy
If we use two sinks with the same folder, we get file name collisions between the two sinks. It sounds like even if we did that, Flink isn't capable of chaining it regardless, no? We find ourselves having to manually optimize the data flow quite a bit, to tell you the truth. For example:

Database connection from job

2017-08-24 Thread Bart Kastermans
I am using the Scala API for Flink, and am trying to set up a JDBC database connection in my job (on every incoming event I want to query the database to get some data to enrich the event). Because of the serialization and deserialization of the code as it is sent from the Flink master to the
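The usual answer to this serialization problem is to create the connection in a rich function's open() method, which runs on the worker after the function has been deserialized. A minimal sketch in Java (the JDBC URL, query, and string-keyed event are assumptions):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    // Hypothetical enrichment function: the connection is created once per
    // parallel instance in open(), not serialized with the job graph.
    public class EnrichFromDb extends RichMapFunction<String, String> {
        private transient Connection conn;

        @Override
        public void open(Configuration parameters) throws Exception {
            conn = DriverManager.getConnection("jdbc:postgresql://db-host:5432/mydb");
        }

        @Override
        public String map(String eventKey) throws Exception {
            try (PreparedStatement ps = conn.prepareStatement("SELECT info FROM dim WHERE k = ?")) {
                ps.setString(1, eventKey);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? eventKey + "," + rs.getString("info") : eventKey;
                }
            }
        }

        @Override
        public void close() throws Exception {
            if (conn != null) conn.close();
        }
    }

The task lifecycle page Stefan links in his reply above describes exactly when open() and close() run.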

[Error]TaskManager -RECEIVED SIGNAL 1: SIGHUP. Shutting down as requested

2017-08-24 Thread Samim Ahmed
Hi All, For the last two days I have been getting the error below and the worker servers are killed with the errors below. Please guide me on how to fix this error. Earlier, everything was running smoothly. I am using cluster mode with a five-slave, one-master configuration. 2017-08-24 12:24:02,594

Re: custom writer fail to recover

2017-08-24 Thread Stefan Richter
Hi, I think there are two different things mixed up in your analysis. The stack trace that you provided is caused by a failing checkpoint - in writing, not in reading. It seems to fail due to a timeout of your HDFS connection. This close method also has nothing to do with the close method in the

Re: Memory Issue

2017-08-24 Thread Stephan Ewen
Hi! RocksDB will be used when it is selected as the state backend, independent of the checkpointing configuration. Using RocksDB as the state backend, Flink will have some objects on the heap, like timers (we will move them to RocksDB as well in the near future) but the majority will be off
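For reference, a minimal sketch of selecting RocksDB as the state backend as discussed here; the checkpoint URI is an assumption, and the backend can equally be set in flink-conf.yaml:

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class BackendSetup {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // State lives in RocksDB (largely off-heap); checkpoints go to this URI.
            env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));
            env.enableCheckpointing(60_000); // checkpoint every minute
        }
    }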

Aggregating metrics from different boxes

2017-08-24 Thread Sridhar Chellappa
Folks, I am using RichMapFunction to generate Codahale-like metrics from different taskmanagers spread across an N-node cluster. When I look at the visualizations (Grafana on InfluxDB), I see all of the metrics as separate streams ($host.$taskmanager.$uuid.$metricname). I thought I could aggregate
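For context, a minimal sketch of how a counter is typically registered from a RichMapFunction through Flink's metric system (the metric name and function types are assumptions); the scope-format reply earlier in this digest addresses how the per-host prefix in the exported identifier can be changed:

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Counter;

    // Hypothetical metric-emitting map: the counter is registered once per
    // parallel instance and exported under the configured scope format.
    public class CountingMap extends RichMapFunction<String, String> {
        private transient Counter eventsSeen;

        @Override
        public void open(Configuration parameters) {
            eventsSeen = getRuntimeContext().getMetricGroup().counter("eventsSeen");
        }

        @Override
        public String map(String value) {
            eventsSeen.inc();
            return value;
        }
    }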

Re: Exception with Avro Serialization on RocksDBStateBackend

2017-08-24 Thread Biplob Biswas
Hi Till, Thanks for the response. I was assuming that the AvroSerializer would create a corresponding Avro schema from the object class I provide. With that in mind, I did the following: AvroSerializer txnAvroSerde = new AvroSerializer<>(TransactionStateModel.class); ValueStateDescriptor
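A sketch of how that truncated snippet plausibly continues, pairing the AvroSerializer with a state descriptor; the descriptor name is an assumption, and TransactionStateModel is the user's own class:

    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.typeutils.runtime.AvroSerializer;

    // Assumed continuation: register state with an explicit Avro serializer so
    // the RocksDB backend serializes values via Avro rather than falling back
    // to Kryo.
    AvroSerializer<TransactionStateModel> txnAvroSerde =
            new AvroSerializer<>(TransactionStateModel.class);
    ValueStateDescriptor<TransactionStateModel> txnStateDesc =
            new ValueStateDescriptor<>("txnState", txnAvroSerde);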