Re: Lot of data generated in out file

2018-04-15 Thread Ashish Attarde
Thanks Gordon for your reply. The out file mystery got resolved. Someone accidentally modified the POJO code on the server that I work on and put in a println. Thank you for the information. I am experimenting with windowing to understand it better and fit it to my use case. Thanks -Ashish On Sun,

Re: Lot of data generated in out file

2018-04-15 Thread Tzu-Li (Gordon) Tai
Hi Ashish, I don't really see why there are outputs in the out file for the program you provided. Perhaps others could chime in here .. As for your second question regarding window outputs: Yes, subsequent window operators should definitely be doable in Flink. This is just a matter of multiple

Re: Watermark and multiple streams

2018-04-15 Thread Tzu-Li (Gordon) Tai
Hi, How are you registering your event time timers in processElement? If you are continuously registering them, and watermarks are correctly generated upstream, then the onTimer method should be invoked properly. For your 1-to-many case, I would assume that whenever a new key arrives (that

Re: Simulating Time-based and Count-based Custom Windows with ProcessFunction

2018-04-15 Thread Tzu-Li (Gordon) Tai
Hi Max! Before we jump into the custom ProcessFunction approach: Have you also checked out using the RocksDB state backend, and whether or not it is suitable for your use case? For state that would not fit into memory, that is usually the go-to state backend to use. If you’re sure a custom

Re: Trying to understand KafkaConsumer_records_lag_max

2018-04-15 Thread Tzu-Li (Gordon) Tai
Hi Julio, I'm not really sure, but do you think it is possible that there could be some hard data retention setting for your Kafka topics in the staging environment? As in, at some point in time and maybe periodically, all data in the Kafka topics are dropped and therefore the consumers

Re: Tiemrs and restore

2018-04-15 Thread Tzu-Li (Gordon) Tai
Hi Alberto, Looking at the code, I think the current behavior is that all timers (both processing time and event time) are re-registered on restore, and therefore should be triggered automatically. So, for processing time timers, on restore all timers that were supposed to be fired while the

User-defined aggregation function and parallelism

2018-04-15 Thread 杨力
I am running Flink SQL in streaming mode and implemented a UDAGG, which is used in keyed HOP windows. But I found that the throughput decreases dramatically when the function is used. Does the UDAGG run in parallel? Or does it run only in one thread? Regards, Bill

Re: data enrichment with SQL use case

2018-04-15 Thread Ken Krugler
If the SQL data is all (or mostly all) needed to join against the data from Kafka, then I might try a regular join. Otherwise it sounds like you want to use an AsyncFunction to do ad hoc queries (in parallel) against your SQL DB.

Re: Scaling down Graphite metrics

2018-04-15 Thread Chesnay Schepler
Hello, you can configure the rate at which metrics are reported by setting "metrics.reporter.<name>.interval" as described in the reporter documentation. At this time there is no way to disable specific
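
As a rough illustration of the per-reporter interval setting mentioned in this reply, a flink-conf.yaml fragment might look like the following. The reporter name `grph`, the host, the port, and the 60-second interval are placeholder assumptions chosen for the example, not values from the thread; check the reporter documentation for your Flink version before relying on the exact keys.

```yaml
# Sketch only: "grph" is an arbitrary reporter name chosen for this example.
metrics.reporters: grph
metrics.reporter.grph.class: org.apache.flink.metrics.graphite.GraphiteReporter
# Placeholder Graphite endpoint (assumption, not from the thread).
metrics.reporter.grph.host: localhost
metrics.reporter.grph.port: 2003
# A longer interval means fewer reports sent to Graphite.
metrics.reporter.grph.interval: 60 SECONDS
```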

Re: Old Flink jobs restarting on Job Manager failover

2018-04-15 Thread Gary Yao
Hi Steve, What is the Flink version you are using? Jobs are recovered from metadata stored in ZooKeeper. The behavior you describe indicates that the submitted job graph is not deleted from ZooKeeper. By default, the jobs that should be running/recovered are stored in znode:

data enrichment with SQL use case

2018-04-15 Thread miki haiat
Hi, I have a case of metadata enrichment and I'm wondering if my approach is the correct way. 1. Input stream from Kafka. 2. MD in MS SQL. 3. Map to new POJO. I need to extract a key from the Kafka stream and use it to select some values from the SQL table. So I thought to use

Re: Issue in Flink/Zookeeper authentication via Kerberos

2018-04-15 Thread Eron Wright
I believe that the solution here is to ensure that the znodes created by Flink have an ACL that allows access only to the original creator. For example, if a given Flink job has a Kerberos identity of "us...@example.com", it should set the znode ACL appropriately to disallow access to any client

Tiemrs and restore

2018-04-15 Thread Alberto Mancini
Hello, according to this Stack Overflow response https://stackoverflow.com/questions/36306136/will-apache-flink-restore-trigger-timers-after-failure IIUC we should expect that after a restore the timers will not be executed until a new timer is scheduled. I wonder if this is still true and if there

Re: Unsure how to further debug - operator threads stuck on java.lang.Thread.State: WAITING

2018-04-15 Thread Chesnay Schepler
Hello, Thread #1-3 are waiting for input, Thread #4 is waiting for the job to finish. To further debug this I would look into what the preceding operators are doing, whether they are blocked on something or are emitting records (which you can check in the UI/metrics). On 15.04.2018 18:40,

Unsure how to further debug - operator threads stuck on java.lang.Thread.State: WAITING

2018-04-15 Thread Miguel Coimbra
Hello, I am running into a situation where the Flink threads responsible for my operator execution are all stuck in the WAITING state. Before anything else, this is my machine's spec: Linux 4.4.88 #1 SMP x86_64 Intel(R) Xeon(R) CPU E7- 4830 @ 2.13GHz GenuineIntel GNU/Linux 256 GB RAM I am running