Re: Writing groups of Windows to files

2017-06-30 Thread Fabian Hueske
Hi Niels, If I understand your use case correctly, you'd like to hold back all events of a session until it ends/timesout and then write all events out. So, instead of aggregating per session (the common use case), you'd just like to collect the event. I would implement a simple WindowFunction th

Re: Writing groups of Windows to files

2017-06-30 Thread Nico Kruber
Looks like you are missing a window *function* that processes the window. >From [1] : stream .keyBy(...) <- keyed versus non-keyed windows .window(...) <- required: "assigner" [.trigger(...)] <- optional: "trigger" (else default trigger) [.evicto

Re: a lot of connections in state "CLOSE_WAIT"

2017-06-30 Thread Nico Kruber
Hi XiangWei, this could be a resource leak, i.e. a socket not getting closed, but I was unable to reproduce that behaviour. Maybe Chesnay (cc'd) has an idea on how/ where this may happen. Can you tell us a bit more on what you where doing / how the webinterface was used? Is there a way to reprod

Re: Checkpointing with RocksDB as statebackend

2017-06-30 Thread Aljoscha Krettek
That’s great to hear! Maybe it would make sense to add these defaults to Flink, if they don’t otherwise degrade performance. Best, Aljoscha > On 29. Jun 2017, at 22:44, Vinay Patil wrote: > > Hi Guys, > > I am able to overcome the physical memory consumption issue by setting the > following

Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-30 Thread Aljoscha Krettek
Yes, in the end the requests to HBase are the bottle neck and the latency will manifest in different places of the job depending on where there is a queue. If there is a queue between map and flatMap elements will sit there and wait and you’ll see latency there. If map and flatMap are chained yo

Writing groups of Windows to files

2017-06-30 Thread Niels Basjes
Hi, I have the following situation: - I have click stream data from a website that contains a session id. - I then use something like EventTimeSessionWindows to group these events per session into a WindowedStream So I essentially end up with a stream of "finished sessions" So far I am able to d

Re: Is there some metric info about RocksdbBackend?

2017-06-30 Thread gerryzhou
Hi Vinay, Thanks for your response, i will try for your advice。 -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Is-there-some-metric-info-about-RocksdbBackend-tp14081p14084.html Sent from the Apache Flink User Mailing List archive. maili

Re: Is there some metric info about RocksdbBackend?

2017-06-30 Thread vinay patil
Hi Gerry, I had set the following parameters for DBOptions: setStatsDempPeriodSec( 600) // 10 mins setDBLogDir("/var/tmp/") I saw the files getting generated in /var/tmp. But I did not get the statistics here. You will get all the options that are used for RocksDB. May be you can try to set cre

回复:An addition to Netty's memory footprint

2017-06-30 Thread Zhijiang(wangzhijiang999)
Based on Kurt's scenario, if the cumulator allocates a big ByteBuf from ByteBufAllocator during expansion, it is easy to result in creating a new PoolChunk(16M) because of no consistent memory in current PoolChunks. And this will cause the total used direct memory beyond estimated. For further

Is there some metric info about RocksdbBackend?

2017-06-30 Thread 周思华
Hi, Is there some metric info about RocksdbBackend in flink, like sst compact times, memtable dump times, block cache size and so on. Currently when using Rocksdb as backend it behavior is black for us and it consumption a lot of memory, i want to figure out it behavior via metric.

An addition to Netty's memory footprint

2017-06-30 Thread Kurt Young
Hi, Ufuk had write up an excellent document about Netty's memory allocation [1] inside Flink, and i want to add one more note after running some large scale jobs. The only inaccurate thing about [1] is how much memory will LengthFieldBasedFrameDecoder use. From our observations, it will cost at m