Re: S3 recovery and checkpoint directories exhibit explosive growth

2017-07-23 Thread prashantnayak
Hi Xiaogang and Stephan, we're continuing to test and have now set up the cluster with incremental RocksDB checkpointing disabled and the checkpoint interval increased from 30s to 120s (not ideal really :-( ). We'll run it with a large number of jobs and report back if this setup shows

Count Different Codes in a Window

2017-07-23 Thread Raj Kumar
Hi, we have a requirement to aggregate the data every 10 minutes and write the aggregated results to Elasticsearch ONCE. Right now, we iterate over the window's Iterable to count the different status codes. Is there a better way to count different status codes?
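
One possible direction for the question above is to count incrementally, updating a small accumulator as each element arrives, instead of iterating over the whole window at the end. The sketch below is plain Java with no Flink dependency. The class `StatusCodeCounter` and its method names are hypothetical, and status codes are assumed to be integers; how the accumulator would be wired into a window aggregation is left open.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical accumulator: per-status-code counts, updated one element at a time. */
class StatusCodeCounter {
    private final Map<Integer, Long> counts = new HashMap<>();

    /** Record one occurrence of the given status code. */
    void add(int statusCode) {
        counts.merge(statusCode, 1L, Long::sum);
    }

    /** Fold another accumulator into this one (e.g. when partial results are combined). */
    void mergeFrom(StatusCodeCounter other) {
        other.counts.forEach((code, n) -> counts.merge(code, n, Long::sum));
    }

    /** Snapshot of the counts, to be emitted once when the window closes. */
    Map<Integer, Long> result() {
        return new HashMap<>(counts);
    }
}
```

With this shape, only one map per key and window has to be kept, and the single write to the sink would use the map returned by `result()`.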

Re: problems starting the training exercise TaxiRideCleansing on local cluster

2017-07-23 Thread Günter Hipler
Hi Nico, thanks for looking into it. The reason for the behavior on my system: I had two different JDK versions installed (OpenJDK and Oracle JDK). I wasn't aware of this because I generally prefer to use the Oracle JDK. Somehow (I didn't analyze it in greater depth) both versions were used in

Find the running median from a data stream

2017-07-23 Thread Gabriele Di Bernardo
Hi guys, I want to keep track of the running median of a keyed data stream. I was considering applying a RichMapFunction to the stream and storing two heaps (PriorityQueue) in a ValueState object in order to find the running median. However, I am not really sure if this is the best approach
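
The two-heap idea mentioned above can be sketched in plain Java as follows. This is a minimal illustration of the data structure only: the class `RunningMedian` and its methods are hypothetical, values are assumed to be doubles, and keeping the heaps in Flink keyed state (including how they would be serialized) is not covered.

```java
import java.util.Collections;
import java.util.PriorityQueue;

/** Hypothetical helper: running median maintained with two heaps. */
class RunningMedian {
    // Max-heap holding the smaller half of the values seen so far.
    private final PriorityQueue<Double> lower =
            new PriorityQueue<>(Collections.reverseOrder());
    // Min-heap holding the larger half of the values seen so far.
    private final PriorityQueue<Double> upper = new PriorityQueue<>();

    /** Insert a value in O(log n) and restore the size invariant. */
    void add(double value) {
        if (lower.isEmpty() || value <= lower.peek()) {
            lower.add(value);
        } else {
            upper.add(value);
        }
        // Invariant: lower has the same size as upper, or exactly one more.
        if (lower.size() > upper.size() + 1) {
            upper.add(lower.poll());
        } else if (upper.size() > lower.size()) {
            lower.add(upper.poll());
        }
    }

    /** Current median in O(1); the mean of the two middle values for an even count. */
    double median() {
        if (lower.isEmpty()) {
            throw new IllegalStateException("no values added yet");
        }
        if (lower.size() > upper.size()) {
            return lower.peek();
        }
        return (lower.peek() + upper.peek()) / 2.0;
    }
}
```

Note that both heaps together hold every value seen so far, so the state kept per key grows with the stream unless it is bounded in some way.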

Re: notNext() and next(negation) not yielding same output in Flink CEP

2017-07-23 Thread Dawid Wysakowicz
Hi Yassine, first of all, notNext(A) is not equal to next(not A). notNext should be considered a “stopCondition”, which says that if an event matching the A condition occurs, the current partial match is discarded. The next(not A), on the other hand, accepts every event that does not match the A

Re: Gelly PageRank implementations in 1.2 to 1.3

2017-07-23 Thread Kaepke, Marc
Hi Greg, I am doing a comparison of Gelly and GraphX (Spark). Both frameworks implement PageRank, and Gelly provides a lot of variants (*thumbs up*). In a really small initial test I get the same ranking result for the vertex-centric, scatter-gather and GSA versions. Just the implementation