Hi,

I am trying to compute some final statistics over a stream topology. For this I would like to gather all data from all windows and parallel partitions into a single, global window. Could you suggest a solution for this? I saw that the stream returned by the map function has a ".global()" method, but I end up with the same number of partitions as I have in the main computation. Below you can find a sketch of the program:
    DataStream stream = env.Read...
    env.setParallelism(10);

    // Compute phase
    DataStream<Tuple...> result = stream.keyBy(_).window(_).apply();
    // End compute phase

    // Get the metrics
    result.map(/* extract some of the Tuple fields */)
        .global()
        .timeWindowAll(Time.of(5, TimeUnit.SECONDS), Time.of(1, TimeUnit.SECONDS))
        .trigger(EventTimeTrigger.create())
        .apply()
        .writeAsText();

For this last step, I would expect that even though the compute phase runs in parallel, I can select some of the events from all partitions and gather them into one single window. However, I have not been successful in this.

I also tried applying a keyBy() to the result stream in which I assign the same key to every event, but the result remains the same:

    result.map(/* extract some of the Tuple fields */)
        .keyBy(new KeySelector<Tuple2<Long, Long>, Integer>() {
            @Override
            public Integer getKey(Tuple2<Long, Long> arg0) throws Exception {
                return 1;
            }

            @Override
            public int hashCode() {
                return 1;
            }
        })
        .timeWindowAll()
        .apply()

Thanks for the help/ideas.