Re: load balancing groups

2014-10-31 Thread Fabian Hueske
Hmm, just found that there is no JoinHint that would allow what I described above. Broadcasting one input and using the other one to build a hash table is usually not a good idea, because the broadcasted side should be much smaller than the other one... 2014-10-31 21:56 GMT+01:00 Fabian H

Re: Hi / Aggregation support

2014-10-31 Thread Fabian Hueske
Hi Viktor, welcome to the dev mailing list! :-) I agree that Flink's aggregations should be improved in various aspects: - support more aggregation functions. Currently only MIN, MAX, SUM are supported. Adding COUNT and AVG would be nice! - support for multiple aggregations per field - support fo

Re: load balancing groups

2014-10-31 Thread Fabian Hueske
Just had another idea. The group-wise crossing that you are doing is actually a self-join on the grouping key. The system currently has no special strategy for self-joins. That means both inputs of the join (which are identical) are treated as two individual inputs. If you force a broadcast
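
The equivalence mentioned in this message (a group-wise cross is a self-join on the grouping key) can be illustrated with a small sketch. This is plain Python, not Flink code; the function names `groupwise_cross` and `self_join` and the sample records are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from itertools import product


def groupwise_cross(records, key):
    """Group records by key, then cross every group with itself."""
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r)
    return [pair for g in groups.values() for pair in product(g, g)]


def self_join(records, key):
    """Hash join of the data set with itself on the same key.

    The build side and the probe side are the same input, but they are
    handled as two separate inputs, as the message describes.
    """
    table = defaultdict(list)
    for r in records:  # build side
        table[key(r)].append(r)
    # probe side: look up each record's key in the hash table
    return [(left, right) for left in records for right in table.get(key(left), [])]


# Hypothetical sample data: (group key, value)
records = [("a", 1), ("a", 2), ("b", 3)]
by_group = lambda r: r[0]

crossed = sorted(groupwise_cross(records, by_group))
joined = sorted(self_join(records, by_group))
```

Both functions produce the same set of pairs, so a strategy that is good for the join is also a candidate for the group-wise cross.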

Hi / Aggregation support

2014-10-31 Thread Rosenfeld, Viktor
Hi everybody, First, I want to introduce myself to the community. I am a PhD student who wants to work with and improve Flink. Second, I would like to start by improving aggregations. My first goal is to simplify the computation of a field average. Basically, I want to turn this plan:

[jira] [Created] (FLINK-1200) Add count() aggregate function to Java and Scala APIs

2014-10-31 Thread Kostas Tzoumas (JIRA)
Kostas Tzoumas created FLINK-1200: Summary: Add count() aggregate function to Java and Scala APIs Key: FLINK-1200 URL: https://issues.apache.org/jira/browse/FLINK-1200 Project: Flink Issue T