Hi all,
we wanted to do a group by on elements that can contain null values, and we
discovered that the Table API supports this while the DataSet API does not.
Is this documented somewhere on the Flink site?

Best,
Flavio

-------------------------------------------------------

PS: you can test this with the following main:

public static void main(String[] args) throws Exception {
    final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    final BatchTableEnvironment btEnv = TableEnvironment.getTableEnvironment(env);

    // Turn the literal "null" strings into real null values.
    final DataSet<String> testDs = env
        .fromElements("test", "test", "test2", "null", "null", "test3")
        .map(x -> "null".equals(x) ? null : x);

    // DataSet API: group by the (nullable) value itself and count each group.
    // This is the part that does not work with null keys.
    boolean testDatasetApi = true;
    if (testDatasetApi) {
      testDs.groupBy(x -> x).reduceGroup(new GroupReduceFunction<String, Integer>() {

        @Override
        public void reduce(Iterable<String> values, Collector<Integer> out) throws Exception {
          int cnt = 0;
          for (String value : values) {
            cnt++;
          }
          out.collect(cnt);
        }
      }).print();
    }

    // Table API: the same grouping expressed as a SQL query.
    btEnv.registerDataSet("TEST", testDs, "field1");
    Table res = btEnv.sqlQuery("SELECT field1, count(*) as cnt FROM TEST GROUP BY field1");
    DataSet<Row> result = btEnv.toDataSet(res,
        new RowTypeInfo(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.LONG_TYPE_INFO));
    result.print();
}
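
PPS: one possible workaround on the DataSet side might be to replace null keys
with a sentinel value before grouping. This is only a sketch of the idea in
plain Java, with no Flink dependency, and not something taken from the Flink
docs. The class NullKeyGrouping, the method countByKey and the NULL_SENTINEL
value are made-up names for illustration; the sentinel is assumed never to
occur as a real key.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NullKeyGrouping {
    // Hypothetical placeholder for null keys; assumed not to collide with real data.
    static final String NULL_SENTINEL = "<NULL>";

    // Counts occurrences per key, mapping null to the sentinel first
    // so that the grouping step never sees a null key.
    static Map<String, Long> countByKey(List<String> values) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String value : values) {
            String key = (value == null) ? NULL_SENTINEL : value;
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same elements as in the Flink test above, with real nulls.
        List<String> input = Arrays.asList("test", "test", "test2", null, null, "test3");
        countByKey(input).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

In the Flink job the same substitution would presumably go into a map() step
placed before groupBy(), with the sentinel translated back to null afterwards
if needed.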
