Vladimir Sitnikov created CALCITE-4480:
------------------------------------------

             Summary: Make EnumerableDefaults#union a non-blocking operation
                 Key: CALCITE-4480
                 URL: https://issues.apache.org/jira/browse/CALCITE-4480
             Project: Calcite
          Issue Type: Improvement
          Components: core
    Affects Versions: 1.26.0
            Reporter: Vladimir Sitnikov


Currently, EnumerableDefaults#union buffers all the rows before it returns the 
first of them.

Pros:
1) Faster iteration when the enumerable is queried multiple times

Cons:
1) The implementation does not work with infinite streams
2) Keeps the buffered rows in memory even after iteration is finished
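
For illustration, the buffering behaviour described above can be modelled with
plain java.util collections. This is a minimal sketch, not Calcite's actual
code: the class BufferedUnionSketch and the method bufferedUnion are
hypothetical names, and Iterable stands in for Enumerable.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class BufferedUnionSketch {
  /** Hypothetical helper: drains both inputs into a set before returning. */
  static <T> Iterable<T> bufferedUnion(Iterable<T> source0, Iterable<T> source1) {
    Set<T> buffer = new LinkedHashSet<>();
    // Both loops must run to completion before the caller sees a single row.
    for (T row : source0) {
      buffer.add(row);
    }
    for (T row : source1) {
      buffer.add(row);
    }
    // The caller holds the whole set for as long as it holds the result.
    return buffer;
  }
}
```

Because the returned object is the fully populated set, a second iteration is
cheap, but the call never returns for an infinite input and the rows stay
reachable for as long as the result is.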

---

An alternative might be something like:

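{code:java}
  public static <TSource> Enumerable<TSource> union(Enumerable<TSource> source0,
      Enumerable<TSource> source1) {
    // Chain both inputs into a single enumerable (UNION ALL semantics).
    Enumerable<TSource> unionAll = concat(source0, source1);
    return new AbstractEnumerable<TSource>() {
      @Override public Enumerator<TSource> enumerator() {
        // Fresh set per enumerator; Set#add returns false for rows already
        // seen, so the filter drops duplicates as they pass through.
        Set<TSource> set = new HashSet<>();
        return EnumerableDefaults.where(unionAll, set::add).enumerator();
      }
    };
  }
{code}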

Pros:
1) Supports infinite streams
2) In theory, it could clear the HashSet once iteration finishes

Cons:
1) Slower iteration when the enumerable is queried multiple times (the HashSet 
is rebuilt every time)
2) The extra concat and AbstractEnumerable wrappers might cost CPU cycles
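
The trade-offs of the alternative can be tried out without Calcite on the
classpath. The following is a rough sketch of the same idea using only
java.util; LazyUnionSketch, lazyUnion and naturals are hypothetical names, and
Iterable/Iterator stand in for Enumerable/Enumerator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

final class LazyUnionSketch {
  /** Hypothetical helper: de-duplicates rows one at a time, on demand. */
  static <T> Iterable<T> lazyUnion(Iterable<T> source0, Iterable<T> source1) {
    return () -> new Iterator<T>() {
      // A fresh set per iteration, like the per-enumerator HashSet above.
      private final Set<T> seen = new HashSet<>();
      private final Iterator<T> first = source0.iterator();
      private final Iterator<T> second = source1.iterator();
      private T pending;
      private boolean hasPending;

      @Override public boolean hasNext() {
        while (!hasPending) {
          // Read source0 until it is exhausted, then move on to source1.
          Iterator<T> current = first.hasNext() ? first : second;
          if (!current.hasNext()) {
            return false;
          }
          T candidate = current.next();
          if (seen.add(candidate)) { // false means the row is a duplicate
            pending = candidate;
            hasPending = true;
          }
        }
        return true;
      }

      @Override public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        hasPending = false;
        return pending;
      }
    };
  }

  /** Hypothetical infinite source: 0, 1, 2, ... */
  static Iterable<Integer> naturals() {
    return () -> new Iterator<Integer>() {
      private int n = 0;
      @Override public boolean hasNext() { return true; }
      @Override public Integer next() { return n++; }
    };
  }

  public static void main(String[] args) {
    Iterator<Integer> it = lazyUnion(Arrays.asList(1, 1, 2), naturals()).iterator();
    List<Integer> firstFive = new ArrayList<>();
    for (int i = 0; i < 5; i++) {
      firstFive.add(it.next());
    }
    System.out.println(firstFive); // prints [1, 2, 0, 3, 4]
  }
}
```

The first rows come back even though the second input never ends, which the
buffering version cannot do. The set of seen rows still grows with the number
of distinct rows consumed, and it is rebuilt from scratch on every new
iteration.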





--
This message was sent by Atlassian Jira
(v8.3.4#803005)
