On 12/4/13, 4:23 PM, Eldon Carman wrote:
In previous e-mails we have suggested a few new rewrite rules and I want to
get feedback on them.

---------
The first rewrite rule would merge two unnest child operations into a
single unnest operator.

UNNEST( $v2 : child($v1, "step2") )
   UNNEST($v1 : child($v0, "step1") )
where $v1 is not used in the rest of the plan.

Options for solution:

1. UNNEST( $v2 : child(child($v0, "step1"), "step2") )
or
2. UNNEST( $v2 : child($v0, "step1/step2") )

First, is this optimization best placed in the rewrite rules space?
(consider the compiler, etc.)
Second, which of the solutions should we consider for implementation? Or do
you know something else?

The rewriter is the best place to perform this transformation IMO. During translation you may not be able to "realize" this optimization in every case where it applies. You have a much better chance of benefiting from this rule in the rewriter.

In Algebricks, the UNNEST operator expects an unnesting function, while the input to an unnesting function is a scalar function. Unnesting functions implement an iterator API so that UNNEST can consume one item at a time without first materializing the whole sequence. In (1) the outer child will be invoked as an iterator, but the inner child will be invoked as a scalar function, which forces it to materialize all step1 items.

On the other hand, (2) allows the child function to internally construct nested iterators that can concurrently iterate over step1 and for each step1 item, iterate over all step2 items.
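
To make the difference between (1) and (2) concrete, here is a minimal sketch in Python rather than against the Algebricks API. Node, child_scalar, child_unnest, option1 and option2 are all hypothetical names made up for illustration; generators stand in for the iterator API that unnesting functions expose.

```python
# Toy model only: none of these names exist in Algebricks or VXQuery.
class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def child_scalar(node, step):
    """Scalar-style child: builds and returns the full list of matches."""
    return [c for c in node.children if c.name == step]


def child_unnest(nodes, step):
    """Unnesting-style child: yields matching children one at a time."""
    for n in nodes:
        for c in n.children:
            if c.name == step:
                yield c


def option1(v0):
    # child(child($v0, "step1"), "step2"):
    # the inner call is scalar, so every step1 item is materialized first.
    step1_items = child_scalar(v0, "step1")
    return child_unnest(step1_items, "step2")


def option2(v0, path="step1/step2"):
    # child($v0, "step1/step2"):
    # one lazy iterator per step, chained, so nothing is materialized.
    current = iter([v0])
    for step in path.split("/"):
        current = child_unnest(current, step)
    return current
```

Both functions produce the same step2 items; the only difference is that option1 holds the whole step1 list in memory before the first step2 item is returned.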




---------
The second rewrite rule would merge unnest child into a data scan operation.

UNNEST( $v1 : child($v0, "step1") )
   DATASCAN( collection( $source ), $v0 )
where $v0 is not used in the rest of the plan.

Options for solution:

DATASCAN( child(collection( $source ), "step1"), $v1 )

where the OperatorDescriptor for DATASCAN would understand the child of
collection.

DataScan accepts a Source object as its argument, so you cannot pass a function object to it (at least that's how it appears from your string above). You will need to hold on to some representation of the path being pushed into the scan in the DataSource object implemented in VXQuery. When it comes time to create the runtime, you can have that passed to the getScannerRuntime(...) call in IMetadataProvider as the implObject argument.

The rewrite rule will in effect "push" the path needed into the VXQuery source object.
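
A rough sketch of that "push" on a toy plan, written in Python and not against the real Algebricks classes. ToyDataSource, DataScan, UnnestChild and push_child_into_scan are hypothetical names; the child_path field stands in for whatever path representation the VXQuery source object ends up holding.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ToyDataSource:
    # Hypothetical stand-in for the VXQuery data source object.
    collection: str
    child_path: List[str] = field(default_factory=list)


@dataclass
class DataScan:
    source: ToyDataSource
    out_var: str


@dataclass
class UnnestChild:
    out_var: str
    in_var: str
    step: str
    input: DataScan


def push_child_into_scan(op: UnnestChild, used_later: Set[str]) -> Optional[DataScan]:
    """Return the merged DATASCAN, or None when the rule does not apply."""
    scan = op.input
    if not isinstance(scan, DataScan):
        return None
    if op.in_var != scan.out_var:
        return None
    if scan.out_var in used_later:
        # $v0 is still needed elsewhere, so the scan must keep producing it.
        return None
    # Record the step on a copy of the source instead of wrapping it in a function.
    new_source = ToyDataSource(scan.source.collection,
                               scan.source.child_path + [op.step])
    return DataScan(new_source, op.out_var)
```

The scan operator itself is unchanged in shape: it still takes a source object, and only that object knows about the extra path.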



-----------
The third rule searches for subplans that consume a single item input.

SUBPLAN {
   AGGREGATE($v2 : sequence(%expression($v1)))
     UNNEST($v1 : iterate($v0))
       NESTED_TUPLE_SOURCE
}

If $v0 is a single item, not a sequence, then rewrite to:

ASSIGN($v2 : %expression($v0))
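
The equivalence this rule relies on can be sketched with a toy model in Python. The functions iterate, sequence, subplan and assign below are hypothetical stand-ins for the operators above, and the model assumes that a one-item sequence is the same thing as the item itself.

```python
def iterate(value):
    """Yield each item of a sequence, or the value itself if it is a single item."""
    if isinstance(value, list):
        yield from value
    else:
        yield value


def sequence(items):
    """Build a sequence; a one-item sequence collapses to the item itself."""
    items = list(items)
    return items[0] if len(items) == 1 else items


def subplan(v0, expression):
    # SUBPLAN { AGGREGATE(sequence(expression($v1))) UNNEST($v1 : iterate($v0)) }
    return sequence(expression(v1) for v1 in iterate(v0))


def assign(v0, expression):
    # ASSIGN($v2 : expression($v0))
    return expression(v0)
```

In this model the two forms agree when $v0 is a single item and can differ when it is a sequence, which is why the rule needs the single-item guard.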

First, does this rule look correct?
Second, is it worth putting this rule in place?

How are you going to determine that $v0 is a single item? Which cases will that help with?



