On 12/4/13, 4:23 PM, Eldon Carman wrote:
In previous e-mails we have suggested a few new rewrite rules and I want to
get feedback on them.
---------
The first rewrite rule would merge two unnest child operations into a
single unnest operator.
UNNEST( $v2 : child($v1, "step2") )
UNNEST($v1 : child($v0, "step1") )
where $v1 is not used in the rest of the plan.
Options for solution:
1. UNNEST( $v2 : child(child($v0, "step1"), "step2") )
or
2. UNNEST( $v2 : child($v0, "step1/step2") )
First, is this optimization best placed in the rewrite rules space?
(consider the compiler, etc.)
Second, which of the solutions should we consider for implementation? Or is
there another option we should look at?
The rewriter is the best place to perform this transformation IMO.
During translation you may not be able to "realize" this optimization in
every case where it applies. You have a much better chance of benefiting
from this rule in the rewriter.
In Algebricks, the UNNEST operator expects an Unnesting Function, whereas
the input to an Unnesting Function is a Scalar Function. Unnesting
functions implement an iterator API so that the UNNEST can consume every
item without first materializing the whole sequence. In (1) the outer
child will be invoked as an iterator, but the inner child will be invoked
as a scalar function, which forces it to materialize all step1 items.
On the other hand, (2) allows the child function to internally construct
nested iterators that can concurrently iterate over step1 and for each
step1 item, iterate over all step2 items.
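
To make the difference between (1) and (2) concrete, here is a rough
sketch in Python. It is only a toy model: Node, child_scalar, child_iter,
option1 and option2 are hypothetical names made up for illustration and
are not part of Algebricks or VXQuery. The scalar-style function returns a
fully built list, while the iterator-style function yields one item at a
time.

```python
from typing import Iterable, Iterator, List, Optional


class Node:
    """Toy tree node (hypothetical; stands in for an XML node)."""

    def __init__(self, name: str, children: Optional[List["Node"]] = None):
        self.name = name
        self.children = children or []


def child_scalar(nodes: Iterable[Node], step: str) -> List[Node]:
    """Scalar-style child: builds the whole result list before returning."""
    return [c for n in nodes for c in n.children if c.name == step]


def child_iter(nodes: Iterable[Node], step: str) -> Iterator[Node]:
    """Iterator-style child: yields matching children one at a time."""
    for n in nodes:
        for c in n.children:
            if c.name == step:
                yield c


def option1(v0: Node, step1: str, step2: str) -> Iterator[Node]:
    # Inner child runs as a scalar function, so every step1 item is
    # materialized before the outer iterator produces anything.
    step1_items = child_scalar([v0], step1)
    return child_iter(step1_items, step2)


def option2(v0: Node, path: str) -> Iterator[Node]:
    # A single child call that understands "step1/step2" can nest two
    # iterators, so step1 items are pulled only as step2 items are needed.
    step1, step2 = path.split("/")
    return child_iter(child_iter([v0], step1), step2)
```

Both options return the same items; they differ only in when the step1
items get produced.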
---------
The second rewrite rule would merge unnest child into a data scan operation.
UNNEST( $v1 : child($v0, "step1") )
DATASCAN( collection( $source ), $v0 )
where $v0 is not used in the rest of the plan.
Options for solution:
DATASCAN( child(collection( $source ), "step1"), $v1 )
where the OperatorDescriptor for DATASCAN would understand the child of
collection.
DataScan accepts a Source object as its argument, so you cannot pass a
function object to it (at least that's how it appears from your string
above). You will need to hold on to some representation of the path
being pushed into the scan in the DataSource object implemented in
VXQuery. When it comes time to create the runtime, you can have that
passed to the getScannerRuntime(...) call in IMetadataProvider as the
implObject argument.
The rewrite rule will in effect "push" the path needed into the VXQuery
source object.
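
A rough sketch of that "push" in Python, as a toy model only. The classes
DataSource, DataScan and UnnestChild and the function push_child_into_scan
are hypothetical names for illustration; they do not mirror the real
Algebricks or VXQuery classes, and the runtime creation step
(getScannerRuntime) is not modelled here.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DataSource:
    """Hypothetical source object: a collection plus any pushed-down path."""
    collection: str
    child_path: List[str] = field(default_factory=list)


@dataclass
class DataScan:
    source: DataSource
    out_var: str


@dataclass
class UnnestChild:
    out_var: str
    in_var: str
    step: str
    input_op: object


def push_child_into_scan(op: object, used_elsewhere: Set[str]) -> object:
    """Fold UNNEST(child) over DATASCAN into the scan's source object.

    The plan is returned unchanged unless the unnest reads the scan's
    variable and that variable is not used anywhere else in the plan.
    """
    if not isinstance(op, UnnestChild):
        return op
    scan = op.input_op
    if not isinstance(scan, DataScan):
        return op
    if op.in_var != scan.out_var or scan.out_var in used_elsewhere:
        return op
    # Record the step in a new source object instead of wrapping the
    # source in a function call.
    pushed = DataSource(scan.source.collection,
                        scan.source.child_path + [op.step])
    return DataScan(pushed, op.out_var)
```

For the plan above, the result would be a scan that produces $v1 directly,
with "step1" stored in the source object rather than in an expression.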
---------
The third rule searches for subplans that consume a single item input.
SUBPLAN {
AGGREGATE($v2 : sequence(%expression($v1)))
UNNEST($v1 : iterate($v0))
NESTED_TUPLE_SOURCE
}
If $v0 is a single item rather than a sequence, rewrite to:
ASSIGN($v2 : %expression($v0))
First, does this rule look correct?
Second, is it worth putting this rule in place?
How are you going to determine that $v0 is a single item? Which cases
will that help with?
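
For reference, a small Python sketch of the equivalence the third rule
relies on. This is a toy model: iterate, sequence, subplan and assign are
hypothetical stand-ins for the plan operators, a Python list stands in for
a sequence, and a one-item sequence is treated as the item itself. It does
not address how the singleton property would be detected.

```python
from typing import Any, Callable, Iterable, Iterator


def iterate(v0: Any) -> Iterator[Any]:
    """Yield each item of a sequence, or the value itself if it is one item."""
    if isinstance(v0, list):
        yield from v0
    else:
        yield v0


def sequence(items: Iterable[Any]) -> Any:
    """Collect items; a one-item sequence collapses to the item itself."""
    collected = list(items)
    return collected[0] if len(collected) == 1 else collected


def subplan(v0: Any, expression: Callable[[Any], Any]) -> Any:
    # AGGREGATE(sequence(expression($v1))) over UNNEST(iterate($v0))
    return sequence(expression(v1) for v1 in iterate(v0))


def assign(v0: Any, expression: Callable[[Any], Any]) -> Any:
    # ASSIGN($v2 : expression($v0)), only valid when $v0 is a single item
    return expression(v0)
```

With a single item the two forms agree; with a real sequence only the
subplan form applies the expression per item, so the rule has to be
guarded by the singleton check.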