I do not understand your question. The frame is an area of memory used by the operator to deposit data for the next operator to consume. Once that data has been consumed, we reuse that frame for the next bucket of data to be delivered.


On 12/4/13, 4:34 PM, Eldon Carman wrote:
What's the reason for Hyracks selecting a single frame for each operator?
Many of the rewrite rules focus on minimizing the data we store in this
single frame.


On Wed, Dec 4, 2013 at 1:34 PM, Eldon Carman <[email protected]> wrote:

I posted a question about tuple flow to the Hyracks group. Here is a copy
of the dialogue.


On Tue, Dec 3, 2013 at 11:48 PM, Vinayak Borkar wrote:

The standard strategy used by every operator to send data to the next
operator in the pipeline is to use one pre-allocated memory buffer that is
reused.

Say OP0 feeds data to OP1 (unnest), which feeds data to OP2. OP0 and OP1 create
a "frame" each, say F0 and F1 respectively, at the beginning of query
execution.

OP0 would then pack tuples (each formatted as the sequential
juxtaposition of its field values) into F0 until no more tuples can fit. At
this time OP0 invokes the nextFrame() method on OP1 (through connectors, if
applicable) to pass the data to OP1. OP1 iterates over F0 and processes
each tuple, creating the result tuples in F1. One of two things can happen
now: either F0 is exhausted and F1 still has room, or F1 is full and F0
still contains tuples to be processed. In the first case, OP1 would return
from the nextFrame() call back to OP0, which would refill F0 with the next
set of tuples. In the second case, OP1 would invoke OP2.nextFrame(F1).
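
The hand-off described above can be sketched roughly as follows. This is a toy model and not the Hyracks API: the Frame, Operator, Unnest, and Sink types are hypothetical, a frame's capacity is counted in tuples rather than bytes, and a tuple is just an int array holding the sequence $$1 with the new item $$2 appended.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class FramePipelineSketch {

    /** Hypothetical frame: capacity is counted in tuples, not bytes. */
    static final class Frame {
        final int capacity;
        final List<int[]> tuples = new ArrayList<>();
        Frame(int capacity) { this.capacity = capacity; }
        boolean isFull() { return tuples.size() >= capacity; }
        void reset() { tuples.clear(); }
    }

    /** Hypothetical operator interface, loosely modelled on nextFrame(). */
    interface Operator {
        void nextFrame(Frame input);
        void close();
    }

    /** OP1: emits one output tuple per item of the input sequence. */
    static final class Unnest implements Operator {
        private final Frame f1;       // allocated once, reused for every hand-off
        private final Operator next;  // OP2
        Unnest(int capacity, Operator next) {
            this.f1 = new Frame(capacity);
            this.next = next;
        }
        @Override public void nextFrame(Frame f0) {
            for (int[] seq : f0.tuples) {
                for (int item : seq) {
                    int[] out = Arrays.copyOf(seq, seq.length + 1); // copies $$1
                    out[seq.length] = item;                         // appends $$2
                    f1.tuples.add(out);
                    if (f1.isFull()) flush();  // F1 full, F0 may still hold tuples
                }
            }
            // F0 exhausted: returning lets the caller refill F0.
        }
        @Override public void close() {
            if (!f1.tuples.isEmpty()) flush();
            next.close();
        }
        private void flush() {
            next.nextFrame(f1);
            f1.reset();  // the same frame is reused for the next batch
        }
    }

    /** OP2: records what it receives and how many frames were delivered. */
    static final class Sink implements Operator {
        final List<int[]> received = new ArrayList<>();
        int framesSeen = 0;
        @Override public void nextFrame(Frame input) {
            framesSeen++;
            received.addAll(input.tuples);
        }
        @Override public void close() { }
    }

    /** OP0: packs input tuples into F0 and pushes each full frame to OP1. */
    static Sink run(List<int[]> input, int f0Capacity, int f1Capacity) {
        Sink sink = new Sink();
        Operator unnest = new Unnest(f1Capacity, sink);
        Frame f0 = new Frame(f0Capacity);
        for (int[] tuple : input) {
            f0.tuples.add(tuple);
            if (f0.isFull()) {
                unnest.nextFrame(f0);
                f0.reset();
            }
        }
        if (!f0.tuples.isEmpty()) unnest.nextFrame(f0);
        unnest.close();
        return sink;
    }

    public static void main(String[] args) {
        Sink sink = run(Collections.singletonList(new int[] {1, 2, 3}), 2, 2);
        for (int[] t : sink.received) System.out.println(Arrays.toString(t));
        System.out.println("frames delivered to OP2: " + sink.framesSeen);
    }
}
```

Only two frames exist at any time in this sketch, however many output tuples the unnest produces; the cost of repeating $$1 shows up as copying work, not as extra frames.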

In your specific example, the unnest operator would end up copying $$1
three times, once for each output tuple. However, in terms of memory
consumption, this does not lead to more space usage when the operators
pipeline the frames as described above. It is, however, inefficient to make
the copies.

Two broad strategies are possible to improve the performance of the
system.

1. In VXQuery, we use a sequence of unnest operators, one for each path
expression. So a/b/c will become three unnest operators. This is not
necessary. VXQuery could have a rewrite rule that converts

unnest iterate($$1, "b") -> $$2
   unnest iterate($$0, "a") -> $$1

into unnest iterate($$0, "a/b") -> $$2 when $$1 is not required.

This concept can be further used to rewrite

unnest iterate(...)
   data-scan(...)

into data-scan with the path that is needed pushed into the source when
the binding of the data-scan itself is not needed for anything other than
the unnest. We could extend the parser to only produce XML trees for the
given path steps. This will eliminate a whole bunch of copies.
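
A rough sketch of the first rewrite in strategy 1 follows. It is not the Algebricks rewrite-rule API: the Unnest class, its fields, and the varsUsedElsewhere set are hypothetical stand-ins for a plan node and a variable-usage analysis. Pushing a path into data-scan would follow the same pattern, with the child being the scan instead of another unnest.

```java
import java.util.Collections;
import java.util.Set;

final class UnnestMergeSketch {

    /** Hypothetical plan node: unnest iterate(inputVar, "path") -> outputVar. */
    static final class Unnest {
        final String inputVar;
        final String path;
        final String outputVar;
        final Unnest child;  // operator feeding this one, or null
        Unnest(String inputVar, String path, String outputVar, Unnest child) {
            this.inputVar = inputVar;
            this.path = path;
            this.outputVar = outputVar;
            this.child = child;
        }
        @Override public String toString() {
            return "unnest iterate(" + inputVar + ", \"" + path + "\") -> " + outputVar;
        }
    }

    /**
     * Merges an unnest with the unnest directly below it when the variable
     * linking them is not used anywhere else; otherwise returns top unchanged.
     */
    static Unnest merge(Unnest top, Set<String> varsUsedElsewhere) {
        Unnest child = top.child;
        boolean chained = child != null && child.outputVar.equals(top.inputVar);
        if (!chained || varsUsedElsewhere.contains(child.outputVar)) {
            return top;
        }
        return new Unnest(child.inputVar, child.path + "/" + top.path,
                          top.outputVar, child.child);
    }

    public static void main(String[] args) {
        Unnest inner = new Unnest("$$0", "a", "$$1", null);
        Unnest outer = new Unnest("$$1", "b", "$$2", inner);
        System.out.println(merge(outer, Collections.<String>emptySet()));
    }
}
```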


2. The second strategy is at the Algebricks/Hyracks level. Every operator
could accept a "projection" list (a list of fields that are not needed
upstream). So the unnest could then not copy the input field into the
output when it's not needed anymore. This will also help with fixing the
extra copying.
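
To illustrate the effect of such a list, here is a minimal sketch under simplifying assumptions: a tuple is an array of fields, every field is an int sequence, and notNeeded holds the indexes of fields that later operators do not read. None of these names come from Algebricks or Hyracks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

final class ProjectingUnnestSketch {

    /**
     * Unnests the sequence stored in tuple[seqField]. Each output tuple gets a
     * copy of every input field that is still needed, plus one new field
     * holding a single item of the sequence.
     */
    static List<int[][]> unnest(int[][] tuple, int seqField, Set<Integer> notNeeded) {
        List<int[][]> out = new ArrayList<>();
        for (int item : tuple[seqField]) {
            List<int[]> fields = new ArrayList<>();
            for (int f = 0; f < tuple.length; f++) {
                if (!notNeeded.contains(f)) {
                    fields.add(tuple[f].clone());  // the copy we would like to avoid
                }
            }
            fields.add(new int[] {item});
            out.add(fields.toArray(new int[0][]));
        }
        return out;
    }

    /** Counts the int values stored across all fields of all tuples. */
    static int valuesStored(List<int[][]> tuples) {
        int total = 0;
        for (int[][] tuple : tuples) {
            for (int[] field : tuple) {
                total += field.length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] tuple = { {1, 2, 3} };  // $$1 --> (1, 2, 3)
        int keepAll = valuesStored(unnest(tuple, 0, Collections.<Integer>emptySet()));
        int dropInput = valuesStored(unnest(tuple, 0, Collections.singleton(0)));
        System.out.println("values stored, $$1 kept: " + keepAll);
        System.out.println("values stored, $$1 dropped: " + dropInput);
    }
}
```

For the (1, 2, 3) example from the original question, keeping $$1 stores 12 values across the three output tuples, while dropping it stores 3.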

In VXQuery, (1) will show a huge improvement in terms of performance.


Vinayak



On 12/3/13, 12:18 PM, prestonc wrote:

How does the tuple information flow between operators? I want to
understand better the dynamics of adding or removing fields from the
tuple stream. As I understand it, the operator adds tuples to a frame
until it is full, and the frame is then passed on to the next operator.
Does the next operator start working on that frame as soon as it gets
the frame?

The frame is passed on to the next operator. In the situation where more
information is added to the tuple, does the operator start a new frame
and put the new tuple with the additional information in this new frame,
possibly creating many frames from a single input frame? What happens to
the old frame of data?

Consider an UNNEST operator. The operator reads a sequence field ($$1)
and creates individual items in a new field ($$2).
{{$$1-->(1, 2, 3)}} becomes
{{$$1-->(1, 2, 3), $$2-->1}}
{{$$1-->(1, 2, 3), $$2-->2}}
{{$$1-->(1, 2, 3), $$2-->3}}
Does this mean that $$1 is now copied into each tuple and has just
tripled the amount of space it's taking up?



