Hi David,

All the implementations in the Tez runtime library that exist today are key-value based, primarily because we haven't needed byte-only or record-based readers.
That said, Tez coordination itself does not impose any restrictions on the data being moved, and building a byte mover on top of the current runtime library components should not be too complicated.

Some classes that you could look at for reference are ShuffleManager, UnorderedKVInput and UnorderedKVReader. ShuffleManager just takes care of moving bytes over the network from different source tasks. These bytes are then interpreted as IFiles and keys/values by the Inputs / Readers. A byte mover based on these would effectively use the ShuffleManager to fetch the data, and then put a Reader on top that just exposes the individual byte streams.

A couple more things to note: this is not a streaming implementation. The fetch takes place only when the source output is complete. There's some work being done to pipeline the data transfer, so that smaller chunks are transferred and usable.

HTH

Thanks
- Sid

On Mon, Jan 12, 2015 at 12:39 PM, David Pollak <[email protected]> wrote:

> Howdy,
>
> I'm starting to learn about Tez. I've watched some of the Tez
> presentations. There are some claims that it's possible to have a "byte
> mover" edge... one that will just move bytes between Vertices. I've looked
> through the code, but am unable to find one with the basic Tez
> distribution. I would appreciate it if someone could point me to a
> byte-mover Edge.
>
> Thanks,
>
> David
>
> --
> Brick Alloy http://brickalloy.com <https://telegr.am>
> Lift, the simply functional web framework http://liftweb.net
> Follow me: http://twitter.com/dpp
> Blog: http://goodstuff.im
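
For reference, the byte-mover idea described in the reply (fetch the completed source outputs, then expose one raw byte stream per source instead of decoding keys and values) can be sketched in plain Java. This is only an illustration under stated assumptions: ByteMoverSketch, FetchedInput and ByteStreamReader are hypothetical names, not Tez classes, and in-memory byte arrays stand in for whatever ShuffleManager would actually fetch over the network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: none of these types are part of Tez.
final class ByteMoverSketch {

    /** Stand-in for the bytes fetched from one completed source task. */
    static final class FetchedInput {
        private final int sourceTaskIndex;
        private final byte[] data;

        FetchedInput(int sourceTaskIndex, byte[] data) {
            this.sourceTaskIndex = sourceTaskIndex;
            this.data = data.clone(); // defensive copy
        }

        InputStream open() {
            return new ByteArrayInputStream(data);
        }
    }

    /** Hands out one raw stream per source, with no key/value decoding. */
    static final class ByteStreamReader {
        private final Iterator<FetchedInput> inputs;
        private FetchedInput current;

        ByteStreamReader(List<FetchedInput> completedInputs) {
            this.inputs = completedInputs.iterator();
        }

        /** Advances to the next source; returns false when none are left. */
        boolean next() {
            if (!inputs.hasNext()) {
                current = null;
                return false;
            }
            current = inputs.next();
            return true;
        }

        int currentSource() {
            return requireCurrent().sourceTaskIndex;
        }

        InputStream currentStream() {
            return requireCurrent().open();
        }

        private FetchedInput requireCurrent() {
            if (current == null) {
                throw new IllegalStateException("call next() first");
            }
            return current;
        }
    }

    /** Drains a stream into a byte array (helper for the demo only). */
    static byte[] readFully(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Pretend two source tasks have finished and their output was fetched.
        List<FetchedInput> fetched = Arrays.asList(
            new FetchedInput(0, new byte[] {1, 2, 3}),
            new FetchedInput(1, new byte[] {4, 5}));

        ByteStreamReader reader = new ByteStreamReader(fetched);
        while (reader.next()) {
            byte[] payload = readFully(reader.currentStream());
            System.out.println("source task " + reader.currentSource()
                + ": " + payload.length + " bytes");
        }
    }
}
```

The reader only sees inputs that are already complete, which matches the non-streaming behaviour noted in the reply. A real implementation would have to plug into the Tez Input and Reader interfaces; UnorderedKVInput and UnorderedKVReader are the classes to consult for how that wiring is actually done.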
