DataSourceType isn't really used at the moment. Eventually, it would serve
more as a scheduling and failure-recovery mechanism than as a way of deciding
how data gets persisted between stages. (This property could potentially be
used by some of the Inputs/Outputs to alter the way they persist data, but
that isn't currently on the cards.)
This primarily applies to data written on Edges. Are you looking to modify
that, or to use the data generated by an intermediate Vertex in a separate
process?
Getting a little more info on the use case would be helpful in figuring out
how Tez can be used. Are you looking to read data from this internal
service, publish to it, or something else?
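
To make the payload idea from Hitesh's reply quoted below a bit more
concrete, here is a rough, self-contained Java sketch of an Output encoding a
data location into an opaque byte payload and the matching Input decoding it.
This is not the Tez API: OpaqueEvent, encodeLocation, decodeLocation and the
example URL are all made-up names for illustration, and a real Input/Output
pair would carry the bytes in a DataMovementEvent instead.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: a stand-in for an event whose payload the framework
// never interprets. None of these names come from Tez.
public class PayloadSketch {

    /** Hypothetical event type: just an opaque byte payload. */
    static final class OpaqueEvent {
        private final byte[] payload;

        OpaqueEvent(byte[] payload) {
            // Defensive copy so later changes by the caller don't leak in.
            this.payload = payload.clone();
        }

        byte[] getPayload() {
            return payload.clone();
        }
    }

    /** Output side: pack the location of the written data into the payload. */
    static OpaqueEvent encodeLocation(String url) {
        return new OpaqueEvent(url.getBytes(StandardCharsets.UTF_8));
    }

    /** Input side: only the matching Input knows the bytes are a URL. */
    static String decodeLocation(OpaqueEvent event) {
        return new String(event.getPayload(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed example location; any scheme the Input understands would do.
        String location = "http://host.example:8080/output?task=3";
        OpaqueEvent event = encodeLocation(location);
        System.out.println(decodeLocation(event));
    }
}
```

Since only the Input/Output pair interprets the bytes, the payload could just
as well point at your internal service (an ID, a key, a host/port) rather
than a URL.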


On Fri, Jul 25, 2014 at 11:36 AM, David Capwell <[email protected]> wrote:

> Sorry, copy/paste issue.  I was looking at DataSourceType and trying to
> see how data gets saved and read between tasks.  The use case is that we
> have an internal service that might be helpful for us, so I wanted to
> prototype how feasible it would be to share data over a different mechanism.
>
>
> On Fri, Jul 25, 2014 at 10:36 AM, Hitesh Shah <[email protected]> wrote:
>
>> DataMovementEvent is a construct defined for an Input/Output pair to
>> communicate with each other. The framework does not interpret the
>> information passed between the two; it treats it as a byte payload to be
>> handed off from the source to the destination. Users are not expected to
>> create derived classes of this type but to use the payload within the
>> object to pass information around.
>>
>> For example, most of the currently implemented Input/Output pairs (for
>> shuffle/broadcast edges) use the payload to pass the URL specifying the
>> location of the data to be fetched.
>>
>> thanks
>> -- Hitesh
>>
>> On Jul 25, 2014, at 10:23 AM, David Capwell <[email protected]> wrote:
>>
>> > I've been going through the code and am not sure where the real logic
>> > of DataMovementType gets used.
>> >
>> > I see that DagTypeConverts can convert between DataMovementType and
>> > PlanEdgeDataMovementType, but once that happens I don't really see a
>> > way to implement any of these types.  Where are the implementations
>> > defined?  Is there any way to define my own impls?
>> >
>> > Thanks for your time.
>>
>>
>
