Bikas
>
>
>
> From: Raajay [mailto:raaja...@gmail.com <mailto:raaja...@gmail.com>]
> Sent: Tuesday, December 8, 2015 5:50 PM
> To: user@tez.apache.org <mailto:user@tez.apache.org>
> Subject: Re: Writing intermediate data
>
>
>
> Thanks for the valuabl
and retargetable to different physical targets – e.g.
>> in-memory HDFS, or Tachyon or NFS or S3 etc.
>>
>>
>>
>> Thoughts?
>>
>> Bikas
>>
>>
>>
>> *From:* Raajay [mailto:raaja...@gmail.com]
>> *Sent:* Tuesday, December 8, 2015 5:
t of these Inputs/Outputs
> becomes reusable and retargetable to different physical targets – e.g.
> in-memory HDFS, or Tachyon or NFS or S3 etc.
>
>
>
> Thoughts?
>
> Bikas
>
>
>
> *From:* Raajay [mailto:raaja...@gmail.com]
> *Sent:* Tuesday, December 8,
To clarify, by information handshake, I meant how to tell the downstream vertex
tasks where the generating task wrote data to and also when to start reading
data. If this can somehow be predefined at plan build time, sure, you
probably don’t need a lot of info to be sent downstream as it
Thanks for the valuable inputs.
A quick clarification:
" - Tez uses DataMovementEvents to inform the downstream vertex on where to
pull data from. This information handshake is part of the Input/Output pair
implementation."
If the edges had type PERSISTED_RELIABLE, the information handshake is
The other way to look at this problem is that for a given edge between 2
vertices, the data format and transfer mechanism are governed by the Output of
the upstream vertex and the Input of the downstream vertex. You can potentially
write your own Input and Output pair that works with HDFS or tachy
Using HDFS (or a filesystem other than local) is not supported yet. tmpfs
would be your best bet in that case - we have tested with this before, but
this has capacity limitations, and mixing tmpfs with regular disks does not
provide a deterministic mechanism of selecting memory as the intermediate
I wish to set up a Tez data analysis framework, where the data resides in
memory. Currently, I have Tez (and also Hive) set up such that it can read
from an in-memory filesystem like Tachyon.
However, the intermediate data is still written to disk at each
processing node. I considered writing to