Re: Writing intermediate data

2015-12-10 Thread Raajay
Bikas > > > > From: Raajay [mailto:raaja...@gmail.com] > Sent: Tuesday, December 8, 2015 5:50 PM > To: user@tez.apache.org > Subject: Re: Writing intermediate data > > > > Thanks for the valuabl

Re: Writing intermediate data

2015-12-10 Thread Siddharth Seth
and retargetable to different physical targets – e.g. >> in-memory HDFS, or Tachyon or NFS or S3 etc. >> >> >> >> Thoughts? >> >> Bikas >> >> >> >> *From:* Raajay [mailto:raaja...@gmail.com] >> *Sent:* Tuesday, December 8, 2015 5:

Re: Writing intermediate data

2015-12-10 Thread Raajay
al targets – e.g. > in-memory HDFS, or Tachyon or NFS or S3 etc. > > > > Thoughts? > > Bikas > > > > *From:* Raajay [mailto:raaja...@gmail.com] > *Sent:* Tuesday, December 8, 2015 5:50 PM > *To:* user@tez.apache.org > *Subject:* Re: Writing intermediate data

Re: Writing intermediate data

2015-12-10 Thread Siddharth Seth
t of these Inputs/Outputs > becomes reusable and retargetable to different physical targets – e.g. > in-memory HDFS, or Tachyon or NFS or S3 etc. > > > > Thoughts? > > Bikas > > > > *From:* Raajay [mailto:raaja...@gmail.com] > *Sent:* Tuesday, December 8,

RE: Writing intermediate data

2015-12-09 Thread Bikas Saha
: user@tez.apache.org Subject: Re: Writing intermediate data Thanks for the valuable inputs. A quick clarification: " - Tez uses DataMovementEvents to inform the downstream vertex on where to pull data from. This information handshake is part of the Input/Output pair implement

Re: Writing intermediate data

2015-12-08 Thread Hitesh Shah
To clarify, by information handshake, I meant how to tell the downstream vertex tasks where the generating task wrote data to and also when to start reading data. If this can somehow be pre-defined at plan build time, sure, you probably don’t need a lot of info to be sent downstream as it
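
The handshake described in this message (an upstream task telling downstream tasks where it wrote its data, and thereby when reading can start) can be illustrated with a minimal sketch. Every name here (`DataMovementNotice`, `SharedStore`, `run_upstream`, `run_downstream`) is hypothetical and chosen only for illustration; this does not use or reproduce the Tez API, and the dictionary-backed store merely stands in for whatever physical target is used.

```python
# Illustrative sketch only: hypothetical names, not the Tez API.
from dataclasses import dataclass


@dataclass(frozen=True)
class DataMovementNotice:
    """Stand-in for the event an upstream task sends downstream."""
    source_task: int
    location: str  # where the upstream task wrote its output


class SharedStore:
    """Stand-in for the physical target (local disk, tmpfs, ...)."""

    def __init__(self):
        self._blobs = {}

    def write(self, location, payload):
        self._blobs[location] = payload

    def read(self, location):
        return self._blobs[location]


def run_upstream(task_id, records, store):
    """Write the output, then return a notice saying where it lives."""
    location = f"intermediate/task-{task_id}"
    store.write(location, list(records))
    return DataMovementNotice(source_task=task_id, location=location)


def run_downstream(notices, store):
    """Read only from locations announced by the received notices."""
    merged = []
    for notice in notices:
        merged.extend(store.read(notice.location))
    return merged


store = SharedStore()
notices = [run_upstream(0, ["a", "b"], store), run_upstream(1, ["c"], store)]
result = run_downstream(notices, store)
```

If the locations were fixed when the plan is built, the notice would no longer need to carry a location, though downstream tasks would still need some signal that the data is ready to read.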

Re: Writing intermediate data

2015-12-08 Thread Raajay
Thanks for the valuable inputs. A quick clarification: " - Tez uses DataMovementEvents to inform the downstream vertex on where to pull data from. This information handshake is part of the Input/Output pair implementation." If the edges had type PERSISTED_RELIABLE, the information handshake is

Re: Writing intermediate data

2015-12-08 Thread Hitesh Shah
The other way to look at this problem is that for a given edge between 2 vertices, the data format and transfer mechanism are governed by the Output of the upstream vertex and the Input of the downstream vertex. You can potentially write your own Input and Output pair that works with HDFS or tachy

Re: Writing intermediate data

2015-12-08 Thread Siddharth Seth
Using HDFS (or a filesystem other than local) is not supported yet. tmpfs would be your best bet in that case - we have tested with this before, but this has capacity limitations, and mixing tmpfs with regular disks does not provide a deterministic mechanism of selecting memory as the intermediate

Writing intermediate data

2015-12-07 Thread Raajay
I wish to set up a Tez data analysis framework, where the data resides in memory. Currently, I have Tez (and also Hive) set up such that it can read from an in-memory filesystem like Tachyon. However, the intermediate data is still written to disk at each processing node. I considered writing to