Re: DAGScheduler

2016-01-14 Thread Raajay
release tasks to be scheduled when natural order and time constraints are satisfied. The code is at ( https://github.com/raajay/tez/blob/crossquery/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGSchedulerCrossQuery.java) if you are interested. Basically, upon creating an instance of a DAG
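
A minimal sketch of the release rule described above, for readers who want something concrete: a task is let through only when both its ordering constraint and its time constraint hold. This is not the linked DAGSchedulerCrossQuery code and not a Tez API. The class `DelayedReleaseGate` and all of its methods are hypothetical, and "natural order" is approximated here as "all upstream vertices have completed", which is an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical gate: not part of Tez and not the linked DAGSchedulerCrossQuery. */
final class DelayedReleaseGate {
    // vertex name -> names of upstream vertices that must finish first (assumed ordering rule)
    private final Map<String, Set<String>> upstream = new HashMap<>();
    // vertex name -> earliest allowed start time, in milliseconds
    private final Map<String, Long> earliestStartMillis = new HashMap<>();
    // vertices reported as finished so far
    private final Set<String> completed = new HashSet<>();

    void addVertex(String name, Set<String> upstreamNames, long earliestStart) {
        upstream.put(name, new HashSet<>(upstreamNames));
        earliestStartMillis.put(name, earliestStart);
    }

    void markCompleted(String name) {
        completed.add(name);
    }

    /** Returns true only when both the ordering and the time constraint are satisfied. */
    boolean canRelease(String name, long nowMillis) {
        Set<String> deps = upstream.get(name);
        Long start = earliestStartMillis.get(name);
        if (deps == null || start == null) {
            return false; // unknown vertex: never release
        }
        return completed.containsAll(deps) && nowMillis >= start;
    }
}
```

A scheduler built this way would call `canRelease` whenever a vertex completes or a timer fires, and hand the task to the underlying scheduler only once it returns true.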

DAGScheduler

2016-01-13 Thread Raajay
e priority levels in "DAGSchedulerNaturalOrderControlled" based on T, to delay the start of vertices ? Which of the three is easiest to implement and likely to have limited side effects ? Any help/pointers are appreciated. Thanks Raajay

Re: Writing intermediate data

2015-12-10 Thread Raajay
Thanks a lot, Sid! Raajay > On Dec 10, 2015, at 10:00 PM, Siddharth Seth wrote: > > Raajay, I was able to locate a temporary patch which provided this > functionality for broadcast. I don't think the patch will apply to the > current code base, but should provide an idea

Re: Writing intermediate data

2015-12-10 Thread Raajay
I am looking to hack something up quickly to see if there is any performance improvement from using in-memory lookup for intermediate data. @Siddharth: I am not well versed with the Tez code base. Which packages (source classes) should I be looking at to implement the hack you suggested ? Thanks, Raajay

Re: Writing intermediate data

2015-12-08 Thread Raajay
ndshake is probably not needed. Is that right ? - Raajay On Tue, Dec 8, 2015 at 6:17 PM, Hitesh Shah wrote: > The other way to look at this problem is that for a given edge between 2 > vertices, the data format and transfer mechanism is governed by the Output > of the upstream vertex a

Writing intermediate data

2015-12-07 Thread Raajay
to tmpfs; however, such a setup does not fall back to disk gracefully. Does Tez have an interface to write intermediate data to an HDFS-like filesystem ? If yes, what are the settings ? Does setting "yarn.nodemanager.local-dirs" to some HDFS or Tachyon URI suffice ? Thanks, Raajay

Re: Shared object registry

2015-12-02 Thread Raajay
I did not write my own processor. I just re-use Tez Work created by Hive. So the processors are classes like HiveMap, HiveJoin defined by Hive. So if I understand the setting correctly, only by modifying these processors can I take advantage of Shared Object Registry. Thanks a lot ! Raajay On

Re: Shared object registry

2015-12-01 Thread Raajay
then if I want to dump intermediate data (say output of mappers for small jobs) into the shared object registry how shall I do that ? Raajay On Tue, Dec 1, 2015 at 12:47 PM, Bikas Saha wrote: > Object registry is a user enabled feature provided by Tez to the > application > (e.g. H

Shared object registry

2015-12-01 Thread Raajay
configs I need to ensure are set correctly to use shared objects? - Raajay

Re: Running tez jobs with data in memory

2015-11-30 Thread Raajay
now :( Raajay > On Nov 30, 2015, at 6:34 PM, Rajesh Balamohan > wrote: > > Adding more to #2. Alternatively, you may want to consider adding paths to > HDFS in-memory tier > (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/MemoryStorage

Running tez jobs with data in memory

2015-11-30 Thread Raajay
mappers spend in reading from HDFS or disk. Thanks Raajay

Setting vertex parallelism

2015-09-12 Thread Raajay
The Vertex.java API does not allow parallelism to be changed after a vertex is created; there is no setParallelism() API exposed. Any specific reason ? Will changing the parallelism affect the execution ? Thanks Raajay

Over writing files

2015-09-11 Thread Raajay
(have the same id) and that prevents overwrite. Where should I introduce randomness in the file name ? Should I change some name field in FileSinkDescriptor every time I re-run the DAG ? Thanks Raajay Vertex failed, vertexName=Reducer 3, vertexId=vertex_1441949856963_0011_1_04
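
One generic way to avoid such collisions is to derive a fresh suffix for each run and append it to the output name. The sketch below shows only that idea in plain Java. The class `UniqueOutputName` and its methods are hypothetical, and it does not say where in Hive or Tez the name would have to be set, which is the open question in the message above.

```java
import java.util.UUID;

/** Hypothetical helper: not a Hive or Tez class. */
final class UniqueOutputName {
    /** Appends a run-specific suffix so that repeated runs do not reuse the same name. */
    static String withRunSuffix(String baseName, String runId) {
        return baseName + "_" + runId;
    }

    /** Generates a random identifier for one run of the DAG. */
    static String newRunId() {
        return UUID.randomUUID().toString();
    }
}
```

Calling `UniqueOutputName.withRunSuffix(base, UniqueOutputName.newRunId())` once per run gives each re-run its own name, at the cost of having to clean up older outputs separately.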

Re: Missing libraries.

2015-09-11 Thread Raajay
"jar" files need to be provided ? Thanks Raajay On Fri, Sep 11, 2015 at 4:38 AM, Jianfeng (Jeff) Zhang < jzh...@hortonworks.com> wrote: > > You may try the following steps to check the jars your tez job is using > >- Set "yarn.nodemanager.delete.debug-delay-sec" to

Re: Missing libraries.

2015-09-11 Thread Raajay
Yeah. I added the hive-exec.jar that contains HiveSplitGenerator to HDFS. I still hit the exception. On Fri, Sep 11, 2015 at 2:43 AM, Jianfeng (Jeff) Zhang < jzh...@hortonworks.com> wrote: > > Have you tried using a jar rather than tar.gz ? > > > Best Regards, > Jeff Zhang >

Missing libraries.

2015-09-11 Thread Raajay
I am running DAGs generated by Hive for Tez in offline mode; as in I store the DAGs to disk and then run them later using my own Tez Client. I have been able to get this setup going in local mode. However, while running on the cluster, I hit Processor class not found exception (snippet below). I f

Re: Error of setting vertex location hints

2015-09-11 Thread Raajay
"Not sure your purpose. Usually data locality can improve performance" - Mostly study purpose :) Thanks for the pointers to the file to be change. Helps very much ! Raajay On Thu, Sep 10, 2015 at 3:15 AM, Jianfeng (Jeff) Zhang < jzh...@hortonworks.com> wrote: > >>

Re: Creating TaskLocationHints

2015-09-11 Thread Raajay
I was able to get it working with "hostnames". thanks! To dig deeper, how much does Tez obey the hints provided? How are Vertex Location Hints handled ? What if YARN is not able to provide containers in requested locations ? Raajay On Thu, Sep 10, 2015 at 10:19 AM, Hitesh Shah wr

Creating TaskLocationHints

2015-09-10 Thread Raajay
While creating TaskLocationHints, using the static function TaskLocationHint.createTaskLocationHint(Set<String> nodes, Set<String> racks), what should the Strings be ? IP addresses of the nodes ? Node labels ? Or hostnames ? Thanks Raajay

Re: Error of setting vertex location hints

2015-09-09 Thread Raajay
scheduled at a location different than the location of its input block ? If yes, how ? Raajay On Thu, Sep 10, 2015 at 12:30 AM, Jianfeng (Jeff) Zhang < jzh...@hortonworks.com> wrote: > >>> In the WordCount example, while creating the Tokenizer Vertex, neither > the parallel

Re: Error of setting vertex location hints

2015-09-09 Thread Raajay
but can be arbitrarily configured while creation ? Raajay On Thu, Sep 10, 2015 at 12:01 AM, Jianfeng (Jeff) Zhang < jzh...@hortonworks.com> wrote: > > Actually Tokenizer vertex should already have the VertexLocationHints from > the hdfs file split info at runtime. Did you see

Error of setting vertex location hints

2015-09-09 Thread Raajay
e -1, so that it can compute it. What minimal modification to the example would avoid invoking the VertexManager and allow me to use my own customized VertexLocationHint ? Thanks Raajay DAG diagnostics: [Vertex failed, vertexName=Tokenizer, vertexId=vertex_1441839249749_0017_1_00,

Re: How to use tez-site.xml

2015-09-09 Thread Raajay
Great. Thanks ! Raajay On Wed, Sep 9, 2015 at 8:46 PM, Bikas Saha wrote: > For your own custom jars you should use “tez.aux.uris” instead of > “tez.lib.uris” > > > > These configs only need to be present on the client node that is used to > submit the DAG. tez-site.xml i

How to use tez-site.xml

2015-09-09 Thread Raajay
"tez-site.xml" in worker nodes also be updated to include the new "jar" dependencies ? Thanks Raajay

DAG serialization

2015-09-01 Thread Raajay
submit to Tez ? It does not seem like DAG.java in tez-api can be serialized. Raajay

Re: Add tez dependency

2015-09-01 Thread Raajay
That worked! Thanks! - Raajay On Tue, Sep 1, 2015 at 2:51 PM, Hitesh Shah wrote: > You need to add tez-api as a dependency i.e set artifactId to tez-api. > > Also, you are better off using a released version as a dependency e.g > 0.7.0 instead of a SNAPSHOT which will chan

Add tez dependency

2015-09-01 Thread Raajay
" on the client gives errors. "Could not find artifact org.apache.tez:tez:jar:0.8.0-SNAPSHOT -> [Help 1]" I am not able to install tez libraries to the local maven repo. Thanks Raajay

Re: Tez client

2015-08-27 Thread Raajay
bmitDAG >- use DAGClient from the submitDAG response to monitor the DAG progress > > — Hitesh > > On Aug 27, 2015, at 1:52 PM, Raajay wrote: > > > Hi, > > > > How should I go about writing a Tez client? Essentially, I have DAG's > de-serialized from Hive,

Tez client

2015-08-27 Thread Raajay
Hi, How should I go about writing a Tez client? Essentially, I have DAGs deserialized from Hive, and I make modifications to them. Now I want to use information from the DAGs and submit a Tez job. Any sample code for a Tez client will help me get started quickly. Thanks Raajay

Re: Logical to Physical DAG conversion

2015-08-19 Thread Raajay Viswanathan
Thanks Jeff and Hitesh. Wonderful pointers / summary to get me started. Raajay > On Aug 19, 2015, at 4:20 PM, Hitesh Shah wrote: > > If you are mainly looking at this from a “mapper” and “reducer” perspective, > there are 2 main ways in which Tez affects parallelism: > >

Logical to Physical DAG conversion

2015-08-19 Thread Raajay Viswanathan
are some rules of thumb regarding the number of mappers/reducers for a stage ? I would ideally like to fix the #mappers / #reducers in the application itself rather than let Tez determine it. What are the common pitfalls in doing so ? Thanks, Raajay