release tasks to
be scheduled when natural order and time constraints are satisfied. The
code is at (
https://github.com/raajay/tez/blob/crossquery/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGSchedulerCrossQuery.java)
if you are interested.
Basically, upon creating an instance of a DAG
e priority levels in "DAGSchedulerNaturalOrderControlled" based
on T, to delay the start of vertices ?
Which of the three is easiest to implement and likely to have limited
side effects ? Any help or pointers are appreciated.
Thanks
Raajay
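The release rule described in the snippets above (hold a vertex back until both the natural order and a time constraint are met) can be sketched roughly as follows. This is only an illustration under stated assumptions: `VertexReleaseGate` and `canRelease` are hypothetical names, not part of Tez or of the linked scheduler, and "natural order" is assumed here to mean that every upstream vertex has already been scheduled.

```java
import java.util.Set;

/** Hypothetical helper (not a Tez class): decides whether a vertex may be released. */
final class VertexReleaseGate {
    private VertexReleaseGate() {}

    /**
     * Returns true only when both constraints hold.
     * Assumption: natural order means all upstream vertices are already scheduled.
     * Assumption: the time constraint is an earliest start time T in milliseconds.
     */
    static boolean canRelease(Set<String> upstreamVertices,
                              Set<String> scheduledVertices,
                              long earliestStartMillis,
                              long nowMillis) {
        // Natural order: every upstream vertex must have been scheduled.
        boolean naturalOrderSatisfied = scheduledVertices.containsAll(upstreamVertices);
        // Time constraint: do not start before T.
        boolean timeConstraintSatisfied = nowMillis >= earliestStartMillis;
        return naturalOrderSatisfied && timeConstraintSatisfied;
    }
}
```

A scheduler built this way would re-check `canRelease` whenever an upstream vertex is scheduled or a timer fires, and release the vertex's tasks the first time it returns true.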
Thanks a lot, Sid!
Raajay
> On Dec 10, 2015, at 10:00 PM, Siddharth Seth wrote:
>
> Raajay, I was able to locate a temporary patch which provided this
> functionality for broadcast. I don't think the patch will apply to the
> current code base, but should provide an idea
I am looking to hack something up quick to see if there is any performance
improvement by using in-memory lookup for intermediate data.
@Siddharth: I am not well versed with the Tez code base. Which packages (source
classes) should I be looking at to implement the hack you suggested ?
Thanks,
Raajay
ndshake is
probably not needed. Is that right ?
- Raajay
On Tue, Dec 8, 2015 at 6:17 PM, Hitesh Shah wrote:
> The other way to look at this problem is that for a given edge between 2
> vertices, the data format and transfer mechanism is governed by the Output
> of the upstream vertex a
to tmpfs, however, such a setup does
not fall back to disk gracefully.
Does Tez have an interface to write intermediate data to an HDFS-like
filesystem ? If yes, what are the settings ?
Does setting "yarn.nodemanager.local-dirs" to some HDFS or Tachyon URI
suffice ?
Thanks,
Raajay
I did not write my own processor. I just re-use Tez Work created by Hive.
So the processors are classes like HiveMap, HiveJoin defined by Hive.
So if I understand the setting correctly, only by modifying these
processors can I take advantage of Shared Object Registry.
Thanks a lot !
Raajay
On
then
if I want to dump intermediate data (say output of mappers for small jobs)
into the shared object registry how shall I do that ?
Raajay
On Tue, Dec 1, 2015 at 12:47 PM, Bikas Saha wrote:
> Object registry is a user enabled feature provided by Tez to the
> application
> (e.g. H
configs I
need to ensure are set correctly to use shared objects?
- Raajay
now :(
Raajay
> On Nov 30, 2015, at 6:34 PM, Rajesh Balamohan
> wrote:
>
> Adding more to #2. Alternatively, you may want to consider adding paths to
> HDFS in-memory tier
> (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/MemoryStorage
mappers spend in
reading from HDFS or disk.
Thanks
Raajay
The Vertex.java API does not allow parallelism to be changed after a
vertex is created; there is no setParallelism() api exposed.
Any specific reason ? Will changing the parallelism affect the execution ?
Thanks
Raajay
(have same
id) and that prevents overwrite.
Where should I introduce randomness in the file name ? Should I change some
name field in FileSinkDescriptor every time I re-run the dag ?
Thanks
Raajay
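One generic way to avoid the name collision asked about above is to append a per-run random suffix to the output location. The sketch below shows only that idea in plain Java. `UniqueOutputName` and `withRunSuffix` are hypothetical names, and where such a value would be plugged into Hive's file sink configuration is not shown here and is not confirmed by the thread.

```java
import java.util.UUID;

/** Hypothetical helper (not a Hive or Tez class): makes an output path unique per run. */
final class UniqueOutputName {
    private UniqueOutputName() {}

    /** Appends a random UUID to the base path so that repeated runs do not collide. */
    static String withRunSuffix(String basePath) {
        // Drop a single trailing slash so the suffix attaches to the last path component.
        String trimmed = basePath.endsWith("/")
                ? basePath.substring(0, basePath.length() - 1)
                : basePath;
        return trimmed + "-" + UUID.randomUUID();
    }
}
```

Calling `UniqueOutputName.withRunSuffix("/tmp/dag-output")` once per DAG submission gives each re-run its own directory name, so the existing output is never the overwrite target.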
Vertex failed, vertexName=Reducer 3,
vertexId=vertex_1441949856963_0011_1_04
"jar" files need
to be provided ?
Thanks
Raajay
On Fri, Sep 11, 2015 at 4:38 AM, Jianfeng (Jeff) Zhang <
jzh...@hortonworks.com> wrote:
>
> You may try the following steps to check the jars your tez job is using
>
>- Set "yarn.nodemanager.delete.debug-delay-sec" to
Yeah. I added the hive-exec.jar that contains HiveSplitGenerator to HDFS. I
still hit the exception
On Fri, Sep 11, 2015 at 2:43 AM, Jianfeng (Jeff) Zhang <
jzh...@hortonworks.com> wrote:
>
> Have you tried using a jar rather than tar.gz ?
>
>
> Best Regard,
> Jeff Zhang
>
I am running DAGs generated by Hive for Tez in offline mode; as in I store
the DAGs to disk and then run them later using my own Tez Client.
I have been able to get this setup going in local mode. However, while
running on the cluster, I hit a Processor class not found exception (snippet
below). I f
"Not sure your purpose. Usually data locality can improve performance" -
Mostly study purpose :)
Thanks for the pointers to the file to be changed. Helps very much !
Raajay
On Thu, Sep 10, 2015 at 3:15 AM, Jianfeng (Jeff) Zhang <
jzh...@hortonworks.com> wrote:
> >>
I was able to get it working with "hostnames". thanks!
To dig deeper, how much does Tez obey the hints provided? How are Vertex
Location Hints handled ? What if YARN is not able to provide containers in
requested locations ?
Raajay
On Thu, Sep 10, 2015 at 10:19 AM, Hitesh Shah wr
While creating TaskLocationHints, using the static function
TaskLocationHint.createTaskLocationHint(Set<String> nodes, Set<String>
racks)
what should the Strings be ? IP address of the nodes ? Node labels ? Or
hostnames ?
Thanks
Raajay
scheduled at a location different than the location of
its input block ? If yes, how ?
Raajay
On Thu, Sep 10, 2015 at 12:30 AM, Jianfeng (Jeff) Zhang <
jzh...@hortonworks.com> wrote:
> >>> In the WordCount example, while creating the Tokenizer Vertex, neither
> the parallel
but can be arbitrarily configured during
creation ?
Raajay
On Thu, Sep 10, 2015 at 12:01 AM, Jianfeng (Jeff) Zhang <
jzh...@hortonworks.com> wrote:
>
> Actually Tokenizer vertex should already have the VertexLocationHints from
> the hdfs file split info at runtime. Did you see
e -1,
so that it can compute it.
What minimal modification to the example would avoid invoking the
VertexManager and allow me to use my own customized VertexLocationHint ?
Thanks
Raajay
DAG diagnostics: [Vertex failed, vertexName=Tokenizer,
vertexId=vertex_1441839249749_0017_1_00,
Great. Thanks !
Raajay
On Wed, Sep 9, 2015 at 8:46 PM, Bikas Saha wrote:
> For your own custom jars you should use “tez.aux.uris” instead of
> “tez.lib.uris”
>
>
>
> These configs only need to be present on the client node that is used to
> submit the DAG. tez-site.xml i
"tez-site.xml" in
worker nodes also be updated to include the new "jar" dependencies ?
Thanks
Raajay
submit to Tez ?
It does not seem like DAG.java in tez-api can be serialized.
Raajay
That worked! Thanks!
- Raajay
On Tue, Sep 1, 2015 at 2:51 PM, Hitesh Shah wrote:
> You need to add tez-api as a dependency i.e set artifactId to tez-api.
>
> Also, you are better off using a released version as a dependency e.g
> 0.7.0 instead of a SNAPSHOT which will chan
" on the client gives errors.
"Could not find artifact org.apache.tez:tez:jar:0.8.0-SNAPSHOT -> [Help 1]"
I am not able to install tez libraries to the local maven repo.
Thanks
Raajay
bmitDAG
>- use DAGClient from the submitDAG response to monitor the DAG progress
>
> — Hitesh
>
> On Aug 27, 2015, at 1:52 PM, Raajay wrote:
>
> > Hi,
> >
> > How should I go about writing a Tez client? Essentially, I have DAGs
> de-serialized from Hive,
Hi,
How should I go about writing a Tez client? Essentially, I have DAGs
de-serialized from Hive, and I make modifications to them. Now I want to use
information from the DAG and submit a Tez job.
Any sample code for a Tez client will help me get started quickly.
Thanks
Raajay
Thanks Jeff and Hitesh. Wonderful pointers / summary to get me started.
Raajay
> On Aug 19, 2015, at 4:20 PM, Hitesh Shah wrote:
>
> If you are mainly looking at this from a “mapper” and “reducer” perspective,
> there are 2 main ways in which Tez affects parallelism:
>
>
are some rules of thumb regarding the number of mappers/reducers for
a stage ? I would ideally like to fix the #mappers / #reducers in the
application itself rather than let Tez determine it. What are the common
pitfalls in doing so ?
Thanks,
Raajay