I did not write my own processor; I just reuse the Tez Work created by Hive,
so the processors are Hive-defined classes such as HiveMap and HiveJoin.

So if I understand the setup correctly, I can take advantage of the shared
object registry only by modifying these processors.
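
For illustration, here is a minimal sketch of the lookup-then-cache pattern
such a modified processor might use. It is self-contained and does not call
Tez: the Registry interface is a hypothetical stand-in that only mirrors the
method names mentioned in the quoted reply below (in Tez the real interface
is, I believe, org.apache.tez.runtime.api.ObjectRegistry, reached through the
processor's context). InMemoryRegistry, loadSmallTable() and the key
"small-table" are likewise made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RegistrySketch {

    /** Hypothetical stand-in for the registry; NOT the Tez class itself. */
    interface Registry {
        Object cacheForVertex(String key, Object value);
        Object cacheForDAG(String key, Object value);
        Object cacheForSession(String key, Object value);
        Object get(String key);
    }

    /** Toy implementation: one map, no lifetime handling at all. */
    static class InMemoryRegistry implements Registry {
        private final Map<String, Object> store = new ConcurrentHashMap<>();

        public Object cacheForVertex(String key, Object value) { return store.put(key, value); }
        public Object cacheForDAG(String key, Object value) { return store.put(key, value); }
        public Object cacheForSession(String key, Object value) { return store.put(key, value); }
        public Object get(String key) { return store.get(key); }
    }

    /** Counts how often the expensive load actually runs. */
    static int loads = 0;

    /** Hypothetical expensive step, e.g. building the small side of a join. */
    static Map<String, Integer> loadSmallTable() {
        loads++;
        Map<String, Integer> table = new HashMap<>();
        table.put("item_sk_1", 1);
        table.put("item_sk_2", 2);
        return table;
    }

    /** Look the object up first; build and cache it only on a miss. */
    @SuppressWarnings("unchecked")
    static Map<String, Integer> lookupTable(Registry registry) {
        String key = "small-table"; // any unique string
        Map<String, Integer> cached = (Map<String, Integer>) registry.get(key);
        if (cached == null) {
            cached = loadSmallTable();
            // Cached in directly usable form, scoped here to the DAG.
            registry.cacheForDAG(key, cached);
        }
        return cached;
    }

    public static void main(String[] args) {
        Registry registry = new InMemoryRegistry();
        lookupTable(registry); // first task: miss, loads the table
        lookupTable(registry); // later task in the same container: hit
        System.out.println("loads = " + loads); // prints "loads = 1"
    }
}
```

The second lookup returns the same object without reloading, which is the
only saving the registry offers; nothing is cached unless the processor
code asks for it.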

Thanks a lot!

Raajay

On Tue, Dec 1, 2015 at 3:39 PM, Bikas Saha <[email protected]> wrote:

> To be clear, you have written your own processor that runs in your DAG
> vertices? Your processor runs your custom code for processing input data.
>
> If yes, then the following applies.
>
> You will get access to the registry from your context object.
>
> You can use cacheForVertex() to cache for the lifetime of the vertex,
> cacheForDAG() to cache for the lifetime of the DAG, and cacheForSession()
> to cache for the lifetime of a session (which runs multiple DAGs). As for
> the key and value parameters: the key is any unique string used to look up
> the value, and the value is any Java object (say a map or a list). For
> performance you would want to cache the object in a form that can be used
> immediately, without any conversion.
>
> There is a toy example of the usage in the Tez source code in
> BroadcastAndOneToOneExample.java
>
> The Javadoc for the object registry should have more details. Please open
> a JIRA if the Javadoc is not clear enough.
>
> *From:* Raajay [mailto:[email protected]]
> *Sent:* Tuesday, December 1, 2015 11:02 AM
> *To:* [email protected]
> *Subject:* Re: Shared object registry
>
> I am running a custom application; however, the DAG is created much like
> the DAG that Hive would have created for the TPC-DS query. I use
> "TezClient" to submit these DAGs.
>
> How can I use shared objects explicitly?
>
> I understand that the object registry provides a key-value interface. But
> if I want to dump intermediate data (say, the output of mappers for small
> jobs) into the shared object registry, how should I do that?
>
> Raajay
>
> On Tue, Dec 1, 2015 at 12:47 PM, Bikas Saha <[email protected]> wrote:
>
> Object registry is a user-enabled feature provided by Tez to the
> application (e.g. Hive and Pig). If the application chooses to use it, then
> it can do some user-land caching across tasks/vertices/DAGs with it. E.g.
> Hive caches the smaller broadcast side of a broadcast join in the shared
> object registry.
>
> Object registry is not an automatic data caching or input caching
> mechanism.
>
> What application/job are you running? Hive/Pig/Custom? Unless the
> application (like Hive) has used object caching for a cross-DAG scenario
> (which AFAIK it does not), you will not see any difference. If it's custom,
> then you will have to explicitly use the object registry in a manner that
> makes sense for your app.
>
> -----Original Message-----
> From: Raajay [mailto:[email protected]]
> Sent: Tuesday, December 1, 2015 10:36 AM
> To: [email protected]
> Subject: Shared object registry
>
> How do I use the shared object registry effectively?
>
> I created a Tez client as a session and submitted a DAG twice,
> sequentially.
>
> However, I did not see a noticeable difference in their run times. The
> query was TPC-DS query #3.
>
> I had enabled container reuse in tez-site.xml. Are there other configs I
> need to set correctly in order to use shared objects?
>
> - Raajay