Hi Mat,

Your 11-minute delay suggests that Oozie is falling back to its fail-safe
polling mechanism to detect that a job has completed in the cluster.

You may have to tweak the following variable in the oozie-env.sh script:

# The base URL for callback URLs to Oozie
#
# export OOZIE_BASE_URL="http://${OOZIE_HTTP_HOSTNAME}:${OOZIE_HTTP_PORT}/oozie"

Oozie sets callback URLs in the job configurations using this base URL, and
the JT calls back Oozie as soon as a job completes. If the hostname is not
correct (see the rest of the script to find out how the hostname is
resolved), the call from the JT will never arrive and Oozie will fall back
to checking every 10 minutes for each action.
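
For example, if the hostname the script resolves is not reachable from the
JT, you can set the base URL explicitly. This is just a sketch: the hostname
below is a placeholder for your Oozie server's address, and 11000 is Oozie's
default HTTP port:

export OOZIE_BASE_URL="http://oozie-host.example.internal:11000/oozie"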

Hope this solves your problem.

Regarding the wiki: yes, the Oozie wiki,
https://cwiki.apache.org/confluence/display/OOZIE , which seems to be down
at the moment.

Thx



On Thu, Dec 20, 2012 at 9:25 AM, Matt Goeke <goeke.matt...@gmail.com> wrote:

> All,
>
> I have gotten over all of the blocker configuration hurdles between EC2 and
> EMR and I am able to submit one of my jobs with success. Unfortunately, I
> am running into a weird issue with my actions where each one takes exactly
> 11 minutes from start to finish even though the underlying MR job is
> nowhere near that long (one action's job is exactly 34 seconds and the
> other's is 5 minutes 21 seconds). I cannot guarantee that this is not an
> issue with overall resources on the node I am running the Oozie instance
> on, but I highly doubt it since this is an m1.large, so I am curious if
> there are any changes to the site config that might help flesh out what is
> causing this issue.
>
>
>
> Alejandro: The actual setup and configuration of this is fairly
> straightforward, so I am happy to write up a wiki page on it if you guys
> have a specific wiki in mind. I am not sure many people are keen on using
> EMR as a persistent cluster (I assume most persistent clusters are set up
> across EC2 nodes) but I am actually very pleased with it so far since it
> greatly reduces the amount of initial setup required to spin up a cluster.
>
> --
> Matt
>
>
> On Tue, Dec 18, 2012 at 5:48 PM, Robert Kanter <rkan...@cloudera.com>
> wrote:
>
> > Hi Matt,
> >
> > The oozie.service.ProxyUserService.proxyuser.hadoop.hosts and
> > oozie.service.ProxyUserService.proxyuser.hadoop.groups
> > properties are part of Oozie's configuration and would go in
> > oozie-site.xml.  This lets you impersonate users on the Oozie side of
> > things.  See
> > http://oozie.apache.org/docs/3.3.0/AG_Install.html#User_ProxyUser_Configuration
> > for more info.
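> >
> > For example, in oozie-site.xml (a sketch; the user segment of the
> > property name, "hadoop" here, is the user being allowed to impersonate
> > others when talking to Oozie):
> >
> > <property>
> >   <name>oozie.service.ProxyUserService.proxyuser.hadoop.hosts</name>
> >   <value>*</value>
> > </property>
> > <property>
> >   <name>oozie.service.ProxyUserService.proxyuser.hadoop.groups</name>
> >   <value>*</value>
> > </property>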
> >
> > There are two similar properties for Hadoop that go into core-site.xml:
> > hadoop.proxyuser.oozie.hosts
> > and hadoop.proxyuser.oozie.groups
> > I think this is what you need to fix your error.  See
> > http://hadoop.apache.org/docs/stable/Secure_Impersonation.html for more
> > info.
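> >
> > For example, in core-site.xml (again a sketch; the user segment of the
> > property name must match the user the Oozie server runs as, which from
> > your error message looks like "hadoop" rather than "oozie"):
> >
> > <property>
> >   <name>hadoop.proxyuser.hadoop.hosts</name>
> >   <value>*</value>
> > </property>
> > <property>
> >   <name>hadoop.proxyuser.hadoop.groups</name>
> >   <value>*</value>
> > </property>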
> >
> > - Robert
> >
> >
> >
> > On Tue, Dec 18, 2012 at 3:38 PM, Matt Goeke <goeke.matt...@gmail.com>
> > wrote:
> >
> > > All,
> > >
> > > Still working on getting Oozie 3.3 integrated with EMR, with most of my
> > > time so far spent resolving the security group config needed for VPC.
> > > The EC2 configuration was pretty simple but the main blocker right now
> > > is getting past the error below:
> > >
> > > Caused by: org.apache.hadoop.ipc.RemoteException: User: hadoop is not
> > > allowed to impersonate hadoop
> > >         at org.apache.hadoop.ipc.Client.call(Client.java:1070)
> > >         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
> > >         at $Proxy24.getProtocolVersion(Unknown Source)
> > >         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
> > >         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
> > >         at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
> > >         at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
> > >         at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
> > >         at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
> > >         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
> > >         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
> > >         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
> > >         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
> > >         at org.apache.oozie.service.HadoopAccessorService$2.run(HadoopAccessorService.java:411)
> > >         at org.apache.oozie.service.HadoopAccessorService$2.run(HadoopAccessorService.java:409)
> > >         at java.security.AccessController.doPrivileged(Native Method)
> > >         at javax.security.auth.Subject.doAs(Subject.java:415)
> > >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> > >         at org.apache.oozie.service.HadoopAccessorService.createFileSystem(HadoopAccessorService.java:409)
> > >         ... 26 more
> > >
> > > I know this is usually related to not having the correct proxy configs
> > > in the core-site but my current core-site proxy configs are below (and
> > > I have bounced both the NN and the JT since applying them):
> > >
> > > <property><name>dfs.permissions</name><value>false</value></property>
> > > <property><name>oozie.service.ProxyUserService.proxyuser.hadoop.hosts</name><value>*</value></property>
> > > <property><name>oozie.service.ProxyUserService.proxyuser.hadoop.groups</name><value>*</value></property>
> > >
> > > If I recall correctly this authorization is only checked at the JT/NN
> > > level and therefore shouldn't need to be pushed to the core-site on the
> > > slave machines, right? Also, would there be any reason the wildcard
> > > would be incompatible across Hadoop distros (we are currently using
> > > 1.0.3 from EMR)? Lastly, just for the sake of clarity, is the proxy
> > > hosts config based on the box submitting the Oozie request (edge node)
> > > or based on the boxes actually running the jobs (data/task nodes)?
> > >
> > > --
> > > Matt
> > >
> > >
> > > On Thu, Dec 13, 2012 at 6:19 PM, Alejandro Abdelnur <t...@cloudera.com> wrote:
> > >
> > > > Matt,
> > > >
> > > > It is not a matter of bundling native code or not. Officially we are
> > > > supposed to do source releases only. As a convenience we could do
> > > > binaries, but there are discussions about that, e.g., whether they
> > > > could be signed or not.
> > > >
> > > > Regarding installing/running Oozie in EC2: I have never done it.
> > > > Would you mind writing up a wiki page on it once you figure it out?
> > > >
> > > > Cheers
> > > >
> > > >
> > > > > On Thu, Dec 13, 2012 at 4:02 PM, Matt Goeke <goeke.matt...@gmail.com> wrote:
> > > >
> > > > > Thank you both for the follow-up.
> > > > >
> > > > > Two other questions that pertain to this:
> > > > > 1) I don't remember any native code being required for Oozie, so is
> > > > > there a reason why we don't release with a -bin artifact like most
> > > > > other Apache projects?
> > > > > 2) Are there any issues I might expect to run into when trying to
> > > > > run this on EC2 backed by EMR?
> > > > >
> > > > > --
> > > > > Matt
> > > > >
> > > > >
> > > > > On Thu, Dec 13, 2012 at 5:48 PM, Alejandro Abdelnur <t...@cloudera.com> wrote:
> > > > >
> > > > > > Matt,
> > > > > >
> > > > > > Apache Oozie release artifacts are sources only. The easiest way
> > > > > > to build the tarball is:
> > > > > >
> > > > > > * install Maven
> > > > > > * run bin/mkdistro.sh -DskipTests
> > > > > >
> > > > > > Then follow the Quick Start instructions.
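> > > > > >
> > > > > > For example (a sketch; the version number is illustrative, and
> > > > > > the output location assumes the standard Oozie build layout):
> > > > > >
> > > > > > $ tar xzf oozie-3.3.0.tar.gz
> > > > > > $ cd oozie-3.3.0
> > > > > > $ bin/mkdistro.sh -DskipTests
> > > > > > $ ls distro/target/oozie-*-distro.tar.gz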
> > > > > >
> > > > > > I'll open a JIRA to add this to the docs.
> > > > > >
> > > > > >
> > > > > > On Thu, Dec 13, 2012 at 3:36 PM, Matt Goeke <goeke.matt...@gmail.com> wrote:
> > > > > >
> > > > > > > All,
> > > > > > >
> > > > > > > I am falling back to Oozie 3.2 for now, but can someone possibly
> > > > > > > explain how Oozie 3.3 is supposed to be configured? I was hoping
> > > > > > > to just follow the Quick Start guide but it seems like the
> > > > > > > packaging does not match up at all.
> > > > > > >
> > > > > > > Trying to work through it, I ended up downloading Maven and
> > > > > > > running 'mvn install' on the folder, which built some of the
> > > > > > > hadooplibs, but I am still missing all of the bin scripts.
> > > > > > >
> > > > > > > --
> > > > > > > Matt
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Alejandro
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Alejandro
> > > >
> > >
> >
>



-- 
Alejandro
