2.4 / yarn pig jobs fail due to exit code 1 from container.

2014-05-23 Thread Kevin Burton
Trying to track down exactly what's happening. Right now I'm getting this (see below). The setup documentation for 2.4 could definitely be better, probably with a sample/working config. Too much of this is left as an exercise to the user. 2014-05-23 21:20:30,652 INFO

debugging class path issues with containers.

2014-05-23 Thread Kevin Burton
What's the best way to debug YARN container issues? I was going to try to tweak the script, but it gets deleted after the job fails. Looks like I'm having an issue with the classpath. I'm getting a basic Hadoop NoClassDefFoundError on startup, so I think it just has a broken classpath, but of course I need to

The documentation for permissions of ./bin/container-executor should be more clear.

2014-05-23 Thread Kevin Burton
This just bit me… spent half a day figuring it out! :-( The only way I was able to debug it was with ./bin/container-executor --checksetup. Once that stopped complaining, my jobs were working OK. This shouldn't have taken that much time… the initial setup documentation could be seriously improved.
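
For reference, what --checksetup verifies is mostly ownership and mode bits on the binary. A minimal sketch of the expected bits, using a stand-in file so it runs anywhere (the root:hadoop group name and ./bin path are assumptions; the group must match yarn.nodemanager.linux-container-executor.group):

```shell
# Stand-in file: demonstrates the mode container-executor needs (6050 =
# setuid+setgid, group read/execute only, no world access). GNU stat assumed.
ce=$(mktemp)
chmod 6050 "$ce"
stat -c %a "$ce"   # prints: 6050
rm -f "$ce"

# On a real install (run as root; the group name is an assumption -- use the
# value of yarn.nodemanager.linux-container-executor.group):
#   chown root:hadoop ./bin/container-executor
#   chmod 6050 ./bin/container-executor
#   ./bin/container-executor --checksetup
```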

RE: Permission problem

2013-04-30 Thread Kevin Burton
I have relaxed it even further so now it is 775 kevin@devUbuntu05:/var/log/hadoop-0.20-mapreduce$ hadoop fs -ls -d / Found 1 items drwxrwxr-x - hdfs supergroup 0 2013-04-29 15:43 / But I still get this error: 2013-04-30 07:43:02,520 FATAL

RE: Permission problem

2013-04-30 Thread Kevin Burton
the permission to 775 so that the group would also have write permission but that didn't seem to help. From: Mohammad Tariq [mailto:donta...@gmail.com] Sent: Tuesday, April 30, 2013 8:20 AM To: Kevin Burton Subject: Re: Permission problem user? ls shows hdfs and the log says mapred… Warm
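
The mismatch flagged here (files owned by hdfs, daemon running as mapred) is why 775 alone may not help: group write only applies to members of the owning group. A local sketch of the same POSIX semantics HDFS applies (user/group names mirror this thread's setup):

```shell
# HDFS uses plain POSIX owner/group/other checks. Mode 775 grants group
# write, but only to members of the owning group -- so if / is
# hdfs:supergroup and 'mapred' is not in 'supergroup', mapred still
# cannot write. GNU stat assumed.
d=$(mktemp -d)
chmod 775 "$d"
stat -c %a "$d"   # prints: 775
rmdir "$d"

# On the cluster, check membership rather than relaxing modes further:
#   hadoop fs -ls -d /      # who owns / and with which group?
#   id -Gn mapred           # is the owning group listed?
```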

RE: Permission problem

2013-04-30 Thread Kevin Burton
for hadoop hdfs and mr. Ideas? From: Kevin Burton [mailto:rkevinbur...@charter.net] Sent: Tuesday, April 30, 2013 8:31 AM To: user@hadoop.apache.org Cc: 'Mohammad Tariq' Subject: RE: Permission problem That is what I perceive as the problem. The hdfs file system was created with the user 'hdfs

RE: Permission problem

2013-04-30 Thread Kevin Burton
AM, Kevin Burton rkevinbur...@charter.net wrote: To further complicate the issue the log file in (/var/log/hadoop-0.20-mapreduce/hadoop-hadoop-jobtracker-devUbuntu05.log) is owned by mapred:mapred and the name of the file seems to indicate some other lineage (hadoop,hadoop). I am out of my

RE: Permission problem

2013-04-30 Thread Kevin Burton
<name>hadoop.tmp.dir</name> <value>/data/hadoop/tmp/hadoop-${user.name}</value> <description>Hadoop temporary folder</description> </property> From: Arpit Gupta [mailto:ar...@hortonworks.com] Sent: Tuesday, April 30, 2013 9:48 AM To: Kevin Burton Cc: user@hadoop.apache.org Subject: Re: Permission

RE: Permission problem

2013-04-30 Thread Kevin Burton
or set mapred.system.dir to /tmp/mapred/system in your mapred-site.xml. -- Arpit Gupta Hortonworks Inc. http://hortonworks.com/ On Apr 30, 2013, at 7:55 AM, Kevin Burton rkevinbur...@charter.net wrote: In core-site.xml I have: <property> <name>fs.default.name</name> <value>hdfs
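
Spelled out as a mapred-site.xml fragment, the suggestion in this message would look like the following (the value comes from the advice above; adjust it to a path the mapred user can create):

```xml
<property>
  <name>mapred.system.dir</name>
  <value>/tmp/mapred/system</value>
  <description>Directory the JobTracker uses for its system files.</description>
</property>
```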

RE: Permission problem

2013-04-30 Thread Kevin Burton
[mailto:ar...@hortonworks.com] Sent: Tuesday, April 30, 2013 10:48 AM To: Kevin Burton Cc: user@hadoop.apache.org Subject: Re: Permission problem It looks like hadoop.tmp.dir is being used both for local and hdfs directories. Can you create a jira for this? What i recommended is that you create

Can't initialize cluster

2013-04-30 Thread Kevin Burton
I have a simple MapReduce job that I am trying to get to run on my cluster. When I run it I get: 13/04/30 11:27:45 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid mapreduce.jobtracker.address configuration value for
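
For context, LocalJobRunner only accepts a mapreduce.jobtracker.address of "local", so a real host:port value makes this provider bow out; the job then needs the cluster-side provider on its classpath. A sketch of the entry being inspected (the host is taken from this thread; the port is an assumption, 8021 being a common JobTracker default):

```xml
<property>
  <name>mapreduce.jobtracker.address</name>
  <value>devubuntu05:8021</value>
</property>
```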

RE: Can't initialize cluster

2013-04-30 Thread Kevin Burton
To be clear when this code is run with 'java -jar' it runs without exception. The exception occurs when I run with 'hadoop jar'. From: Kevin Burton [mailto:rkevinbur...@charter.net] Sent: Tuesday, April 30, 2013 11:36 AM To: user@hadoop.apache.org Subject: Can't initialize cluster I have

RE: Can't initialize cluster

2013-04-30 Thread Kevin Burton
HADOOP_MAPRED_HOME in your hadoop-env.sh file and re-run the job. See if it helps. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Tue, Apr 30, 2013 at 10:10 PM, Kevin Burton rkevinbur...@charter.net wrote: To be clear when this code is run with 'java -jar' it runs
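
The suggested fix is a one-line addition to hadoop-env.sh (the path below is an assumption for a CDH-style MRv1 layout; point it at your actual MapReduce install):

```shell
# In hadoop-env.sh -- path is an assumption for a CDH-style MRv1 install.
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce
echo "$HADOOP_MAPRED_HOME"   # prints: /usr/lib/hadoop-0.20-mapreduce
```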

Re: Warnings?

2013-04-29 Thread Kevin Burton
- Unable to load native-hadoop library for your platform... using builtin-java classes where applicable On Mon, Apr 29, 2013 at 10:21 AM, Kevin Burton rkevinbur...@charter.net wrote: I looked at the link you provided and found that Ubuntu is one of the “supported platforms

Re: Incompatible clusterIDs

2013-04-29 Thread Kevin Burton
Thank you, the HDFS system seems to be up. Now I am having a problem with getting the JobTracker and TaskTracker up. According to the logs on the JobTracker, mapred doesn't have write permission to /. I am not clear on what the permissions should be. Anyway, thank you. On Apr 29, 2013, at 4:30

Re: Incompatible clusterIDs

2013-04-29 Thread Kevin Burton
It is '/'? On Apr 29, 2013, at 5:09 PM, Mohammad Tariq donta...@gmail.com wrote: make it 755. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Tue, Apr 30, 2013 at 3:30 AM, Kevin Burton rkevinbur...@charter.net wrote: Thank you the HDFS system seems to be up

Re: M/R job to a cluster?

2013-04-28 Thread Kevin Burton
and job tracker. Regards, Sudhakara.st On Sat, Apr 27, 2013 at 2:52 AM, Kevin Burton rkevinbur...@charter.net wrote: It is hdfs://devubuntu05:9000. Is this wrong? Devubuntu05 is the name of the host where the NameNode and JobTracker should be running. It is also the host where I am

RE: Warnings?

2013-04-28 Thread Kevin Burton
? Thanks again. Kevin From: Ted Xu [mailto:t...@gopivotal.com] Sent: Friday, April 26, 2013 10:49 PM To: user@hadoop.apache.org Subject: Re: Warnings? Hi Kevin, Please see my comments inline, On Sat, Apr 27, 2013 at 11:24 AM, Kevin Burton rkevinbur...@charter.net wrote

Re: M/R job to a cluster?

2013-04-26 Thread Kevin Burton
It is hdfs://devubuntu05:9000. Is this wrong? Devubuntu05 is the name of the host where the NameNode and JobTracker should be running. It is also the host where I am running the M/R client code. On Apr 26, 2013, at 4:06 PM, Rishi Yadav ri...@infoobjects.com wrote: check core-site.xml and see
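
For reference, the core-site.xml entry being checked would carry the value quoted above:

```xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://devubuntu05:9000</value>
</property>
```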

RE: M/R Statistics

2013-04-26 Thread Kevin Burton
Answers below. From: Omkar Joshi [mailto:ojo...@hortonworks.com] Sent: Friday, April 26, 2013 7:15 PM To: user@hadoop.apache.org Subject: Re: M/R Statistics Have you enabled security? No. Can you share the output for your hdfs? bin/hadoop fs -ls / kevin@devUbuntu05:~$ hadoop

Re: Warnings?

2013-04-26 Thread Kevin Burton
Is the native library not available for Ubuntu? If so, how do I load it? Can I tell which key is off? Since I am just starting, I would want to be as up to date as possible. It is probably out of date because I copied my examples from books and tutorials. The main class does derive from Tool.

Comparison between JobClient/JobConf and Job/Configuration

2013-04-25 Thread Kevin Burton
I notice that in some beginning texts on starting a Hadoop MapReduce job sometimes JobClient/JobConf is used and sometimes Job/Configuration is used. I have yet to see anyone comment on the features/benefits of either set of methods. Could someone comment on their preferred method for starting a

Import with Sqoop

2013-04-23 Thread Kevin Burton
I execute the line: sqoop import --connect 'jdbc:sqlserver://nbreports:1433;databaseName=productcatalog' --username USER --password PASSWORD --table CatalogProducts And I get the following output: Warning: /usr/lib/hbase does not exist! HBase imports will fail. Please set $HBASE_HOME

Re: A new map reduce framework for iterative/pipelined jobs.

2011-12-27 Thread Kevin Burton
Thanks for sharing. I'd love to play with it, do you have a README/user-guide for systat? Not a ton but I could write some up... Basically I modeled it after vmstat/iostat on Linux. http://sebastien.godard.pagesperso-orange.fr/documentation.html The theory is that most platforms have

A new map reduce framework for iterative/pipelined jobs.

2011-12-26 Thread Kevin Burton
One key point I wanted to mention for Hadoop developers (but then check out the announcement). I implemented a version of sysstat (iostat, vmstat, etc) in Peregrine and would be more than happy to move it out and put it in another dedicated project.

Re: Performance of direct vs indirect shuffling

2011-12-21 Thread Kevin Burton
, Kevin Burton burtona...@gmail.com wrote: We've discussed 'push' v/s 'pull' shuffle multiple times and each time turned away due to complexities in MR1. With MRv2 (YARN) this would be much more doable. Ah gotcha. This is what I expected as well. It would be interesting to see a list

Performance of direct vs indirect shuffling

2011-12-20 Thread Kevin Burton
The current Hadoop implementation shuffles directly to disk, and those disk files are eventually requested by the target nodes responsible for running reduce() on the intermediate data. However, this requires 2x more IO than strictly necessary. If the data were instead shuffled

Re: Performance of direct vs indirect shuffling

2011-12-20 Thread Kevin Burton
On Tue, Dec 20, 2011 at 4:53 PM, Todd Lipcon t...@cloudera.com wrote: The advantage of the pull-based shuffle is fault tolerance - if you shuffle to the reducer and then the reducer dies, you have to rerun *all* of the earlier maps in the push model. You would have the same situation if you

Re: Performance of direct vs indirect shuffling

2011-12-20 Thread Kevin Burton
We've discussed 'push' v/s 'pull' shuffle multiple times and each time turned away due to complexities in MR1. With MRv2 (YARN) this would be much more doable. Ah gotcha. This is what I expected as well. It would be interesting to see a list of changes like this in MR1 vs MR2 to see what

output from one map reduce job as the input to another map reduce job?

2011-09-27 Thread Kevin Burton
Is it possible to connect the output of one map reduce job so that it is the input to another map reduce job? Basically… the key that reduce() outputs would be passed to another map() function without having to store intermediate data to the filesystem. Kevin -- Founder/CEO Spinn3r.com

Re: Has anyone ever written a file system where the data is held in resources

2011-09-14 Thread Kevin Burton
14, 2011 at 9:38 AM, Kevin Burton bur...@spinn3r.com wrote: You can already do this with the JAR file format… if you load a resource via path it uses the class loader system to find it in all available jars. Kevin On Wed, Sep 14, 2011 at 9:24 AM, Steve Lewis lordjoe2...@gmail.com wrote: When

A modest proposal for simplifying zookeeper :)

2009-01-09 Thread Kevin Burton
OK so it sounds from the group that there are still reasons to provide rope in ZK to enable algorithms like leader election. Couldn't ZK ship higher-level interfaces for leader election, mutexes, semaphores, queues, barriers, etc. instead of pushing this on developers? Then the remaining APIs,

Re: A modest proposal for simplifying zookeeper :)

2009-01-09 Thread Kevin Burton
:) . We haven't had the bandwidth to provide such interfaces for zookeeper. It would be great to have all such recipes as a part of the contrib package of zookeeper. mahadev On 1/9/09 11:44 AM, Kevin Burton bur...@spinn3r.com wrote: OK so it sounds from the group that there are still reasons

Re: Sending data during NodeDataChanged or NodeCreated

2009-01-07 Thread Kevin Burton
On Wed, Jan 7, 2009 at 9:25 AM, Benjamin Reed br...@yahoo-inc.com wrote: This is the behavior we had when we first implemented the API, and in every case where people used the information there was a bug. it is virtually impossible to use correctly. In general I'm all for giving people rope,

Re: ouch, zookeeper infinite loop

2009-01-07 Thread Kevin Burton
: The version of Jute we use is really an ancient version of the recordio ser/deser library in Hadoop. We do want to move to some better (versioned/fast/well-accepted) ser/deser library. mahadev On 1/7/09 12:08 PM, Kevin Burton bur...@spinn3r.com wrote: Ah... you think it was because it was empty

event re-issue on reconnect?

2009-01-06 Thread Kevin Burton
I have an event watching a file... and if I restart the server I get this: onConnect onData path: /foo, version: 4, data: '2333' onDisconnect onConnect onData path: /foo, version: 4, data: '2333' It re-issues the same version of the file. I can of course watch for this in my code but it seems

Re: Reconnecting to another host on failure but before session expires...

2009-01-05 Thread Kevin Burton
are stored locally on a particular server, etc.) On Jan 5, 2009, at 12:03 AM, Kevin Burton bur...@spinn3r.com wrote: I'm not observing this behavior... if I shut down the zookeeper server my client doesn't reconnect and I get a disconnect event followed by eventual session expiration. Which

multiple disconnect events is not a state change.

2009-01-04 Thread Kevin Burton
Shutting down my zookeeper server yields this on my client. Continual disconnect events. Shouldn't only one be issued? The second one is not a state change. WatchedEvent: Server state change. New state: Disconnected WatchedEvent: Server state change. New state: Disconnected WatchedEvent:

Persistent watches........

2009-01-03 Thread Kevin Burton
Because watches are one time triggers and there is latency between getting the event and sending a new request to get a watch you cannot reliably see every change that happens to a node in ZooKeeper. Be prepared to handle the case where the znode changes multiple times between getting the