incremental loads into hadoop

2011-09-30 Thread Sam Seigal
Hi, I am relatively new to Hadoop and was wondering how to do incremental loads into HDFS. I have a continuous stream of data flowing into a service which is writing to an OLTP store. Due to the high volume of data, we cannot do aggregations on the OLTP store, since this starts affecting the writ…

October SF Hadoop Meetup

2011-09-30 Thread Aaron Kimball
The October SF Hadoop users meetup will be held Wednesday, October 12, from 7pm to 9pm. This meetup will be hosted by Twitter at their office on Folsom St. *Please note that due to scheduling constraints, we will begin an hour later than usual this month.* As usual, we will use the discussion-base…

Re: error for deploying hadoop on macbook pro

2011-09-30 Thread Harsh J
Since you're only just beginning, and have unknowingly issued multiple "namenode -format" commands, simply run the following and restart DN alone: $ rm -r /private/tmp/hadoop-hadoop-user/dfs/data (And please do not reformat namenode, lest you go out of namespace ID sync yet again -- You can inste…
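One commonly suggested alternative to wiping the data directory (a sketch only; the reply above is cut off, so this may not be what it goes on to say) is to edit the DataNode's `current/VERSION` file so its namespaceID matches the NameNode's. The path and the ID value below come from the error quoted in this thread:

```shell
#!/bin/sh
# Sketch: align a DataNode's namespaceID with the NameNode's by rewriting
# the VERSION file under dfs/data/current. Run with the DataNode stopped.
sync_namespace_id() {
  version_file=$1   # path to dfs/data/current/VERSION
  nn_id=$2          # namespaceID the NameNode reports
  # Keep a .bak copy, then rewrite only the namespaceID line.
  sed -i.bak "s/^namespaceID=.*/namespaceID=${nn_id}/" "$version_file"
}

# Example, with the values from the error in this thread:
# sync_namespace_id /private/tmp/hadoop-hadoop-user/dfs/data/current/VERSION 798142055
```

Unlike `rm -r` on the data directory, this keeps existing block data, but it only makes sense when the blocks really do belong to the current namespace.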

Re: error for deploying hadoop on macbook pro

2011-09-30 Thread Jignesh Patel
Now I am able to get the task tracker and job tracker running, but I still have the following problem with the datanode: ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /private/tmp/hadoop-hadoop-user/dfs/data: namenode namespaceID = 798142055; datan…

Fwd: error for deploying hadoop on macbook pro

2011-09-30 Thread Jignesh Patel
I am trying to set up a single-node cluster using hadoop-0.20.204.0, and while setting it up I found my job tracker and task tracker are not starting. I am attaching the exception. I also don't know why, while formatting the name node, my IP address still doesn't show as 127.0.0.1, as follows: 11/09/30 15:50:36 INFO…

RE: Learning curve after MapReduce and HDFS

2011-09-30 Thread GOEKE, MATTHEW (AG/1000)
Are you learning for the sake of experimenting or are there functional requirements driving you to dive into this space? *If you are learning for the sake of adding new tools to your portfolio: Look into high level overviews of each of the projects and review architecture solutions that use the…

Re: linux containers with Hadoop

2011-09-30 Thread bikash sharma
Thanks Edward. So mostly the Linux containers are used in Hadoop for ensuring isolation in terms of security across MapReduce jobs from different users (even Mesos seems to leverage the same), not for resource fairness? On Fri, Sep 30, 2011 at 1:39 PM, Edward Capriolo wrote: > On Fri, Sep…

hadoop monitoring

2011-09-30 Thread patrick sang
I am using Nagios to monitor a Hadoop cluster and would like to hear input from you guys. Questions: 1. Would there be any difference between monitoring TCP port 9000 versus curl-ing port 50070 and grepping for "namenode"? 2. For the job tracker I will monitor TCP port 9001 -- any drawbacks? 3. Secondarynamenode: wh…
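The curl-plus-grep check described above can be sketched as a small Nagios-style plugin. The parsing step is split out from the network step so it can be exercised on its own; `dfshealth.jsp` was the 0.20-era NameNode UI page, and the host/port defaults are assumptions:

```shell
#!/bin/sh
# Sketch of the check in this thread: fetch the NameNode web UI and grep
# the page body, instead of only testing that a TCP port accepts connections.

# Pure step: does a page body (on stdin) mention the NameNode?
page_mentions_namenode() {
  grep -qi 'namenode'
}

# Networked step: curl the 0.20-era UI page and report Nagios-style status.
check_namenode_ui() {
  host=${1:-localhost}
  port=${2:-50070}
  if curl -sf "http://${host}:${port}/dfshealth.jsp" | page_mentions_namenode; then
    echo "OK - NameNode UI responding on ${host}:${port}"
  else
    echo "CRITICAL - no NameNode page on ${host}:${port}"
    return 2
  fi
}
```

The practical difference from a bare TCP check on 9000: a port probe only proves something is listening, while grepping the UI page proves the NameNode servlet actually rendered, which is closer to a real health check.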

Re: linux containers with Hadoop

2011-09-30 Thread Edward Capriolo
On Fri, Sep 30, 2011 at 9:03 AM, bikash sharma wrote: > Hi, > Does anyone know if Linux containers (which are a kernel-supported > virtualization technique for providing resource isolation across > processes/applications) have ever been used with Hadoop to provide resource > isolation for map/redu…

Learning curve after MapReduce and HDFS

2011-09-30 Thread Varad Meru
Hi all, I have been working with Hadoop core, Hadoop HDFS and Hadoop MapReduce for the past 8 months. Now I want to learn other projects under Apache Hadoop such as Pig, Hive, HBase ... Can you suggest a learning path for the Hadoop eco-system in a structured manner? I am confu…

Re: mapred example task failing with error 127

2011-09-30 Thread Vinod Gupta Tankala
Thanks Harsh. I did look at the userlogs dir. Although it creates subdirs for each job/attempt, there are no files in those directories, just the acl xml file. I had also looked at the task tracker log and all it has is this: 2011-09-30 15:50:05,344 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAc…

linux containers with Hadoop

2011-09-30 Thread bikash sharma
Hi, does anyone know if Linux containers (which are a kernel-supported virtualization technique for providing resource isolation across processes/applications) have ever been used with Hadoop to provide resource isolation for map/reduce tasks? If yes, what could be the up/down sides of such an approac…

Re: getting the process id of mapreduce tasks

2011-09-30 Thread bikash sharma
Thanks Varad. On Wed, Sep 28, 2011 at 9:35 PM, Varad Meru wrote: > The process ids of each individual task can be seen using the jps and jconsole > commands provided by Java. > > The jconsole command on the command-line interface provides a GUI screen for > monitoring running tasks within Java. > > The task…

Re: getting the process id of mapreduce tasks

2011-09-30 Thread bikash sharma
Thanks so much Harsh! On Thu, Sep 29, 2011 at 12:42 AM, Harsh J wrote: > Hello Bikash, > > The tasks run on the tasktracker, so that is where you'll need to look > for the process ID -- not the JobTracker/client. > > Crudely speaking, > $ ssh tasktracker01 # or whichever. > $ jps | grep Child |…
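Harsh's recipe can be wrapped in a couple of lines. A sketch, with the jps-output parsing split into a pure function; the "Child" process name is how 0.20-era task JVMs show up in `jps`, and the hostname is the placeholder from his reply:

```shell
#!/bin/sh
# Sketch of the steps quoted above: task attempts run as separate "Child"
# JVMs on the tasktracker node, so jps output on that host carries their pids.

# Pure step: pull the pids of Child JVMs out of jps output (reads stdin).
child_pids_from_jps() {
  awk '$2 == "Child" {print $1}'
}

# On (or via ssh to) the tasktracker host itself -- hostname is an assumption:
# ssh tasktracker01 jps | child_pids_from_jps
```

Note this has to run on the tasktracker node, not the JobTracker or the submitting client, which is the point Harsh makes above.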

Re: FileSystem closed

2011-09-30 Thread Steve Loughran
On 29/09/2011 18:02, Joey Echeverria wrote: Do you close your FileSystem instances at all? IIRC, the FileSystem instance you use is a singleton, and if you close it once, it's closed for everybody. My guess is you close it in your cleanup method and have JVM reuse turned on. I've hit this i…