RE: Storing Custom Java Objects in Hadoop Distributed Cache

2010-03-17 Thread Sanjay Sharma
Hi Ninad, You can always use Java object serialization to store custom objects as files in the Hadoop distributed cache before the map/reduce tasks start running. The rule-of-thumb steps for such usage are: a. Create the object while configuring your job, serialize it to a file, and put it in the distributed cache b.
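
A minimal sketch of the approach described above, against the old org.apache.hadoop.filecache.DistributedCache API; the class name, file paths, and the idea of a serializable "lookup" object are illustrative assumptions, not taken from the original mail.

    import java.io.*;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CacheSetupSketch {
      // Driver side: serialize the object, push it to HDFS, register it in the cache.
      public static void addToCache(Serializable obj, Configuration conf) throws Exception {
        File local = new File("/tmp/lookup.ser");                    // illustrative path
        ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(local));
        out.writeObject(obj);
        out.close();

        FileSystem fs = FileSystem.get(conf);
        Path hdfsPath = new Path("/cache/lookup.ser");               // illustrative path
        fs.copyFromLocalFile(new Path(local.getAbsolutePath()), hdfsPath);

        DistributedCache.addCacheFile(new URI(hdfsPath.toString()), conf);
      }

      // Task side (e.g. in the mapper's configure()): read the localized copy back.
      public static Object readFromCache(Configuration conf) throws Exception {
        Path[] cached = DistributedCache.getLocalCacheFiles(conf);
        ObjectInputStream in = new ObjectInputStream(new FileInputStream(cached[0].toString()));
        Object obj = in.readObject();
        in.close();
        return obj;
      }
    }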

Distributed hadoop setup 0 live datanode problem in cluster

2010-03-17 Thread William Kang
Hi, I just moved from a pseudo-distributed Hadoop setup to a four-machine fully distributed Hadoop setup. But after I start the DFS, no live node shows up. If I make the master a slave too, then the datanode on the master machine will show up. I looked through all the logs and found no errors. The only thing

Re: Distributed hadoop setup 0 live datanode problem in cluster

2010-03-17 Thread Jeff Zhang
Can you post your namenode's log? It seems that your datanode cannot connect to the namenode. On Wed, Mar 17, 2010 at 2:43 PM, William Kang weliam.cl...@gmail.com wrote: Hi, I just moved from a pseudo-distributed Hadoop setup to a four-machine fully distributed Hadoop setup. But, after I start

Re: Distributed hadoop setup 0 live datanode problem in cluster

2010-03-17 Thread William Kang
Hi Jeff, Here is the log from my namenode: / 2010-03-17 03:09:59,750 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode

Re: Distributed hadoop setup 0 live datanode problem in cluster

2010-03-17 Thread William Kang
Hi Jeff, I think I have partly found the reason for this problem. The /etc/hosts entry for 127.0.0.1 has the master's hostname in it, so the namenode took 127.0.0.1 as the IP address of the namenode. I fixed it and have already found two nodes. There is one still missing. I will let you guys know what
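
For reference, the shape of the fix William describes; the hostnames and addresses below are purely illustrative. The point is that the master's hostname must resolve to its real network address rather than to the 127.0.0.1 loopback line:

    # /etc/hosts on the master (illustrative addresses)
    127.0.0.1      localhost
    192.168.0.10   master      # real LAN address, not 127.0.0.1
    192.168.0.11   slave1
    192.168.0.12   slave2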

optimization help needed

2010-03-17 Thread Reik Schatz
Preparing a Hadoop presentation here. For demonstration I start up a 5-machine m1.large cluster in EC2 via the Cloudera scripts ($ hadoop-ec2 launch-cluster my-hadoop-cluster 5). Then I send a 500 MB XML file over into HDFS. The Mapper will receive an XML block as the key, select an email address

Fwd: Google Research: MapReduce: The programming model and practice

2010-03-17 Thread Edward J. Yoon
Just FWD. -- Forwarded message -- From: Edward J. Yoon edwardy...@apache.org Date: Wed, Mar 17, 2010 at 5:47 PM Subject: Google Research: MapReduce: The programming model and practice To: hama-...@incubator.apache.org FYI, http://research.google.com/pubs/pub36249.html -- Best

Sqoop Installation on Apache Hadoop 0.20.2

2010-03-17 Thread Utku Can Topçu
Dear All, I'm trying to run tests using MySQL as a kind of datasource, so I thought Cloudera's Sqoop would be a nice project to have in production. However, I'm not using Cloudera's Hadoop distribution right now, and actually I'm not thinking of switching from a main project to a

Re: Sqoop Installation on Apache Hadoop 0.20.2

2010-03-17 Thread Reik Schatz
At least for MRUnit, I was not able to find it outside of the Cloudera distribution (CDH). What I did: installed CDH locally using apt (Ubuntu), searched for and copied the mrunit library into my local Maven repository, and removed CDH afterwards. I guess the same is somehow possible for Sqoop.

Measuring running times

2010-03-17 Thread Antonio D'Ettole
Hi everybody, as part of my project work at school I'm running some Hadoop jobs on a cluster. I'd like to measure exactly how long each phase of the process takes: mapping, shuffling (ideally divided into copying and sorting) and reducing. The tasktracker logs do not seem to supply the start/end

Re: Storing Custom Java Objects in Hadoop Distributed Cache

2010-03-17 Thread Ninad Raut
These are good inputs Sanjay. Thanks for the help. On Wed, Mar 17, 2010 at 11:33 AM, Sanjay Sharma sanjay.sha...@impetus.co.in wrote: Hi Ninad, You can always use Java object serialization to store custom objects as files in Hadoop distributed cache before map/reducer start running. The

Re: optimization help needed

2010-03-17 Thread Gang Luo
Hi, you can control the number of reducers with JobConf.setNumReduceTasks(n). The number of mappers is determined by (file size) / (split size); by default the split size is 64 MB. Since your dataset is not very large, there should be no big difference if you change these. If you are only interested
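
A small sketch of the calls mentioned above, using the old mapred JobConf API; the driver class name and the particular counts are illustrative.

    import org.apache.hadoop.mapred.JobConf;

    public class ReducerCountSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf(ReducerCountSketch.class);
        conf.setNumReduceTasks(4);   // the reducer count set here is used as-is
        conf.setNumMapTasks(8);      // only a hint; the actual mapper count comes from
                                     // (total input size) / (split size, 64 MB by default)
      }
    }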

Re: optimization help needed

2010-03-17 Thread Reik Schatz
Very good input not to send the original XML over to the reducers. For JobConf.setNumReduceTasks(n), isn't that just a hint, with the real number determined by the Partitioner I use, which will be the default HashPartitioner? One other thought I had: what will happen if the values

Re: optimization help needed

2010-03-17 Thread Gang Luo
Hi Reik, the number of reducers is not a hint (the mapper count is a hint). The default hash partitioner hashes each record's key and assigns it to one of the reducers based on the reducer count. If the values list is too large to fit into heap memory, then you will get an exception and the job will fail
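
Loosely, what the stock HashPartitioner does (a sketch of the old mapred-API partitioner contract, not the verbatim Hadoop source): the reducer index is the key's hash modulo the number of reducers, so every value for a given key lands on the same reducer.

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    public class SketchHashPartitioner<K, V> implements Partitioner<K, V> {
      public void configure(JobConf job) { }    // nothing to configure
      public int getPartition(K key, V value, int numReduceTasks) {
        // The mask keeps the index non-negative; identical keys always map to the same reducer.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
      }
    }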

Re: Slave data node failing to connect?

2010-03-17 Thread Kane, David
Folks, Does anyone know if this earlier post ever reached a resolution? I am trying to work through the same tutorial, and I have encountered the same issue. Of the candidate problems Jason suggested, none of them seem to pan out in my case (details below). I'm looking for suggestions as to

Re: optimization help needed

2010-03-17 Thread Reik Schatz
Thanks Gang, I will do some testing tomorrow - skip sending the whole XML, maybe add some Reducers - and see where I end up. Gang Luo wrote: Hi Reik, the number of reducers is not a hint (the mapper count is a hint). The default hash partitioner will hash and send records to each reducer in

Re: Trashbin is not recycled

2010-03-17 Thread Marcus Herou
Thanks. On Mon, Mar 15, 2010 at 10:25 AM, Rekha Joshi rekha...@yahoo-inc.com wrote: ..dfs -rmr -skipTrash /user/hadoop/.Trash recreates .Trash on consecutive rmr... -skipTrash can be used generally if you don't want a backup of deletes, here only to illustrate.. On 3/15/10 2:43 PM, Marcus

Re: Measuring running times

2010-03-17 Thread Simone Leo
At the default log level, Hadoop job logs (the ones you also get in the job's output directory under _logs/history) contain entries like the following: ReduceAttempt TASK_TYPE=REDUCE TASKID=tip_200809020551_0008_r_02 TASK_ATTEMPT_ID=task_200809020551_0008_r_02_0 START_TIME=1220331166789

Re: Measuring running times

2010-03-17 Thread Owen O'Malley
On Mar 17, 2010, at 4:47 AM, Antonio D'Ettole wrote: Hi everybody, as part of my project work at school I'm running some Hadoop jobs on a cluster. I'd like to measure exactly how long each phase of the process takes: mapping, shuffling (ideally divided in copying and sorting) and reducing.

Is there an easy way to clear old jobs from the jobtracker webpage?

2010-03-17 Thread Raymond Jennings III
I'd like to be able to clear the contents of the jobs that have completed running on the jobtracker webpage. Is there an easy way to do this without restarting the cluster?

Austin Hadoop Users Group - Tomorrow Evening (Thursday)

2010-03-17 Thread Stephen Watt
Hi Folks The Austin HUG is meeting tomorrow night. I hope to see you there. We have speakers from Rackspace (Stu Hood on Cassandra) and IBM (Gino Bustelo on BigSheets). Detailed Information is available at http://austinhug.blogspot.com/ Kind regards Steve Watt

when to send distributed cache file

2010-03-17 Thread Gang Luo
Hi all, I am wondering when Hadoop distributes the cache files. Is it the moment we call DistributedCache.addCacheFile()? Will the time to distribute the caches be counted as part of the MapReduce job time? Thanks, -Gang

Re: Austin Hadoop Users Group - Tomorrow Evening (Thursday)

2010-03-17 Thread Alexandre Jaquet
Hi, Please let me know if you will publish any kind of document, presentation, video, or anything else. Thanks in advance, Alexandre Jaquet 2010/3/17 Stephen Watt sw...@us.ibm.com Hi Folks The Austin HUG is meeting tomorrow night. I hope to see you there. We have speakers from Rackspace (Stu Hood on

Re: WritableName can't load class in hive

2010-03-17 Thread Arvind Prabhakar
[cross posting to hive-user] Oded - how did you create the table in Hive? Did you specify any row format SerDe for the table? If not, then that may be the cause of this problem since the default LazySimpleSerDe is unable to deserialize the custom Writable key value pairs that you have used in

Re: when to send distributed cache file

2010-03-17 Thread Gang Luo
Thanks Ravi. Here are some observations. I ran job1 to generate some data used by the following job2, without replication. The total size of the job1 output is 25 MB, spread across 50 files. I use the distributed cache to send all the files to the nodes running job2 tasks. When job2 starts, it stayed at map

RE: WritableName can't load class in hive

2010-03-17 Thread Oded Rotem
No, I didn't specify any SerDe. I'll read up on that and see if it works. Thanks. -Original Message- From: Arvind Prabhakar [mailto:arv...@cloudera.com] Sent: Wednesday, March 17, 2010 10:40 PM To: common-user@hadoop.apache.org; hive-u...@hadoop.apache.org Subject: Re: WritableName

Re: Measuring running times

2010-03-17 Thread Antonio D'Ettole
At the default log level, Hadoop job logs (the ones you also get in the job's output directory under _logs/history) Thanks Simone, that's exactly what I was looking for. Look at the job history logs. They break down the times for each task I understand you guys are talking about the same

Re: Is there a way to suppress the attempt logs?

2010-03-17 Thread Bill Graham
Not sure if what you're asking is possible or not, but you could experiment with these params to see if you could achieve a similar effect. <property> <name>mapred.userlog.limit.kb</name> <value>0</value> <description>The maximum size of user-logs of each task in KB. 0 disables the cap.</description>

Re: Is there a way to suppress the attempt logs?

2010-03-17 Thread Arun C Murthy
Moving to mapreduce-user@. On Mar 15, 2010, at 5:54 PM, abhishek sharma wrote: Hi all, Hadoop creates a directory (and some files) for each map and reduce task attempt in logs/userlogs on each tasktracker. Is there a way to configure Hadoop not to create these attempt logs? Not really,

Re: Sqoop Installation on Apache Hadoop 0.20.2

2010-03-17 Thread Aaron Kimball
Hi Utku, Apache Hadoop 0.20 cannot support Sqoop as-is. Sqoop makes use of DataDrivenDBInputFormat (among other APIs), which is not shipped with Apache's 0.20 release. In order to get Sqoop working on 0.20, you'd need to apply a lengthy list of patches from the project source repository to your

Re: hadoop under cygwin issue

2010-03-17 Thread Brian Wolf
Alex Kozlov wrote: Hi Brian, Is your namenode running? Try 'hadoop fs -ls /'. Alex On Mar 12, 2010, at 5:20 PM, Brian Wolf brw...@gmail.com wrote: Hi Alex, I am back on this problem. Seems it works, but I have this issue with connecting to server. I can connect 'ssh localhost' ok.