Re: Re: capacity_scheduler: working

2009-12-16 Thread anjali nair
This is the mapred-site.xml. I am using hadoop-0.20.1. The new queue, cqueue, is displayed in the JobTracker UI, but the job just does not get submitted to it; it still goes to the default queue. mapred.job.tracker cluster1:9001 mapred.jobtracker.taskScheduler org.apache.hadoop.mapred.Capacit
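For readers following the thread, a minimal sketch of the relevant mapred-site.xml entries for 0.20.x; the queue name cqueue comes from the thread, and the per-queue capacity settings (which live in conf/capacity-scheduler.xml) are omitted:

    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
    </property>
    <property>
      <name>mapred.queue.names</name>
      <value>default,cqueue</value>
    </property>

Note that declaring the queue only makes it available; each job must still be submitted with its queue name set, which is where this thread ends up (see the property discussion below).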

Re: Re: capacity_scheduler: working

2009-12-16 Thread anjali nair
Yes I did, but it is still the same. I tried removing the default queue entirely, but that gives an error. On Wed, Dec 16, 2009 at 1:47 PM, chaitanya krishna < chaitanyavv.ii...@gmail.com> wrote: > Hi, > > After updating conf/mapred-site.xml, did you restart using > bin/stop-mapred.sh and bin/st
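As the quoted reply suggests, the MapReduce daemons have to be restarted after a scheduler change; a sketch, run from the Hadoop install directory:

    # Restart the JobTracker and TaskTrackers so the updated
    # mapred-site.xml (and the scheduler jar on the classpath) are picked up.
    bin/stop-mapred.sh
    bin/start-mapred.sh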

Re: Re: capacity_scheduler: working

2009-12-16 Thread chaitanya krishna
Hi, for the 0.20.1 version the property name is mapred.job.queue.name, while for 0.21 it is mapreduce.job.queue.name. Hope this helps. - Chaitanya. On Wed, Dec 16, 2009 at 1:54 PM, anjali nair wrote: > Yes I did. But it still is the same. I tried removing the default queue > entirely. But that gives
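A minimal sketch of pinning a job to a queue from driver code against the 0.20.x mapred API (the class name, job name, and omitted input/output setup are illustrative):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SubmitToQueue {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SubmitToQueue.class);
        conf.setJobName("queue-demo");
        // Equivalent to setting mapred.job.queue.name on 0.20.x.
        conf.setQueueName("cqueue");
        // ... set input/output paths and mapper/reducer classes here ...
        JobClient.runJob(conf);
      }
    }

The same effect is available from the command line with -Dmapred.job.queue.name=cqueue when the driver goes through ToolRunner.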

Re: Help with fuse-dfs

2009-12-16 Thread Weiming Lu
Thanks very much. We use Ubuntu 8.04, and the kernel is 2.6.24-16-generic. When we run jps, it shows: 12405 startup.jar, 5950 startup.jar, 19216 SecondaryNameNode, 25381 Jps, 19053 NameNode, 19289 JobTracker. When I pass "-o private" for fuse-dfs just as: fuse_dfs dfs://10.15.62.4:54310 /mnt/dfs -opriv
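For context, a sketch of the full mount invocation the truncated line implies (the NameNode address comes from the message; -d is the standard FUSE foreground/debug flag, useful while troubleshooting):

    # Mount HDFS through fuse-dfs in private (single-user) mode.
    fuse_dfs dfs://10.15.62.4:54310 /mnt/dfs -oprivate -d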

Re: Hadoop read/write performance tests problem

2009-12-16 Thread Dmitriy Lyfar
2009/12/15 William Kinney > I had a similar issue and found the profiling information to be helpful: > http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Profiling > > It should tell you where it's spending most of the time. > > Thank you William, which problem did you have if you
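For reference, a minimal sketch of enabling the task profiling that the linked tutorial section describes, using the 0.20-era JobConf API (the task ranges are illustrative):

    import org.apache.hadoop.mapred.JobConf;

    public class ProfiledJob {
      public static JobConf withProfiling(JobConf conf) {
        // Run a sample of tasks under the JVM's HPROF agent and pull the
        // profile output back to the client alongside the task logs.
        conf.setProfileEnabled(true);
        conf.setProfileTaskRange(true, "0-2");   // profile map tasks 0-2
        conf.setProfileTaskRange(false, "0-2");  // profile reduce tasks 0-2
        return conf;
      }
    }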

Re: Help with fuse-dfs

2009-12-16 Thread Brian Bockelman
Hey Weiming, We've recently found a race condition in FUSE-DFS that can be triggered when you run it from a host where GNOME is also running. It causes a segfault like the one you report below. The patch attached works for 0.19.1. Another option is to shut down all the instances of GNOME on your compu

map reduce to achieve cartesian product

2009-12-16 Thread Eguzki Astiz Lezaun
Hi, First, I would like to apologise if this question has been asked before (I am quite sure it has been) and I would appreciate it very much if someone replies with a link to the answer. My question is quite simple. I have two files or datasets, each having a list of integers. example: dataset A: (a

Re: Help with fuse-dfs

2009-12-16 Thread Weiming Lu
Hi, Brian, that is the point, you are right. Thanks very much. We used the Ubuntu Desktop for easy programming. It appears that I have to upgrade Hadoop to the latest version. On Wed, Dec 16, 2009 at 8:39 PM, Brian Bockelman wrote: > Hey Weiming, > > We've recently found a race condition in

Re: Help with fuse-dfs

2009-12-16 Thread Brian Bockelman
Sigh, I suppose this means I need to come out from under the rock I've been hiding under and file a JIRA. As I mentioned, the work-around is to mount FUSE-DFS when GNOME is not running. Brian On Dec 16, 2009, at 8:00 AM, Weiming Lu wrote: > Hi, Brian, that is the point, you are right. thanks

addChild NullPointerException when starting namenode and reading edits file

2009-12-16 Thread Erik Bernhardsson
Hi, we just encountered some problems when restarting our namenode. I'd really appreciate it if anyone has any clue about what is going on here. The error message is as follows: 09/12/16 14:25:03 INFO namenode.NameNode: STARTUP_MSG: / STARTUP_

Re: map reduce to achieve cartesian product

2009-12-16 Thread Todd Lipcon
Hi Eguzki, Is one of the tables vastly smaller than the other? If one is small enough to fit in RAM, you can do this like so: 1. Add the small file to the DistributedCache 2. In the configure() method of the mapper, read the entire file into an ArrayList or somesuch in RAM 3. Set the input path o
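A minimal sketch of the pattern Todd outlines, against the 0.20 mapred API (key/value types are illustrative and error handling is trimmed):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class CrossJoinMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      private final List<String> small = new ArrayList<String>();

      @Override
      public void configure(JobConf job) {
        try {
          // Step 2: read the small dataset (shipped via DistributedCache)
          // into RAM once per task.
          Path[] cached = DistributedCache.getLocalCacheFiles(job);
          BufferedReader in = new BufferedReader(
              new FileReader(cached[0].toString()));
          String line;
          while ((line = in.readLine()) != null) {
            small.add(line);
          }
          in.close();
        } catch (IOException e) {
          throw new RuntimeException("Failed to load cached file", e);
        }
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> out, Reporter reporter)
          throws IOException {
        // Emit one pair per (small record, big record): the cross product.
        for (String s : small) {
          out.collect(new Text(s), value);
        }
      }
    }

In the driver, the small file would be registered before submission, per step 1, with DistributedCache.addCacheFile(new URI("hdfs:///path/small.txt"), conf); the path there is illustrative.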

Re: error message when executing SecondaryNameNode

2009-12-16 Thread Todd Lipcon
Hi Fu-Ming, Looks similar to this bug: http://issues.apache.org/jira/browse/HDFS-686 Does this problem persist, or was it a one time occurrence? -Todd On Tue, Dec 15, 2009 at 5:42 PM, Fu-Ming Tsai wrote: > Hello, all, > > I tried to execute 2 SecondaryNamenode in my env. However, one worked

Re: addChild NullPointerException when starting namenode and reading edits file

2009-12-16 Thread Todd Lipcon
Hi Erik, A few things to try:
- does this FS store sensitive data, or would it be possible to bzip2 the files and upload them somewhere?
- can you add logging to the replay of FSEditLog so as to be aware of what byte offset is causing the issue?
- DO take a backup of all of the state, immediately,
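A sketch of that immediate-backup step, before any recovery is attempted (the dfs.name.dir path is illustrative):

    # With the NameNode stopped, snapshot the whole name directory
    # (fsimage, edits, VERSION) before replaying or editing anything.
    tar czf namenode-backup-$(date +%Y%m%d).tar.gz /data/hadoop/dfs/name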

Re: map reduce to achieve cartesian product

2009-12-16 Thread Eguzki Astiz Lezaun
Thanks Todd, That was my plan B, or workaround. Anyway, I am happy to see there is no straight way to do it that I could have missed. The "small" list is a list of userIds (a dimension table), so I can assume it is "small", but that could limit the scalability of our system. I will test the upper limits

Re: map reduce to achieve cartesian product

2009-12-16 Thread Todd Lipcon
Hi Eguzki, I wouldn't say the size of the list fitting into RAM would be the scalability bottleneck. If you're doing a full cartesian join of your users against a larger table, the fact that you're doing the full cartesian join is going to be the bottleneck first :) -Todd On Wed, Dec 16, 2009 at

Re: map reduce to achieve cartesian product

2009-12-16 Thread Edward Capriolo
On Wed, Dec 16, 2009 at 12:29 PM, Todd Lipcon wrote: > Hi Eguzki, > > I wouldn't say the size of the list fitting into RAM would be the > scalability bottleneck. If you're doing a full cartesian join of your users > against a larger table, the fact that you're doing the full cartesian join > is go

Re: Re: Re: Re: Re: map output not equal to reduce input

2009-12-16 Thread Gang Luo
Thanks Amogh. I solved it now. The reason is exactly what you said: I didn't consume all the records in the reducer. I break out of the loop when I meet a certain record, and in this case the remaining records, which I ignore, are not counted. So, there is no problem at all! -Gang - Original Message - From: Amogh
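To illustrate the effect Gang describes, a sketch of an old-API reducer that stops consuming values early; any values left in the iterator are never read, so they never count toward the reduce-input-records counter (the class name and break condition are illustrative):

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class EarlyExitReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

      public void reduce(Text key, Iterator<IntWritable> values,
                         OutputCollector<Text, IntWritable> out,
                         Reporter reporter) throws IOException {
        while (values.hasNext()) {
          IntWritable v = values.next();
          if (v.get() < 0) {
            // Early exit: unread values are never iterated, so reduce
            // input records stays below map output records. Not a bug,
            // just uncounted input.
            break;
          }
          out.collect(key, v);
        }
      }
    }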

Re: addChild NullPointerException when starting namenode and reading edits file

2009-12-16 Thread Erik Bernhardsson
Thanks a lot for your reply, Todd. I added some answers below. On Wed, Dec 16, 2009 at 6:06 PM, Todd Lipcon wrote: > Hi Erik, > > A few things to try: > > - does this FS store sensitive data or would it be possible to bzip2 the > files and upload them somewhere? > Unfortunately, I think it'd be

Re: addChild NullPointerException when starting namenode and reading edits file

2009-12-16 Thread Todd Lipcon
On Wed, Dec 16, 2009 at 1:03 PM, Erik Bernhardsson wrote: > Thanks a lot for your reply, Todd. I added some answers below. > > On Wed, Dec 16, 2009 at 6:06 PM, Todd Lipcon wrote: > > > Hi Erik, > > > > A few things to try: > > > > - does this FS store sensitive data or would it be possible to bzi

More access to nodes in a distributed cache

2009-12-16 Thread Ahmad Ali Iqbal
Hi All, I am interested to know whether we can use Hadoop for applications that need more control over the data, where one can specify which node will do which part of the processing or the storage. For instance, suppose that I have two data files (datasets, say 1 and 2) and set up a Hadoop with two

Re: Hadoop read/write performance tests problem

2009-12-16 Thread William Kinney
Our machines suffered from bad memcpy performance, which became apparent after profiling (a lot of time in System.arraycopy() for our InputFormat reader). On 12/16/09, Dmitriy Lyfar wrote: > 2009/12/15 William Kinney > >> I had a similar issue and found the profiling information to be helpful: >

Re: More access to nodes in a distributed cache

2009-12-16 Thread Mike Kendall
It sounds to me like you might want to split what you want to do into two separate jobs entirely... I don't quite understand your use case, since the point of Hadoop is to spread your load as much (and as haphazardly!) as possible. -mike On Wed, Dec 16, 2009 at 3:43 PM, Ahmad Ali Iqbal wrote: >

Can hadoop 0.20.1 programs run on Amazon Elastic MapReduce?

2009-12-16 Thread 松柳
Hi all, I'm wondering whether Amazon has started to support the newest stable version of Hadoop, or whether we can still only use 0.18.3? Song Liu

Re: Can hadoop 0.20.1 programs run on Amazon Elastic MapReduce?

2009-12-16 Thread Ed Kohlwey
Last time I checked EMR only runs 0.18.3. You can use EC2 though, which winds up being cheaper anyways. On Wed, Dec 16, 2009 at 8:51 PM, 松柳 wrote: > Hi all, I'm wondering whether Amazon starts to support the newest stable > version of Hadoop, or we can still just use 0.18.3? > > Song Liu >

Re: Can hadoop 0.20.1 programs run on Amazon Elastic MapReduce?

2009-12-16 Thread Mark Kerzner
You can build your own clusters on EC2, using Cloudera's distribution. It worked for me. Mark On Wed, Dec 16, 2009 at 8:17 PM, Ed Kohlwey wrote: > Last time I checked EMR only runs 0.18.3. You can use EC2 though, which > winds up being cheaper anyways. > > On Wed, Dec 16, 2009 at 8:51 PM, 松柳 w

Re: More access to nodes in a distributed cache

2009-12-16 Thread Ahmad Ali Iqbal
Hi Mike, My understanding is that in Hadoop, job scheduling is done implicitly; as you said, it spreads the load as much as possible. However, I want to control task assignments to nodes. Let me put it in the context of an ad-hoc networking application scenario, where mobile devices broadcast *Hello* packets periodi

Re: Re: capacity_scheduler: working

2009-12-16 Thread anjali nair
Thanks a lot, finally... it's working -- Anjali M

Re: error message when executing SecondaryNameNode

2009-12-16 Thread Fu-Ming Tsai
Thanks for your help, Todd. It happens whenever I restart the SecondaryNamenode. But what I'm curious about is why the other host works fine. Did you encounter this problem? Best regards, Fu-Ming On Thu, Dec 17, 2009 at 12:59 AM, Todd Lipcon wrote: > Hi Fu-Ming, > > Looks similar to this bug: > > h

Re: Help with fuse-dfs

2009-12-16 Thread Weiming Lu
So, can we use one machine which is not the namenode or a datanode to mount FUSE-DFS, started up at the console instead of under X Window? Yesterday, I installed hadoop-0.18.2 on another machine and built FUSE-DFS successfully. Should the conf files such as hadoop-site.xml, masters and sla