Re: How to do load control of MapReduce

2009-05-12 Thread zsongbo
Hi Stefan, Yes, 'nice' cannot resolve this problem. Now, in my cluster, each node has 8GB of RAM. My Java heap configuration is: HDFS DataNode: 1GB; HBase RegionServer: 1.5GB; MR TaskTracker: 1GB; MR child: 512MB (max child tasks is 6: 4 map tasks + 2 reduce tasks). But the memory usage is still tight…
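
(For reference, those heap settings alone already commit most of the machine:

  1.0 GB (DataNode) + 1.5 GB (RegionServer) + 1.0 GB (TaskTracker) + 6 x 0.5 GB (child tasks) = 6.5 GB

of configured heap out of 8 GB of RAM, leaving roughly 1.5 GB for the OS, the page cache, and each JVM's footprint beyond its heap, which is discussed later in this thread.)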

how to connect to remote hadoop dfs by eclipse plugin?

2009-05-12 Thread andy2005cst
When I use the Eclipse plugin hadoop-0.18.3-eclipse-plugin.jar and try to connect to a remote Hadoop DFS, I get an IOException. If I run a map/reduce program it outputs:

  09/05/12 16:53:52 INFO ipc.Client: Retrying connect to server: /**.**.**.**:9100. Already tried 0 time(s).
  09/05/12 16:53:52 INFO ipc.Cli…
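
(A retry loop like this usually means nothing is listening at that host:port, or the client is pointed at the wrong address. A minimal connectivity check in plain Java, assuming the 0.18-era client API; the host name is hypothetical, and port 9100 is taken from the log above. Whatever value is used must match fs.default.name in the cluster's hadoop-site.xml, and the plugin's "DFS Master" host/port fields must match it too:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ConnectCheck {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // hypothetical host; port 9100 comes from the log above and must
          // match the port in the cluster's fs.default.name
          conf.set("fs.default.name", "hdfs://namenode-host:9100");
          FileSystem fs = FileSystem.get(conf);
          System.out.println(fs.exists(new Path("/")));  // prints true on success
      }
  }

If this check fails from the same machine, the problem is the cluster or the network, not the plugin.)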

Re: Winning a sixty second dash with a yellow elephant

2009-05-12 Thread Steve Loughran
Arun C Murthy wrote: "... oh, and getting it to run a marathon too! http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html Owen & Arun" Lovely. I will now stick up the pic of you getting the first results in on your laptop at ApacheCon.

Re: Huge DataNode Virtual Memory Usage

2009-05-12 Thread Steve Loughran
Stefan Will wrote: Raghu, I don't actually have exact numbers from jmap, although I do remember that jmap -histo reported something less than 256MB for this process (before I restarted it). I just looked at another DFS process that is currently running and has a VM size of 1.5GB (~600MB resident)…

Re: How to do load control of MapReduce

2009-05-12 Thread Steve Loughran
zsongbo wrote: Hi Stefan, Yes, 'nice' cannot resolve this problem. Now, in my cluster, each node has 8GB of RAM. My Java heap configuration is: HDFS DataNode: 1GB; HBase RegionServer: 1.5GB; MR TaskTracker: 1GB; MR child: 512MB (max child tasks is 6: 4 map tasks + 2 reduce tasks). But the memory u…

append() production support

2009-05-12 Thread Sasha Dolgy
Does anyone have a vague idea of when append() may be available for production use? Thanks in advance. -sasha

Re: How to do load control of MapReduce

2009-05-12 Thread zsongbo
Yes, I also found that the TaskTracker should not use so much memory.

  PID   USER     PR  NI  VIRT   RES   SHR  S %CPU %MEM    TIME+  COMMAND
  32480 schubert 35  10  1411m  172m  9212 S    0  2.2  8:54.78  java

The previous 1GB was just the default value; I changed the TaskTracker heap to 384MB an hour…

Re: How to do load control of MapReduce

2009-05-12 Thread Stefan Will
Yes, I think the JVM uses way more memory than just its heap. Some of it might be just reserved memory that is not actually used (I'm not sure how to tell the difference). There are also things like thread stacks, the JIT compiler cache, direct NIO byte buffers, etc. that take up process space outside of the heap…
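
(The heap/non-heap split is visible over JMX from inside any JVM. A minimal sketch using the standard java.lang.management API, nothing Hadoop-specific; note that direct NIO buffers and thread stacks fall outside both reported numbers, which is part of Stefan's point:

  import java.lang.management.ManagementFactory;
  import java.lang.management.MemoryMXBean;
  import java.lang.management.MemoryPoolMXBean;

  public class MemCheck {
      public static void main(String[] args) {
          MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
          // "committed" vs "used" shows memory that is reserved but not in use
          System.out.println("heap:     " + mem.getHeapMemoryUsage());
          System.out.println("non-heap: " + mem.getNonHeapMemoryUsage());
          // per-pool breakdown: eden/survivor/tenured, perm gen, code cache
          for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
              System.out.println(pool.getName() + ": " + pool.getUsage());
          }
      }
  }

)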

Re: How to do load control of MapReduce

2009-05-12 Thread Steve Loughran
Stefan Will wrote: Yes, I think the JVM uses way more memory than just its heap. Some of it might be just reserved memory that is not actually used (I'm not sure how to tell the difference). There are also things like thread stacks, the JIT compiler cache, direct NIO byte buffers, etc. that take up process space outside of the heap…

Re: large files vs many files

2009-05-12 Thread Sasha Dolgy
Right now data is received in parallel and written to a queue; a single thread then reads the queue and writes those messages to an FSDataOutputStream, which is kept open, but the messages never get flushed. I tried flush() and sync() with no joy. 1. outputStream.writeBytes(rawMessage.toString()); …
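
(For context, a minimal sketch of the pattern being described, with a hypothetical path; this assumes a client where FSDataOutputStream exposes sync(), as tried above:

  import java.io.IOException;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class QueueDrainer {
      private final FSDataOutputStream out;  // kept open across messages

      public QueueDrainer(FileSystem fs) throws IOException {
          this.out = fs.create(new Path("/queue/messages.log"));  // hypothetical path
      }

      public void write(String rawMessage) throws IOException {
          out.writeBytes(rawMessage + "\n");
          out.flush();  // empties the client-side buffer only
          out.sync();   // asks the pipeline to persist, but in this era readers
                        // generally cannot see the bytes until the block
                        // completes or close() is called -- the "no joy" above
      }
  }

)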

Re: large files vs many files

2009-05-12 Thread Sasha Dolgy
2009-05-12 12:42:17,470 DEBUG [Thread-7] (FSStreamManager.java:28) hdfs.HdfsQueueConsumer: Thread 19 getting an output stream
2009-05-12 12:42:17,470 DEBUG [Thread-7] (FSStreamManager.java:49) hdfs.HdfsQueueConsumer: Re-using existing stream
2009-05-12 12:42:17,472 DEBUG [Thread-7] (FSStreamManager…

Re: HDFS to S3 copy problems

2009-05-12 Thread Tom White
Ian - Thanks for the detailed analysis. It was these issues that led me to create a temporary file in NativeS3FileSystem in the first place. I think we can get NativeS3FileSystem to report progress, though; see https://issues.apache.org/jira/browse/HADOOP-5814. Ken - I can't see why you would be g…

Re: Hadoop Summit 2009 - Open for registration

2009-05-12 Thread Amandeep Khurana
It shows sold out on the website. Any chance of more seats opening up? Amandeep Khurana, Computer Science Graduate Student, University of California, Santa Cruz. On Tue, May 5, 2009 at 2:10 PM, Ajay Anand wrote: > This year's Hadoop Summit > (http://developer.yahoo.com/events/hadoopsu…

RE: Hadoop Summit 2009 - Open for registration

2009-05-12 Thread Ajay Anand
You can register at http://hadoopsummit09.eventbrite.com/ Ajay -Original Message- From: Amandeep Khurana [mailto:ama...@gmail.com] Sent: Tuesday, May 12, 2009 9:55 AM To: hbase-u...@hadoop.apache.org; core-user@hadoop.apache.org Subject: Re: Hadoop Summit 2009 - Open for registration I

Hadoop-on-Demand question: key/value pairs in child opts

2009-05-12 Thread Jiaqi Tan
Hi, I'd like to do this in my hodrc file: client-params = ...,,mapred.child.java.opts="-Dkey=value",... but HoD doesn't like it: error: 1 problem found. Check your command line options and/or your configuration file /hodrc. Any ideas on how to specify a "nested equals"? Has anyone ever tried this, or…

Re: Suggestions for making writing faster? DFSClient waiting while writing chunk

2009-05-12 Thread stack
On Mon, May 11, 2009 at 9:43 PM, Raghu Angadi wrote:
> stack wrote:
>> Thanks Raghu:
>> Here is where it gets stuck: [...]
> Is that where it is normally stuck? That implies it is spending an unusually long time at the end of writing a block, which should not be the case.
I studied datan…

No reduce tasks running, yet 1 is pending

2009-05-12 Thread Saptarshi Guha
Hello, I mentioned this issue before for the case of map tasks. I have 43 reduce tasks: 42 completed, 1 pending, and 0 running. This has been the case for the last 30 minutes. A picture (TIFF) of the job tracker can be found here (http://www.stat.purdue.edu/~sguha/mr.tiff), since I haven't cancelled the jo…

Re: No reduce tasks running, yet 1 is pending

2009-05-12 Thread Saptarshi Guha
Interestingly, when I started other jobs, this one finished. I have no idea why. Saptarshi Guha On Tue, May 12, 2009 at 10:36 PM, Saptarshi Guha wrote: > Hello, > I mentioned this issue before for the case of map tasks. I have 43 > reduce tasks, 42 completed, 1 pending and 0 running. > This is…

hadoop streaming reducer values

2009-05-12 Thread Alan Drew
Hi, I have a question about the values that the reducer gets in Hadoop Streaming. I wrote simple mapper.sh and reducer.sh script files.

mapper.sh:

  #!/bin/bash
  while read data
  do
    # tokenize the data and output the values
    echo "$data" | awk '{token=0; while(++token<=NF) print $token"\t1"}'
  done

r…
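
(For what it's worth: a streaming reducer does not receive grouped values the way a Java Reducer does. It reads plain key<TAB>value lines on stdin, sorted by key but not grouped, and the script has to detect key boundaries itself. For a hypothetical input line "apple apple banana", the mapper above emits one "token<TAB>1" line per word, and the reducer script would then read:

  apple   1
  apple   1
  banana  1

one line per pair, with the two "apple" lines adjacent because of the sort.)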

Re: Winning a sixty second dash with a yellow elephant

2009-05-12 Thread Ian jonhson
Interesting. So, where can I download the benchmark and the related test code? On Tue, May 12, 2009 at 8:38 AM, Arun C Murthy wrote: > ... oh, and getting it to run a marathon too! > > http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html > > Owen & Arun >

Re: how to connect to remote hadoop dfs by eclipse plugin?

2009-05-12 Thread Rasit OZDAS
Your Hadoop isn't working at all, or isn't working at the specified port.
- Try the stop-all.sh command on the namenode. If it says "no namenode to stop", then take a look at the namenode logs and paste anything here that seems strange.
- If the namenode logs are OK (filled with INFO messages), then take a look at al…

RE: public IP for datanode on EC2

2009-05-12 Thread Joydeep Sen Sarma
(Raking up a real old thread.) After struggling with this issue for some time now, it seems that accessing HDFS on EC2 from outside EC2 is not possible. This is primarily because of https://issues.apache.org/jira/browse/HADOOP-985. Even if datanode ports are authorized in EC2 and we set the public…

Re: how to improve the Hadoop's capability of dealing with small files

2009-05-12 Thread Rasit OZDAS
I have a similar situation: I have very small files. I have never tried HBase (I want to), but you can also group them and write, say, 20-30 into a single file, where every small file becomes a key in that big file. There are methods in the API with which you can write an object as a file into HDFS, and read it back again to get…
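
(A minimal sketch of that approach using a SequenceFile, where each small file becomes one key/value record; the container path is hypothetical:

  import java.io.DataInputStream;
  import java.io.File;
  import java.io.FileInputStream;
  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  public class SmallFilePacker {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);
          // one big container file; each small local file becomes one record
          SequenceFile.Writer writer = SequenceFile.createWriter(
                  fs, conf, new Path("/packed/files.seq"),   // hypothetical path
                  Text.class, BytesWritable.class);
          for (String name : args) {              // each argument is a small file
              writer.append(new Text(name), new BytesWritable(readFile(name)));
          }
          writer.close();
      }

      private static byte[] readFile(String name) throws IOException {
          File f = new File(name);
          byte[] buf = new byte[(int) f.length()];
          DataInputStream in = new DataInputStream(new FileInputStream(f));
          try { in.readFully(buf); } finally { in.close(); }
          return buf;
      }
  }

Reading the container back with SequenceFile.Reader returns the original name/contents pairs, and a MapReduce job over it sees one record per small file instead of one split per tiny file.)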

How can I get the actual time for one write operation in HDFS?

2009-05-12 Thread Xie, Tao
DFSOutputStream.writeChunk() enqueues packets onto the data queue and returns right away, so the write is asynchronous. I want to know the total actual time HDFS takes to execute the write operation (starting from writeChunk() to the time each replica is written to disk). How can I get that time? Thanks.
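
(As far as I know there is no public per-chunk timing hook in DFSClient; measuring a single chunk would mean instrumenting DFSClient itself, around writeChunk() and the thread that handles pipeline acks. A rough end-to-end approximation is to time through close(), which blocks until the client has received acknowledgements for the outstanding packets. A sketch with a hypothetical path and payload size; note that an ack means the datanodes accepted the bytes, not necessarily that every replica is already on the platter:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class WriteTimer {
      public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());
          byte[] buf = new byte[64 * 1024 * 1024];  // hypothetical 64MB payload
          long start = System.currentTimeMillis();
          FSDataOutputStream out = fs.create(new Path("/tmp/timing-test"));
          out.write(buf);
          out.close();  // blocks until outstanding packets are acknowledged
          long elapsed = System.currentTimeMillis() - start;
          System.out.println("write took " + elapsed + " ms");
      }
  }

)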