Re: Reducer goes past 100% complete?

2009-03-09 Thread Devaraj Das
There is a jira for this: https://issues.apache.org/jira/browse/HADOOP-5210. There was a jira to address this problem when intermediate compression is on, and that one is fixed - https://issues.apache.org/jira/browse/HADOOP-3131. On 3/9/09 9:15 PM, "Doug Cook" wrote: Hi folks, I've recently upgraded

Re: Super-long reduce task timeouts in hadoop-0.19.0

2009-02-21 Thread Devaraj Das
Bryan, the message 2009-02-19 22:48:19,380 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200902061117_3388/ attempt_200902061117_3388_r_66_0/output/file.out in any of the configured local directories i

Re: Calling a mapreduce job from inside another

2009-01-18 Thread Devaraj Das
You can chain job submissions at the client. Also, you can run more than one job in parallel (if you have enough task slots). An example of chaining jobs is in src/examples/org/apache/hadoop/examples/Grep.java, where the jobs grep-search and grep-sort are chained. On 1/18/09 9:58 AM, "Adity
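For readers who want the shape of that pattern without opening Grep.java, here is a minimal sketch of chaining two jobs at the client using the old mapred API of that era; the driver class, paths and job names are made up, and each JobConf still needs its mapper, reducer and output types set.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    // Inside the driver's run()/main(); MyDriver is a hypothetical driver class.
    JobConf searchJob = new JobConf(MyDriver.class);
    searchJob.setJobName("grep-search");
    FileInputFormat.setInputPaths(searchJob, new Path("input"));
    FileOutputFormat.setOutputPath(searchJob, new Path("search-out"));
    // ... set mapper, reducer and key/value classes for the first job ...
    JobClient.runJob(searchJob);       // blocks until the first job completes

    JobConf sortJob = new JobConf(MyDriver.class);
    sortJob.setJobName("grep-sort");
    FileInputFormat.setInputPaths(sortJob, new Path("search-out"));   // consume the first job's output
    FileOutputFormat.setOutputPath(sortJob, new Path("final-out"));
    // ... set mapper, reducer and key/value classes for the second job ...
    JobClient.runJob(sortJob);         // starts only after the first job has finished

For independent jobs, JobClient.submitJob() returns without waiting, which is one way to run several jobs in parallel when slots allow.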

Re: correct pattern for using setOutputValueGroupingComparator?

2009-01-06 Thread Devaraj Das
On 1/6/09 9:47 AM, "Meng Mao" wrote: > Unfortunately, my team is on 0.15 :(. We are looking to upgrade to 0.18 as > soon as we upgrade our hardware (long story). > From comparing the 0.15 and 0.19 mapreduce tutorials, and looking at the > 4545 patch, I don't see anything that seems majorly dif

Re: Having trouble accessing MapFiles in the DistributedCache

2008-12-25 Thread Devaraj Das
IIRC, enabling symlink creation for your files should solve the problem. Call DistributedCache.createSymLink(); before submitting your job. On 12/25/08 10:40 AM, "Sean Shanny" wrote: > To all, > > Version: hadoop-0.17.2.1-core.jar > > I created a MapFile on a local node. > > I put the fil
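A rough sketch of that sequence at submission time; the cache path and link name are made up, and the method appears in the API as DistributedCache.createSymlink(conf) (lower-case l), so adjust to your release.

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MyJob.class);   // MyJob is a hypothetical driver class
    // Ship the file through the cache; the "#lookup" fragment names the symlink
    // that will appear in each task's working directory.
    DistributedCache.addCacheFile(new URI("/user/sean/cache/lookup.data#lookup"), conf);
    DistributedCache.createSymlink(conf);      // enable symlink creation before submitting
    JobClient.runJob(conf);

Tasks can then open the cached data through the symlink (or look it up with DistributedCache.getLocalCacheFiles(conf)) instead of guessing the mangled local path.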

Re: How to coordinate nodes of different computing powers in a same cluster?

2008-12-24 Thread Devaraj Das
obTracker when they are done running the current tasks. > - Aaron > > On Wed, Dec 24, 2008 at 1:12 AM, Devaraj Das wrote: > >> You can enable speculative execution for your jobs. >> >> >> On 12/24/08 10:25 AM, "Jeremy Chow" wrote: >> >

Re: How to coordinate nodes of different computing powers in a same cluster?

2008-12-23 Thread Devaraj Das
You can enable speculative execution for your jobs. On 12/24/08 10:25 AM, "Jeremy Chow" wrote: > Hi list, > I've come up against a scenario like this, to finish a same task, one of my > hadoop cluster only needs 5 seconds, and another one needs more than 2 > minutes. > It's a common phenomenon
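As a sketch, enabling it explicitly in the job conf looks like the following; the property names are from the 0.18/0.19 line, and speculative execution is on by default in many releases, so this mainly matters if it was switched off cluster-wide.

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MyJob.class);   // MyJob is a hypothetical driver class
    // Let the framework launch backup attempts of slow tasks on other (faster) nodes;
    // the first attempt to finish wins and the others are killed.
    conf.setBoolean("mapred.map.tasks.speculative.execution", true);
    conf.setBoolean("mapred.reduce.tasks.speculative.execution", true);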

Re: Reset hadoop servers

2008-12-09 Thread Devaraj Das
I know that the tasktracker/jobtracker doesn't have any command for re-reading the configuration. There is built-in support for restart/shut-down but those are via external scripts that internally do a kill/start. On 12/9/08 9:08 AM, "Christian Kunz" <[EMAIL PROTECTED]> wrote: > Is there support

Re: How are records with equal key sorted in hadoop-0.18?

2008-12-08 Thread Devaraj Das
Hi Christian, there is no notable change to the merge algorithm except that it uses IFile instead of SequenceFile for the input and output. Is your application running with intermediate compression on? What's the value configured for fs.inmemory.size.mb? What is the typical map output size (if you
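For anyone checking the same things on their own job, these are the settings being asked about, sketched in the job conf; the values shown are only illustrative.

    JobConf conf = new JobConf(MyJob.class);   // MyJob is a hypothetical driver class
    conf.setBoolean("mapred.compress.map.output", true);   // intermediate (map output) compression
    conf.setInt("fs.inmemory.size.mb", 75);                // in-memory fs used by the reduce-side merge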

Re: Can mapper get access to filename being processed?

2008-12-07 Thread Devaraj Das
single input path in different ways depending on > filename. > >Thanks again. > >Andy > > -Original Message- > From: Devaraj Das [mailto:[EMAIL PROTECTED] > Sent: Sunday, December 07, 2008 12:11 PM > To: core-user@hadoop.apache.org > Subject: R

Re: Can mapper get access to filename being processed?

2008-12-07 Thread Devaraj Das
On 12/7/08 11:32 PM, "Andy Sautins" <[EMAIL PROTECTED]> wrote: > > >I'm having trouble finding a way to do what I want, so I'm wondering > if I'm just not looking at the right place or if I'm thinking about the > problem in the wrong way. Any insight would be appreciated. > > > >

Re: slow shuffle

2008-12-06 Thread Devaraj Das
On 12/6/08 11:11 PM, "Songting Chen" <[EMAIL PROTECTED]> wrote: > That's cool. > > Update on Issue 2: > > I accidentally changed number of reducer to 1 (from 3). The problem is gone! > That one reducer overlaps with Map well and copies 300 small map output pretty > fast. > > So when there a

Re: does hadoop support "submit a new different job in map function"?

2008-12-06 Thread Devaraj Das
> Cheers, > Tim > > > On Sat, Dec 6, 2008 at 5:17 PM, Devaraj Das <[EMAIL PROTECTED]> wrote: >> >> >> >> On 12/6/08 2:42 PM, "deng chao" <[EMAIL PROTECTED]> wrote: >> >>> Hi, >>> we have met a case need your he

Re: does hadoop support "submit a new different job in map function"?

2008-12-06 Thread Devaraj Das
On 12/6/08 2:42 PM, "deng chao" <[EMAIL PROTECTED]> wrote: > Hi, > we have met a case need your help > The case: In the Mapper class, named MapperA, we define a map() function, > and in this map() function, we want to submit another new job, named jobB. > does hadoop support this case? Althoug

Re: combiner stats

2008-11-18 Thread Devaraj Das
intermediate spills. At the end a single spill file is generated. Note >> that, during the merges, the same record may pass multiple times through the >> combiner. > > On Mon, Nov 17, 2008 at 23:04, Devaraj Das <[EMAIL PROTECTED]> wrote: >> >> >> &g

Re: combiner stats

2008-11-17 Thread Devaraj Das
On 11/18/08 3:59 AM, "Paco NATHAN" <[EMAIL PROTECTED]> wrote: > Could someone please help explain the job counters shown for Combine > records on the JobTracker JSP page? > > Here's an example from one of our MR jobs. There are Combine input > and output record counters shown for both Map pha

Re: TaskTrackers disengaging from JobTracker

2008-10-29 Thread Devaraj Das
xception during RPC processing. Could you please get a stack trace of the JobTracker threads (without your patch) when the TTs are unable to talk to it? Access the url http://<jobtracker-host>:<http-port>/stacks That will tell us what the handlers are up to. > - Aaron > > > Devaraj Das wrote: >> >&g

Re: TaskTrackers disengaging from JobTracker

2008-10-29 Thread Devaraj Das
On 10/30/08 3:13 AM, "Aaron Kimball" <[EMAIL PROTECTED]> wrote: > The system load and memory consumption on the JT are both very close to > "idle" states -- it's not overworked, I don't think > > I may have an idea of the problem, though. Digging back up a ways into the > JT logs, I see this: >

Re: "Merge of the inmemory files threw an exception" and diffs between 0.17.2 and 0.18.1

2008-10-28 Thread Devaraj Das
Quick question (I haven't looked at your comparator code yet) - is this reproducible/consistent? On 10/28/08 11:52 PM, "Deepika Khera" <[EMAIL PROTECTED]> wrote: > I am getting a similar exception too with Hadoop 0.18.1(See stacktrace > below), though its an EOFException. Does anyone have any id

Re: setting a different input/output class for combiner function than map and reduce functions

2008-09-24 Thread Devaraj Das
If you are on 0.18, it is possible to specify that the combiner be invoked only once per partition per spill. Do job.setCombineOnlyOnce(true); or set the value of "mapred.combine.once" to true in your conf. On 9/24/08 2:28 PM, "Palleti, Pallavi" <[EMAIL PROTECTED]> wrote: > Can it be possible to ensure th
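Sketching the two variants side by side; both the setter and the property name are taken from the message above, are specific to that 0.18-era code, and may not exist in other releases.

    JobConf job = new JobConf(MyJob.class);    // MyJob is a hypothetical driver class
    job.setCombinerClass(MyCombiner.class);
    // Either the typed setter mentioned above:
    // job.setCombineOnlyOnce(true);
    // ... or the equivalent property in the conf:
    job.setBoolean("mapred.combine.once", true);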

Re: OutOfMemory Error

2008-09-17 Thread Devaraj Das
On 9/17/08 6:06 PM, "Pallavi Palleti" <[EMAIL PROTECTED]> wrote: > > Hi all, > >I am getting outofmemory error as shown below when I ran map-red on huge > amount of data.: > java.lang.OutOfMemoryError: Java heap space > at > org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBu

Re: task assignment managemens.

2008-09-07 Thread Devaraj Das
No, that is not possible today. However, you might want to look at the TaskScheduler to see if you can implement a scheduler to provide this kind of task scheduling. In the current hadoop, one point regarding computationally intensive tasks is that if the machine is not able to keep up with the rest

Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

2008-09-06 Thread Devaraj Das
entries in datanode.log happens a few minutes apart repeatedly. > I've reduced # map-tasks so load on this node is below 1.0 with 5GB of > free memory (so it's not resource starvation). > > Espen > > On Thu, Sep 4, 2008 at 3:33 PM, Devaraj Das <[EMAIL

Re: Sharing Memory across Map task [multiple cores] runing in same machine

2008-09-05 Thread Devaraj Das
Hadoop doesn't support this natively. So if you need this kind of a functionality, you'd need to code your application in such a way. But I am worried about the race conditions in determining which task should first create the ramfs and load the data. If you can provide atomicity in determining wh

Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

2008-09-04 Thread Devaraj Das
> I started a profile of the reduce-task. I've attached the profiling output. > It seems from the samples that ramManager.waitForDataToMerge() doesn't > actually wait. > Has anybody seen this behavior. This has been fixed in HADOOP-3940 On 9/4/08 6:36 PM, "Espen Amble Kolstad" <[EMAIL PROTECTED]

Re: har/unhar utility

2008-09-03 Thread Devaraj Das
I should have mentioned that in the step where you create a har archive locally, you should use the local job runner. On 9/3/08 5:41 PM, "Devaraj Das" <[EMAIL PROTECTED]> wrote: > Ok .. You could try this - run the hadoop archive tool in your local hadoop > setup. For e.g.

Re: har/unhar utility

2008-09-03 Thread Devaraj Das
by > creating my own local har archive and uploading it. (small files lower > transfer speed from 40-70MB/s to hundreds of kbps :( > > -Original Message- > From: Devaraj Das [mailto:[EMAIL PROTECTED] > Sent: Wednesday, September 03, 2008 4:00 AM > To: core-user@hadoo

Re: har/unhar utility

2008-09-03 Thread Devaraj Das
cal system and then > send them to HDFS, and back since I work with many small files (10kb) and > hadoop seem to behave poorly with them. > > Perhaps HBASE is another option. Is anyone using it in "production" mode? > And do I really need to downgrade to 17.x to install it? >

Re: har/unhar utility

2008-09-03 Thread Devaraj Das
Are you looking for user documentation on har? If so, here it is: http://hadoop.apache.org/core/docs/r0.18.0/hadoop_archives.html On 9/3/08 3:21 PM, "Dmitry Pushkarev" <[EMAIL PROTECTED]> wrote: > Does anyone have har/unhar utility? > > Or at least format description: It looks pretty obvious t

Re: times that the combiner will run

2008-08-25 Thread Devaraj Das
0+ times On 8/26/08 11:36 AM, "Zheng Shao" <[EMAIL PROTECTED]> wrote: > Does the framework promise the combiner will run 0+ times, or 1+ times? > > > > Zheng > > >

Re: hadoop 0.17.1 reducer not fetching map output problem

2008-07-24 Thread Devaraj Das
On 7/25/08 12:09 AM, "Andreas Kostyrka" <[EMAIL PROTECTED]> wrote: > On Thursday 24 July 2008 15:19:22 Devaraj Das wrote: >> Could you try to kill the tasktracker hosting the task the next time when >> it happens? I just want to isolate the problem - whether

Re: hadoop 0.17.1 reducer not fetching map output problem

2008-07-24 Thread Devaraj Das
Could you try to kill the tasktracker hosting the task the next time it happens? I just want to isolate the problem - whether it is a problem in the TT-JT communication or in the Task-TT communication. From your description it looks like the problem is in the JT-TT communication. But pls

RE: Codec Returning null

2008-07-16 Thread Devaraj Das
What does your file name extension look like? > -Original Message- > From: Kylie McCormick [mailto:[EMAIL PROTECTED] > Sent: Wednesday, July 16, 2008 11:18 AM > To: core-user@hadoop.apache.org > Subject: Codec Returning null > > Hello Again! > > I'm running into a NullPointerException
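The extension matters because codec lookup is normally driven by the file name suffix alone; roughly, with a made-up path:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    Configuration conf = new Configuration();
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    // getCodec() matches on the suffix (.gz, .bz2, ...); any other suffix yields null,
    // which then shows up as a NullPointerException if the caller never checks.
    CompressionCodec codec = factory.getCodec(new Path("/data/part-00000.gz"));
    if (codec == null) {
        // treat the file as uncompressed, or fail with a clearer message
    }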

RE: topology.script.file.name

2008-07-03 Thread Devaraj Das
This is strange. If you don't mind, pls send the script to me. > -Original Message- > From: Yunhong Gu1 [mailto:[EMAIL PROTECTED] > Sent: Thursday, July 03, 2008 9:49 AM > To: core-user@hadoop.apache.org > Subject: topology.script.file.name > > > > Hello, > > I have been trying to fig

RE: Release Date of Hadoop 0.17.1

2008-06-19 Thread Devaraj Das
It should be out within a couple of days. As of now, voting is on and will end on the 23rd. > -Original Message- > From: Joman Chu [mailto:[EMAIL PROTECTED] > Sent: Thursday, June 19, 2008 4:48 PM > To: core-user@hadoop.apache.org > Subject: Release Date of Hadoop 0.17.1 > > Hello, I was won

RE: Question on HadoopStreaming and Memory Usage

2008-06-15 Thread Devaraj Das
Hadoop does provide a ulimit-based way to control the memory consumption by the tasks it spawns via the config mapred.child.ulimit. Look at http://hadoop.apache.org/core/docs/r0.17.0/mapred_tutorial.html#Task+Execution+%26+Environment However, what is lacking is a way to get the cumulative memory
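A sketch of the knob being described; the value is in kilobytes of virtual memory per spawned child, and 1048576 (about 1 GB) is just an example figure.

    JobConf conf = new JobConf(MyStreamingJob.class);   // hypothetical driver class
    // ulimit-style virtual-memory cap, in KB, applied to each task child process.
    conf.set("mapred.child.ulimit", "1048576");

With streaming the same setting is usually passed on the command line instead, via -jobconf mapred.child.ulimit=1048576 in that era.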

RE: problem with streaming map jobs not getting killed

2008-06-09 Thread Devaraj Das
No the PID is not logged. So is it the framework side java tasks not getting killed or is it the Streaming children? By the way, the handling of process groups should be handled better when we have HADOOP-1380. > -Original Message- > From: Andreas Kostyrka [mailto:[EMAIL PROTECTED] > Sen

RE: Hadoop topology.script.file.name Form

2008-06-09 Thread Devaraj Das
http://issues.apache.org/jira/browse/HADOOP-692>. > > Hope this will be helpful. > > > > YC > > On Sun, Jun 8, 2008 at 9:53 PM, Devaraj Das > <[EMAIL PROTECTED]> wrote: > > > Hi Iver, > > The implementation of the script depends on your setup. T

RE: Hadoop topology.script.file.name Form

2008-06-08 Thread Devaraj Das
Hi Iver, The implementation of the script depends on your setup. The main thing is that it should be able to accept a bunch of IP addresses and DNS names and be able to give back the rackIDs for each. It is a one-to-one correspondence between what you pass and what you get back. For getting the rac

RE: Stackoverflow

2008-06-04 Thread Devaraj Das
Hi Andreas, Here is what I did: bin/hadoop jar build/hadoop-0.18.0-dev-examples.jar randomtextwriter -Dtest.randomtextwrite.min_words_key=40 -Dtest.randomtextwrite.max_words_key=50 -Dtest.randomtextwrite.maps_per_host=1 textinput (this would generate 1GB of text data with pretty long sentences. R

RE: Stack Overflow When Running Job

2008-06-02 Thread Devaraj Das
Hi, do you have a testcase that we can run to reproduce this? Thanks! > -Original Message- > From: jkupferman [mailto:[EMAIL PROTECTED] > Sent: Monday, June 02, 2008 9:22 AM > To: core-user@hadoop.apache.org > Subject: Stack Overflow When Running Job > > > Hi everyone, > I have a job ru

RE: Questions on how to use DistributedCache

2008-05-22 Thread Devaraj Das
> -Original Message- > From: Taeho Kang [mailto:[EMAIL PROTECTED] > Sent: Thursday, May 22, 2008 3:41 PM > To: core-user@hadoop.apache.org > Subject: Re: Questions on how to use DistributedCache > > Thanks for your reply. > > Just one more thing to ask.. > > From what I see from the

RE: OOM error with large # of map tasks

2008-05-01 Thread Devaraj Das
Long term we need to see how we can minimize the memory consumption by objects corresponding to completed tasks in the tasktracker. > -Original Message- > From: Devaraj Das [mailto:[EMAIL PROTECTED] > Sent: Friday, May 02, 2008 1:29 AM > To: 'core-user@hadoop.apache.o

RE: OOM error with large # of map tasks

2008-05-01 Thread Devaraj Das
IT_PENDING" state (the > runState variable in > MapTaskStatus). Then we took a look at another node on UI > just now, for a > given task tracker, under "Non-runnign tasks", there are at > least 200 or 300 COMMIT_PENDING tasks. It appears they stuck too. > &g

RE: OOM error with large # of map tasks

2008-04-30 Thread Devaraj Das
Hi Lili, the jobconf memory consumption seems quite high. Could you please let us know if you pass anything in the jobconf of jobs that you run? I think you are seeing the 572 objects since a job is running and the TaskInProgress objects for tasks of the running job are kept in memory (but I need t

RE: Getting jobTracker startTime from the JobClient

2008-04-30 Thread Devaraj Das
No, currently, there is no way to get that from the JobClient. Yes, please submit a patch. > -Original Message- > From: Pete Wyckoff [mailto:[EMAIL PROTECTED] > Sent: Wednesday, April 30, 2008 8:21 AM > To: core-user@hadoop.apache.org > Subject: Getting jobTracker startTime from the JobCl

RE: Map Intermediate key/value pairs written to file system

2008-04-18 Thread Devaraj Das
diate key/value pairs written to file system > > > Yes, but Kayla is likely misguided in this respect. > > (my apologies for sounding doctrinaire) > > > On 4/18/08 11:08 AM, "Devaraj Das" <[EMAIL PROTECTED]> wrote: > > > Ted, note that Kayla want

RE: Map Intermediate key/value pairs written to file system

2008-04-18 Thread Devaraj Das
; > > Isn't this just what Hadoop does when you set numReduces = 0? > > > On 4/18/08 10:45 AM, "Devaraj Das" <[EMAIL PROTECTED]> wrote: > > > Within a task you can get the taskId (which are unique). Define > > "public void co

RE: Map Intermediate key/value pairs written to file system

2008-04-18 Thread Devaraj Das
ame thing, the file gets overwritten > b/c this other mapper is creating the exact file name and > doing the exact same thing as the other mapper is doing. > > Devaraj Das <[EMAIL PROTECTED]> wrote: Will your requirement > be addressed if, from within the map method, you create

RE: Reusing jobs

2008-04-18 Thread Devaraj Das
Jason, didn't get that. The jvm should exit naturally even without calling System.exit. Where exactly did you insert the System.exit? Please clarify. Thanks! > -Original Message- > From: Jason Venner [mailto:[EMAIL PROTECTED] > Sent: Friday, April 18, 2008 6:48 PM > To: core-user@hadoop

RE: Map Intermediate key/value pairs written to file system

2008-04-18 Thread Devaraj Das
Will your requirement be addressed if, from within the map method, you create a sequence file using SequenceFile.createWriter api, write a key/value using the writer's append(key,value) API and then close the file ? You can do this for every key/value. Pls have a look at createWriter APIs and the
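A rough sketch of that sequence inside a map method; the path, key/value types and values are made up, and the task attempt id is folded into the file name so concurrent mappers do not clobber each other (the concern raised elsewhere in this thread).

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;

    // 'job' is the JobConf handed to the mapper's configure() method.
    String taskId = job.get("mapred.task.id", "local-task");   // unique per task attempt
    FileSystem fs = FileSystem.get(job);
    Path out = new Path("side-effects/" + taskId + ".seq");
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, job, out, Text.class, IntWritable.class);
    writer.append(new Text("some-key"), new IntWritable(42));  // one record per append() call
    writer.close();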

RE: Counters giving double values

2008-04-17 Thread Devaraj Das
it? kind regards, ud "Devaraj Das" <[EMAIL PROTECTED]> 04/16/2008 01:18 PM Please respond to core-user@hadoop.apache.org To cc Subject RE: Counters giving double values Pls file a jira for the counter updates part. It will be excellent if you c

RE: Counters giving double values

2008-04-16 Thread Devaraj Das
. > > i didnt try to run it in a distributed environment yet. only local. > > > > > > > "Devaraj Das" <[EMAIL PROTECTED]> > 04/16/2008 12:56 PM > Please respond to > core-user@hadoop.apache.org > > > To > > cc > > Subjec

RE: Counters giving double values

2008-04-16 Thread Devaraj Das
Also, in those cases where you see wrong counter values, did you validate the final (reduce) output for correctness (I am just trying to see whether the problem is with the Counter updates). > -Original Message- > From: Devaraj Das [mailto:[EMAIL PROTECTED] > Sent: Wednesday,

RE: Counters giving double values

2008-04-16 Thread Devaraj Das
Thanks for the detailed answer. Which hadoop version are you on? If you are confident that it is not a problem with your app, pls raise a jira. _ From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 16, 2008 3:25 PM To: core-user@hadoop.apache.org Subject: RE: Counte

RE: Counters giving double values

2008-04-16 Thread Devaraj Das
Input group refers to the number of unique keys that the reducer gets. So if your map output looked like <k, v1>, <k, v2>, then to the reducer the number of input groups would be 1 - and the number of records would be 2. The fact that the number of reduce input records doubled beats me ... Could you please let us know the map

RE: _temporary doesn't exist

2008-04-15 Thread Devaraj Das
Hi Grant, could you please copy-paste the exact command you used to run the program. Also the associated config files, etc. will help > -Original Message- > From: Grant Ingersoll [mailto:[EMAIL PROTECTED] > Sent: Tuesday, April 15, 2008 6:03 PM > To: core-user@hadoop.apache.org > Subj

RE: Mapper OutOfMemoryError Revisited !!

2008-04-11 Thread Devaraj Das
Which hadoop version are you on? > -Original Message- > From: bhupesh bansal [mailto:[EMAIL PROTECTED] > Sent: Friday, April 11, 2008 11:21 PM > To: [EMAIL PROTECTED] > Subject: Mapper OutOfMemoryError Revisited !! > > > Hi Guys, I need to restart discussion around > http://www.nabble

RE: hadoop 0.15.3 r612257 freezes on reduce task

2008-03-28 Thread Devaraj Das
Hi Bradford, Could you please check what your mapred.local.dir is set to? Devaraj. > -Original Message- > From: Bradford Stephens [mailto:[EMAIL PROTECTED] > Sent: Saturday, March 29, 2008 1:54 AM > To: core-user@hadoop.apache.org > Cc: [EMAIL PROTECTED] > Subject: Re: hadoop 0.15.3 r612

RE: [Map/Reduce][HDFS]

2008-03-28 Thread Devaraj Das
Hi Jean, no that is not directly possible. You have to pass your data through the DFS client in order for that to be part of the dfs (e.g. hadoop fs -put .., etc. or programmatically). (removing core-dev from this thread since this is really a core-user question) > -Original Message- > Fro

RE: [memory leak?] Re: MapReduce failure

2008-03-16 Thread Devaraj Das
is something related to my code, beside the wordcount > example many other users report the same problem: > See: > http://markmail.org/search/?q=org.apache.hadoop.mapred.MapTask > %24MapOutputBuffer.collect+order%3Adate-backward > Thanks for your help! > > Stefan > > &

RE: [memory leak?] Re: MapReduce failure

2008-03-15 Thread Devaraj Das
It might have something to do with your application itself. By any chance are you doing a lot of huge object allocation (directly or indirectly) within the map method? Which version of hadoop are you on? > -Original Message- > From: Stefan Groschupf [mailto:[EMAIL PROTECTED] > Sent: Sund

RE: speculative task execution and writing side-effect files

2008-01-23 Thread Devaraj Das
nerally hurts) and when people don't care about job latency > (nighttime - batch jobs) - the cluster is relatively idle > (and we could afford speculative execution - but it would > serve no purpose). > > perhaps i am totally off - would like to learn about other > people'

RE: Does the local mode of hadoop support pipes?

2008-01-23 Thread Devaraj Das
Pipes won't work in local mode. It assumes support from HDFS. You should be able to run it in a single node pseudo-distributed setup. Devaraj > -Original Message- > From: Cox Wood [mailto:[EMAIL PROTECTED] > Sent: Wednesday, January 23, 2008 1:41 PM > To: [EMAIL PROTECTED] > Subject: Does

RE: speculative task execution and writing side-effect files

2008-01-22 Thread Devaraj Das
> 1. In what situation would speculative task execution kick > in if it's enabled It would be based on tasks' progress. A speculative instance of a running task is launched if the task in question is lagging behind the others in terms of progress it has made. It also depends on whether there are