Re: Why most of the free reduce slots are NOT used for my Hadoop Jobs? Thanks.

2012-03-10 Thread Joey Echeverria
What does the jobtracker web page say is the total reduce capacity? -Joey On Mar 10, 2012, at 5:39, WangRamon wrote: > Hi All > > I'm using Hadoop-0.20-append, the cluster contains 3 nodes, for each node I > have 14 map and 14 reduce slots, here is the configuration:

Re: Split locations

2012-03-07 Thread Joey Echeverria
Are you asking who assigns the task to HostB or who makes sure a task assigned to HostB reads from HostB's local copy? The first is the job tracker. The second is the DFSClient used by the task. -Joey On Mar 7, 2012, at 7:57, Pedro Costa wrote: > Hi, > > In MapReduce, if the locations of th

Re: Query regarding Hadoop version 0.20.203

2012-03-05 Thread Joey Echeverria
You don't need to call readFields(); the FileStatus objects are already initialized. You should just be able to call the various getters to get the fields that you're interested in. -Joey On Mon, Mar 5, 2012 at 9:03 AM, Piyush Kansal wrote: > Harsh, > > When I try to readFields as follows: >
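For reference, the pattern being suggested looks roughly like this (the path is a hypothetical placeholder; assumes the usual org.apache.hadoop.conf and org.apache.hadoop.fs imports):

    // Sketch: FileStatus objects returned by listStatus() arrive fully
    // populated, so the getters can be called directly.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    for (FileStatus status : fs.listStatus(new Path("/user/data"))) { // hypothetical path
      System.out.println(status.getPath() + "\t" + status.getLen()
          + " bytes\t" + status.getModificationTime());
    }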

Re: The location of the map execution

2012-03-04 Thread Joey Echeverria
Most people use either the fair or capacity schedulers. If you read those links I sent earlier, you can decide which better fits your use cases. -Joey Sent from my iPhone On Mar 4, 2012, at 14:44, Mohit Anchlia wrote: > > > On Sun, Mar 4, 2012 at 4:15 AM, Joey Echeverria w

Re: The location of the map execution

2012-03-04 Thread Joey Echeverria
than the number of TaskTrackers, you're much more likely to get node-local assignments. -Joey On Sat, Mar 3, 2012 at 10:44 PM, Mohit Anchlia wrote: > On Sat, Mar 3, 2012 at 7:41 PM, Joey Echeverria wrote: >> >> Sorry, I meant have you set the mapred.jobtracker.taskScheduler

Re: The location of the map execution

2012-03-03 Thread Joey Echeverria
Sorry, I meant have you set the mapred.jobtracker.taskScheduler property in your mapred-site.xml file. If not, you're using the standard, FIFO scheduler. The default scheduler doesn't do data-local scheduling, but the fair scheduler and capacity scheduler do. You want to set mapred.jobtracker.taskS
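For reference, a minimal mapred-site.xml entry selecting the fair scheduler would look something like this (class name per the fair scheduler contrib of that era):

    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.FairScheduler</value>
    </property>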

Re: The location of the map execution

2012-03-03 Thread Joey Echeverria
Which scheduler are you using? -Joey On Mar 3, 2012, at 18:52, Hassen Riahi wrote: > Hi all, > > We tried using mapreduce to execute a simple map code which reads a txt file > stored in HDFS and then writes the output. > The file to read is a very small one. It was not split and written entirel

Re: no log function for map/red in a cluster setup

2012-02-29 Thread Joey Echeverria
Try adding the log4j.properties file to the distributed cache, e.g.: hadoop jar job.jar -config conf -files conf/log4j.properties my.package.Class arg1 -Joey On Feb 29, 2012, at 16:15, GUOJUN Zhu wrote: > > What I found out is that the default conf/log4j.properties set root with INFO > and

Re: Unit Testing for Map Reduce

2012-02-27 Thread Joey Echeverria
Have you checked out this example: https://cwiki.apache.org/confluence/display/MRUNIT/Testing+Word+Count On Mon, Feb 27, 2012 at 2:54 PM, Akhtar Muhammad Din wrote: > Hi, > I have been looking for a way to do unit testing of map reduce programs too. > There is not much of help or documentation a
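For context, an MRUnit test of a word-count mapper looks roughly like the sketch below (WordCountMapper is a hypothetical Mapper<LongWritable, Text, Text, IntWritable>; MapDriver is from org.apache.hadoop.mrunit.mapreduce, MRUnit 0.9-style API):

    MapDriver<LongWritable, Text, Text, IntWritable> driver =
        MapDriver.newMapDriver(new WordCountMapper());
    driver.withInput(new LongWritable(0), new Text("cat cat"))
          .withOutput(new Text("cat"), new IntWritable(1))
          .withOutput(new Text("cat"), new IntWritable(1))
          .runTest();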

Re: Query regarding Hadoop Partitioning

2012-02-24 Thread Joey Echeverria
It looks like your partitioner is an inner class. Try making it static: public static class MOPartition extends Partitioner public MOPartition() {} On Fri, Feb 24, 2012 at 3:48 PM, Piyush Kansal wrote: > Hi, > > I am right now stuck with an issue while extending the Partitioner class: >
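A fuller version of the suggested fix might look like this (key/value types and partition logic are placeholders, since the original code is truncated):

    // Static nested class, so the framework can instantiate it
    // reflectively without an instance of the enclosing class.
    public static class MOPartition extends Partitioner<Text, Text> {
      public MOPartition() {}

      @Override
      public int getPartition(Text key, Text value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
      }
    }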

Re: Mapreduce Job' user

2012-02-16 Thread Joey Echeverria
Are you using one of the security enabled releases of Hadoop (0.20.20x, 1.0.x, 0.23.x, CDH3)? Assuming you are, you'll need to modify your code to impersonate the user with something like this: UserGroupInformation.createRemoteUser("cuser").doAs(new Privi
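The truncated snippet presumably continues along these lines (a sketch only, using the old JobClient API; the job configuration variable is assumed):

    UserGroupInformation ugi = UserGroupInformation.createRemoteUser("cuser");
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
      public Void run() throws Exception {
        // jobConf must be final (or effectively final) to be visible here.
        JobClient.runJob(jobConf); // submitted as "cuser"
        return null;
      }
    });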

Re: num of reducer

2012-02-16 Thread Joey Echeverria
per on job-tracker. > > Thanks, > Thamizh > > > On Thu, Feb 16, 2012 at 6:56 PM, Joey Echeverria wrote: > >> Hi Tamil, >> >> I'd recommend upgrading to a newer release as 0.19.2 is very old. As for >> your question, most input formats should set the num

Re: num of reducer

2012-02-16 Thread Joey Echeverria
Hi Tamil, I'd recommend upgrading to a newer release as 0.19.2 is very old. As for your question, most input formats should set the number of mappers correctly. What input format are you using? Where did you see the number of tasks it assigned to the job? -Joey On Thu, Feb 16, 2012 at 1:40 AM, Tham

Re: reducers outputs

2012-01-29 Thread Joey Echeverria
Reduce output is normally stored in HDFS, just like your other files. Are you seeing different behavior? -Joey On Sun, Jan 29, 2012 at 1:05 AM, aliyeh saeedi wrote: > Hi > I want to save reducers outputs like other files in Hadoop. Does NameNode > keep any information about them? How can I do th

Re: hadoop ecosystem

2012-01-28 Thread Joey Echeverria
I'd add crunch (https://github.com/cloudera/crunch) and remove Hoop as it's integrated with Hadoop in 0.23.1+. -Joey On Sat, Jan 28, 2012 at 10:59 AM, Ayad Al-Qershi wrote: > I'm compiling a list of all Hadoop ecosystem/sub projects ordered > alphabetically and I need your help if I missed somet

Re: 0.22 Release and Security

2011-12-29 Thread Joey Echeverria
ting the use at the cluster gateway the only way? Once the user is > in the cluster, if I am not wrong the user can pretend to be any user. > > Praveen > > On Thu, Dec 29, 2011 at 8:49 PM, Joey Echeverria wrote: >> >> Yes, it means that 0.22 doesn't support Kerbero

Re: 0.22 Release and Security

2011-12-29 Thread Joey Echeverria
Yes, it means that 0.22 doesn't support Kerberos. -Joey On Thu, Dec 29, 2011 at 9:41 AM, Praveen Sripati wrote: > Hi, > > The release notes for 0.22 > (http://hadoop.apache.org/common/releases.html#10+December%2C+2011%3A+release+0.22.0+available) > it says > >>The following features are not supp

Re: Are the input values (associated with a specific key) of reduce() method sorted?

2011-12-05 Thread Joey Echeverria
Hi James, By default, there are no guarantees on value order. Using some of the more advanced API features, you can perform a secondary sort of values. You can read a good example of it here: http://sonerbalkir.blogspot.com/2010/01/simulating-secondary-sort-on-values.html -Joey On Mon, Dec 5, 20
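For orientation, the usual wiring for a secondary sort is sketched below; the three supporting classes are hypothetical names, not taken from the linked post:

    // Partition on the natural key, sort on (natural, secondary),
    // group reduce input on the natural key alone.
    job.setPartitionerClass(NaturalKeyPartitioner.class);
    job.setSortComparatorClass(CompositeKeyComparator.class);
    job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class);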

Re: Map-only output compression

2011-11-07 Thread Joey Echeverria
You want option 3. Option 1 is only used to compress intermediate output, it doesn't apply to map only jobs. Option 2 only enables compression for SequenceFileOutputFormat. If you're not using that output format, it won't help. -Joey On Monday, November 7, 2011, Claudio Martella wrote: > Hello
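The numbered options are truncated out of this snippet, but compressing a job's final output in the new API is typically done like this (a sketch; GzipCodec stands in for whatever codec the thread settled on):

    // FileOutputFormat is org.apache.hadoop.mapreduce.lib.output.FileOutputFormat,
    // GzipCodec is org.apache.hadoop.io.compress.GzipCodec.
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);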

Re: Sharing data in a mapper for all values

2011-10-31 Thread Joey Echeverria
Yes, you can read the file in the configure() (old api) and setup() (new api) methods. The data can be saved in a variable that will be accessible to every call to map(). -Joey On Mon, Oct 31, 2011 at 7:45 PM, Arko Provo Mukherjee wrote: > Hello, > I have a situation where I am reading a big fil
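A minimal sketch of that pattern in the new API (the side-file path and tab-separated parsing are illustrative):

    public class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {
      private final Map<String, String> lookup = new HashMap<String, String>();

      @Override
      protected void setup(Context context) throws IOException {
        // Runs once per task; every map() call then reads the cached map.
        FileSystem fs = FileSystem.get(context.getConfiguration());
        BufferedReader in = new BufferedReader(new InputStreamReader(
            fs.open(new Path("/data/lookup.txt")))); // hypothetical path
        String line;
        while ((line = in.readLine()) != null) {
          String[] parts = line.split("\t", 2);
          if (parts.length == 2) {
            lookup.put(parts[0], parts[1]);
          }
        }
        in.close();
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        String hit = lookup.get(value.toString());
        if (hit != null) {
          context.write(value, new Text(hit));
        }
      }
    }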

Re: High Throughput using row keys based on the current time

2011-10-28 Thread Joey Echeverria
Have you looked into bulk imports? You can write your data into HDFS and then run a MapReduce job to generate the files that HBase uses to serve data. After the job finishes, there's a utility to copy the files into HBase's directory and your data is visible. Check out http://hbase.apache.org/bulk-

Re: Dropping 0.20.203 capacity scheduler into 0.20.2

2011-10-26 Thread Joey Echeverria
You can also check out Apache Whirr (http://whirr.apache.org/) if you decide to roll your own Hadoop clusters on EC2. It's crazy easy to get a cluster up and running with it. -Joey On Wed, Oct 26, 2011 at 3:04 PM, Kai Ju Liu wrote: > Hi Arun. Thanks for the prompt reply! It's a bit of a bummer t

Re: Questions about JVM Reuse

2011-10-25 Thread Joey Echeverria
> Is the configured amount of tasks for reuse a suggestion or will it actually > use it?  For example, if I’ve configured it to use a JVM for 4 tasks, will a > TaskTracker that has 8 tasks to process use 2 JVMs?  Or does it decide if it > actually wants to reuse one up to the maximum configured num

Re: Submitting a Hadoop task from withing a reducer

2011-10-04 Thread Joey Echeverria
ustom application master for his job type right? > > Matt > > -Original Message- > From: Joey Echeverria [mailto:j...@cloudera.com] > Sent: Tuesday, October 04, 2011 11:06 AM > To: mapreduce-user@hadoop.apache.org > Subject: Re: Submitting a Hadoop task from withing a r

Re: Submitting a Hadoop task from withing a reducer

2011-10-04 Thread Joey Echeverria
You may want to check out Yarn, coming in Hadoop 0.23: https://issues.apache.org/jira/browse/MAPREDUCE-279 -Joey On Tue, Oct 4, 2011 at 11:45 AM, Yaron Gonen wrote: > Hi, > Hadoop tasks are always stacked to form a linear user-managed workflow (a > reduce step cannot start before all previous m

Re: Tasks running out of memory and mapred.child.ulimit

2011-09-30 Thread Joey Echeverria
The ulimit should be set to 1.5 times the heap. One thing to note is the unit is in KB. -Joey On Sep 30, 2011 1:24 PM, "Steve Lewis" wrote: > I have a small hadoop task which is running out of memory on a colleague's > cluster. > I looked at his mapred-site.xml and find > > > mapred.child.java.o
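Worked through for a concrete (hypothetical) heap size: with -Xmx512m, 1.5 x 512 MB = 768 MB = 786432 KB, so mapred-site.xml would carry:

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx512m</value>
    </property>
    <property>
      <name>mapred.child.ulimit</name>
      <!-- 1.5 times the 512 MB heap, expressed in KB -->
      <value>786432</value>
    </property>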

Re: Lost task tracker reschedules all tasktracker's successful map tasks

2011-09-29 Thread Joey Echeverria
> The question is: the intermediary (before any reducer) results of completed > individual tasks are recorded in the HDFS, right? So why are these results > discarded, since the loss of the tasktracker is not the loss of already > processed data? Intermediate results are stored on the local disks

Re: System.out.println in Map / Reduce

2011-09-26 Thread Joey Echeverria
Printouts go to the task logs. You can see those in the log directory on the tasktracker nodes or through the jobtracker web GUI. -Joey On Sep 26, 2011, at 19:47, Arko Provo Mukherjee wrote: > Hi, > > I am writing some Map Reduce programs in pseudo-distributed mode. > > I am getting some

Re: Re: Re: Re: The method addCacheFIle(URI) is undefined for the type Job

2011-09-24 Thread Joey Echeverria
Doesn't look like it to me: http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapreduce/Job.html 2011/9/23 谭军 : > Joey Echeverria, > Yes, that works. > I thought job.addCacheFile(new URI(args[0])); could run on hadoop-0.20.2. > Because hadoop-0.20.2 could r
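For readers stuck on 0.20.2: the equivalent there is usually the DistributedCache helper (org.apache.hadoop.filecache.DistributedCache), roughly:

    // Sketch for 0.20.2, where Job.addCacheFile(URI) doesn't exist.
    DistributedCache.addCacheFile(new URI(args[0]), job.getConfiguration());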

Re: Re: Re: The method addCacheFIle(URI) is undefined for the type Job

2011-09-23 Thread Joey Echeverria
> Hi Joey Echeverria, > My hadoop version is 0.20.2 > > -- > > Regards! > > Jun Tan > > On 2011-09-24 11:36:08, "Joey Echeverria" wrote: >>Which version of Hadoop are you using? >> >>2011/9/23 谭军 : >>> Harsh, >>> It is java.net.URI

Re: Re: The method addCacheFIle(URI) is undefined for the type Job

2011-09-23 Thread Joey Echeverria
Which version of Hadoop are you using? 2011/9/23 谭军 : > Harsh, > It is java.net.URI that is imported. > > -- > > Regards! > > Jun Tan > > At 2011-09-24 00:52:14,"Harsh J" wrote: >>Jun, >> >>Common cause is that your URI class is not the right import. >> >>It must be java.net.URI and not any other

Re: FairScheduler Local Task Restriction

2011-09-22 Thread Joey Echeverria
Do you have assign multiple enabled in the fair scheduler? Even that may not be able to keep up if the tasks are only taking 10 seconds. Any way you could run the job with fewer splits? On Thu, Sep 22, 2011 at 1:21 PM, Adam Shook wrote: > Okay, I put a Thread.sleep to test my theory and it will r

Re: Regarding FIFO scheduler

2011-09-22 Thread Joey Echeverria
'Fraction of the number of maps in the job which should be complete > before reduces are scheduled for the job.' > > Shouldn't the map tasks be completed before the reduce tasks are kicked for > a particular job? > > Praveen > > On Thu, Sep 22, 2011 at 6:53 P

Re: Regarding FIFO scheduler

2011-09-22 Thread Joey Echeverria
The jobs would run in parallel since J1 doesn't use all of your map tasks. Things get more interesting with reduce slots. If J1 is an overall slower job, and you haven't configured mapred.reduce.slowstart.completed.maps, then J1 could launch a bunch of idle reduce tasks which would starve J2. In g
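For example, holding reducers back until 80% of the maps have finished would be configured as follows (the value is illustrative):

    <property>
      <name>mapred.reduce.slowstart.completed.maps</name>
      <value>0.80</value>
    </property>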

Re: Can JobConf object be declared global and used in mapper and reducer class

2011-09-21 Thread Joey Echeverria
The map and reduce functions run in a different JVM, so they never ran the main() method. You can implement a configure(JobConf job) method in your map and reduce classes which will be passed the JobConf you used to launch the job. -Joey On Wed, Sep 21, 2011 at 9:36 AM, pranjal shrivastava
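A minimal sketch of that pattern in the old API (the property name is hypothetical; assumes the usual org.apache.hadoop.mapred imports):

    public class MyMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
      private String param;

      @Override
      public void configure(JobConf job) {
        // Called once per task with the JobConf used to launch the job.
        param = job.get("my.custom.param", "default");
      }

      public void map(LongWritable key, Text value,
          OutputCollector<Text, IntWritable> output, Reporter reporter)
          throws IOException {
        // param is available to every map() call here.
      }
    }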

Re: Any good gui tools for working with hdfs

2011-09-20 Thread Joey Echeverria
Have you looked at hue (https://github.com/cloudera/hue)? It has a web-based GUI file manager. -Joey On Tue, Sep 20, 2011 at 6:50 PM, Steve Lewis wrote: > My dfs is a real mess and I am looking for a good gui fiile manager to allow > me to clean it up > deleting a lot of directories > Anyone wri

Re: Submitting Jobs from different user to a queue in capacity scheduler

2011-09-19 Thread Joey Echeverria
FYI, I'm moving this to mapreduce-user@ and bccing common-user@. It looks like your latest permission problem is on the local disk. What is your setting for hadoop.tmp.dir? What are the permissions on that directory? -Joey On Sep 18, 2011, at 23:27, ArunKumar wrote: > Hi guys ! > > Commo

Re: Debugging Mapreduce programs

2011-09-15 Thread Joey Echeverria
You can also use mrunit [1] to write unit tests against your MapReduce code. -Joey [1] http://incubator.apache.org/mrunit/ On Thu, Sep 15, 2011 at 1:18 AM, Subroto Sanyal wrote: > Hi, > > The MapReduce framework provides different built-in approaches to debug a Job:

Re: -libjars?

2011-09-15 Thread Joey Echeverria
> before starting a cluster on EC2. :) > > Thanks for your time > > Marco > > On 14 September 2011 14:04, Joey Echeverria wrote: >> When are you getting the exception? Is it during the setup of your >> job, or after it's running on the cluster? >>

Re: Has anyone ever written a file system where the data is held in resources

2011-09-14 Thread Joey Echeverria
To add to what Kevin said, you'll be writing a class that extends FileSystem. -Joey On Wed, Sep 14, 2011 at 1:08 PM, Kevin Burton wrote: > You would probably have to implement your own Hadoop filesystem similar to > S3 and KFS integrate. > I looked at it a while back and it didn't seem insanely
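A skeleton of such a class might start like this (the res:// scheme is invented for illustration; FileSystem's remaining abstract methods still have to be implemented for it to compile):

    public class ResourceFileSystem extends FileSystem {
      @Override
      public URI getUri() {
        return URI.create("res:///"); // hypothetical scheme
      }

      @Override
      public FSDataInputStream open(Path f, int bufferSize) throws IOException {
        // Serve bytes out of a classpath/in-memory resource here.
        throw new UnsupportedOperationException("sketch only");
      }

      // ...plus create(), rename(), delete(), listStatus(), mkdirs(),
      // getFileStatus(), setWorkingDirectory(), getWorkingDirectory(), etc.
    }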

Re: -libjars?

2011-09-14 Thread Joey Echeverria
When are you getting the exception? Is it during the setup of your job, or after it's running on the cluster? -Joey On Wed, Sep 14, 2011 at 4:50 AM, Marco Didonna wrote: > Hello everyone, > sorry to bring this up again but I need some clarification. I wrote a > map-reduce application that need c

Re: How to Create an effective chained MapReduce program.

2011-09-05 Thread Joey Echeverria
> I tried it but it creates a binary file which i can not understand (i >>>> need the result of the first job). >>>> The other thing is how can i use this file in the next chained mapper? >>>> i.e how can i retrieve the keys and the values in the map function? >>

Re: How to Create an effective chained MapReduce program.

2011-09-05 Thread Joey Echeverria
Have you tried SequenceFileOutputFormat and SequenceFileInputFormat? -Joey On Mon, Sep 5, 2011 at 11:49 AM, ilyal levin wrote: > Hi > I'm trying to write a chained mapreduce program. i'm doing so with a simple > loop where in each iteration i > create a job, execute it and every time the current
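Wiring that up between two chained jobs looks roughly like this (the intermediate path and job variables are placeholders):

    // Job 1 writes typed binary records that job 2 reads back directly.
    Path intermediate = new Path("/tmp/chain-step1"); // hypothetical path
    job1.setOutputFormatClass(SequenceFileOutputFormat.class);
    FileOutputFormat.setOutputPath(job1, intermediate);

    job2.setInputFormatClass(SequenceFileInputFormat.class);
    FileInputFormat.addInputPath(job2, intermediate);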

Re: how to write a Join in c++ with mapreduce?

2011-09-05 Thread Joey Echeverria
Have you looked at hadoop pipes? On Sep 5, 2011 6:01 AM, "seven garfee" wrote: > I see there is a java version,but for some reasons,I need a c++ version. > Anyone can help?

Re: Wonky reduce progress

2011-08-19 Thread Joey Echeverria
aken on compressed data instead of original data. > > -Original Message- > From: Joey Echeverria [mailto:j...@cloudera.com] > Sent: Friday, August 19, 2011 3:07 AM > To: d...@hive.apache.org > Subject: Wonky reduce progress > > I'm seeing really weird numbers in

Re: unique PATH environment variable for Mapper class

2011-08-08 Thread Joey Echeverria
You can set mapred.child.env on the JobConf before you submit the job. If you want to add to the PATH, you can set it to something like: PATH=$PATH:/directory/with/dlls -Joey On Mon, Aug 8, 2011 at 5:04 PM, Curtis Jensen wrote: > Is it possible to set the MS. Windows PATH environment variable for
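In code, that is a single call on the job configuration before submission (the directory is the one from the thread's example):

    // Append a DLL directory to the PATH seen by task processes.
    conf.set("mapred.child.env", "PATH=$PATH:/directory/with/dlls");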

Re: How does Hadoop reuse the objects?

2011-08-04 Thread Joey Echeverria
Bhandarkar >> Greenplum Labs, EMC >> (Disclaimer: Opinions expressed in this email are those of the author, and >> do >> not necessarily represent the views of any organization, past or present, >> the author might be affiliated with.)

Re: How does Hadoop reuse the objects?

2011-08-03 Thread Joey Echeverria
Hadoop reuses objects as an optimization. If you need to keep a copy in memory, you need to call clone yourself. I've never used Avro, but my guess is that the BARs are not reused, only the FOO. -Joey On Wed, Aug 3, 2011 at 3:18 AM, Vyacheslav Zholudev wrote: > Hi all, > > I'm using Avro as a se
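The copy-before-caching pattern, sketched for plain Writables inside a reduce() (the Avro case from the question may differ):

    // Hadoop hands the same Text instance back on each iteration,
    // so cache copies rather than references.
    List<Text> cached = new ArrayList<Text>();
    for (Text value : values) {
      cached.add(new Text(value)); // copy constructor; caching 'value' itself
                                   // would leave N references to one object
    }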

Re: Job tracker error

2011-07-24 Thread Joey Echeverria
You're running out of memory trying to generate the splits. You need to set a bigger heap for your driver program. Assuming you're using the hadoop jar command to launch your job, you can do this by setting HADOOP_HEAPSIZE to a larger value in $HADOOP_HOME/conf/hadoop-env.sh -Joey On Jul 24, 2011
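For example (the value is illustrative; HADOOP_HEAPSIZE is in MB and sizes the client/daemon JVMs):

    # In $HADOOP_HOME/conf/hadoop-env.sh
    export HADOOP_HEAPSIZE=2000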

Re: ArrayWritable Doesn't Write

2011-07-18 Thread Joey Echeverria
ArrayWritables can't be deserialized because they don't encode the type of the objects with the data. The solution is to sub-class ArrayWritable with your specific type. In your case, you'd need to do this: public class IntArrayWritable extends ArrayWritable { public IntArrayWritable() { super(IntWritable.class); } }

Re: Hadoop online upgrade

2011-07-08 Thread Joey Echeverria
It depends on the versions. Some minor updates are compatible with each other and you can do rolling restarts. If you're using less than 50% of your total storage, you could decommission half of your cluster, upgrade that half, distcp to the new cluster and then upgrade the other half. -Joey

Re: mapred.tasktracker.map.tasks.maximum is not taking into effect

2011-07-01 Thread Joey Echeverria
This property applies to a tasktracker rather than an individual job. Therefore it needs to be set in the mapred-site.xml and the daemon restarted. -Joey On Jul 1, 2011 7:01 PM, wrote: > Are you sure? AFAIK all mapred.xxx properties can be set via job config. I also read on yahoo tutorial that thi
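That is, in each tasktracker's mapred-site.xml, something like (slot count illustrative):

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>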

Re: how to parse a sequence file in my local filesystem

2011-06-28 Thread Joey Echeverria
ile is on my disk(for example: D://test.seq), > and how to  write a java class to parse it? > > 2011/6/27 Joey Echeverria >> >> If the data is text you can always print out the sequence file using >> this command: >> >> hadoop fs -text file:///my/directory

Re: how to parse a sequence file in my local filesystem

2011-06-27 Thread Joey Echeverria
If the data is text you can always print out the sequence file using this command: hadoop fs -text file:///my/directory/file.seq This will parse the sequence file, convert each key and value to a string and print it to stdout. Notice the file:// in the path; that will cause hadoop to access the l
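To do the same thing programmatically, a sketch with SequenceFile.Reader against the local file system (same path as the command above; assumes the usual org.apache.hadoop.{conf,fs,io,util} imports):

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.getLocal(conf);
    Path path = new Path("/my/directory/file.seq");
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
      while (reader.next(key, value)) {
        System.out.println(key + "\t" + value);
      }
    } finally {
      reader.close();
    }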

Re: Parallelize a workflow using mapReduce

2011-06-23 Thread Joey Echeverria
> Now, in case that this input file is not split based on HDFS block but > one-split per file. I will have in consequence only 1 mapper since I have > only 1 input split. Where does the computation of the mapper take place? in > machineA or machineB or machine C or in another machine inside the cluster

Re: hdfs reformat confirmation message

2011-06-22 Thread Joey Echeverria
You could pipe 'yes' to the hadoop command: yes | hadoop namenode -format -Joey On Wed, Jun 22, 2011 at 4:46 PM, Virajith Jalaparti wrote: > Hi, > > When I try to reformat HDFS (I have to multiple times for some experiment I > need to run), it asks for a confirmation Y/N. Is there a way to disa

Re: tasktracker maximum map tasks for a certain job

2011-06-21 Thread Joey Echeverria
The only way to do that is to drop the setting down to one and bounce the TaskTrackers. -Joey On Tue, Jun 21, 2011 at 12:52 PM, Jonathan Zukerman wrote: > Hi, > Is there a way to set the maximum map tasks for all tasktrackers in my > cluster for a certain job? > Most of my tasktrackers are confi

Re: How discard reduce results in mapreduce?

2011-06-02 Thread Joey Echeverria
Set your output format to the NullOutputFormat: job.setOutputFormat(NullOutputFormat.class); -Joey On Jun 2, 2011, at 6:21, Pedro Costa wrote: > What I meant in this question is to put the processed result of the > reduce task in something like /dev/null. How can I do that? > > On Thu, Jun 2, 2

Re: Query regarding internal/working of hadoop fs -copyFromLocal and fs.write()

2011-05-31 Thread Joey Echeverria
They write directly to HDFS, there's no additional buffering on the local file system of the client. -Joey On Tue, May 31, 2011 at 7:56 PM, Mapred Learn wrote: > Hi guys, > I asked this question earlier but did not get any response. So, posting > again. Hope somebody can point to the right descr

Re: Log files expanding at an alarming rate

2011-05-23 Thread Joey Echeverria
Hi Karthik, FYI, I'm moving this thread to mapreduce-user@hadoop.apache.org (You and common-user are BCCed). My guess is that your task trackers are throwing a lot of exceptions which are getting logged. Can you send a snippet of the logs to help diagnose why it's logging so much? Can you also le

Re: get name of file in mapper output directory

2011-05-23 Thread Joey Echeverria
Hi Mark, FYI, I'm moving the discussion over to mapreduce-user@hadoop.apache.org since your question is specific to MapReduce. You can derive the output name from the TaskAttemptID which you can get by calling getTaskAttemptID() on the context passed to your cleanup() function. The task attempt i
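A sketch of that derivation inside a reducer (part-r-NNNNN is the default FileOutputFormat naming convention):

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
      TaskAttemptID attempt = context.getTaskAttemptID();
      // e.g. "part-r-00003" for the fourth reducer of the job.
      String fileName = String.format("part-r-%05d", attempt.getTaskID().getId());
      // ... use fileName as needed ...
    }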

Re: Mapping one key per Map Task

2011-05-23 Thread Joey Echeverria
Look at getSplits() of SequenceFileInputFormat. -Joey On May 23, 2011 5:09 AM, "Vincent Xue" wrote: > Hello Hadoop Users, > > I would like to know if anyone has ever tried splitting an input > sequence file by key instead of by size. I know that this is unusual > for the map reduce paradigm

Re: Running M/R jobs from java code

2011-05-18 Thread Joey Echeverria
Just last week I worked on a REST interface hosted in Tomcat that launched a MR job. In my case, I included the jar with the job in the WAR and called the run() method (the job implemented Tool). The only tricky part was that a copy of the Hadoop configuration files needed to be in the classpath, but I j
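A sketch of that launch path (MyJobTool is a placeholder for a job class implementing Tool; ToolRunner.run() throws Exception, so wrap it as your container requires):

    // From a servlet/REST handler: run the bundled job.
    Configuration conf = new Configuration(); // picks up *-site.xml from the classpath
    int exitCode = ToolRunner.run(conf, new MyJobTool(),
        new String[] {"/input/path", "/output/path"}); // hypothetical args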

Re: Improve data locality for MR job processing tar.gz files

2011-05-09 Thread Joey Echeverria
You could write your own input format class to handle breaking out the tar files for you. If you subclass FileInputFormat, Hadoop will handle decompressing the files because of the .gz file extension. Your input format would just need to use a Java tar file library (e.g. http://code.google.com/p/jt

Re: Multiple Outputs Not Being Written to File

2011-05-06 Thread Joey Echeverria
You need to add a call to MultipleOutputs.close() in your reducer's cleanup: public void cleanup(Context context) throws IOException, InterruptedException { mos.close(); ... } On Fri, May 6, 2011 at 1:55 PM, Geoffry Roberts wrote: > All, > > I am attempting to take a large file and split it up into a series of > smal

Re: How hadoop parse input files into (Key,Value) pairs ??

2011-05-05 Thread Joey Echeverria
Hadoop uses an InputFormat class to parse files and generate key, value pairs for your Mapper. An InputFormat is any class which extends the base abstract class: http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapreduce/InputFormat.html The default InputFormat parses text files
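Concretely, with the default TextInputFormat a mapper receives each line keyed by its byte offset in the file, along these lines:

    public static class LineMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        // offset = position of the line's first byte; line = the line's text.
      }
    }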

Re: Hadoop Mapreduce jobs and LD_LIBRARY_PATH

2011-04-29 Thread Joey Echeverria
Just to confirm, you restarted hadoop after making the changes to mapred-site.xml? -Joey On Fri, Apr 29, 2011 at 11:53 AM, Donatella Firmani wrote: > Hi Alex, > > I'm just editing mapred-site.xml in /conf directory of my hadoop > installation root. > I'm running in pseudo-distributed mode? > > S

Re: Mappers crashing due to running out of heap space during initialisation

2011-04-27 Thread Joey Echeverria
It was initializing a 200MB buffer to do the sorting of the output in. How much space did you allocate the task JVMs (mapred.child.java.opts in mapred-site.xml)? If you didn't change the default, it's set to 200MB which is why you would run out of memory trying to allocate a 200MB buffer. -Joey O
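If that is the cause, raising the task heap in mapred-site.xml gives the sort buffer room (the value is illustrative):

    <property>
      <name>mapred.child.java.opts</name>
      <!-- leave headroom above the io.sort.mb buffer -->
      <value>-Xmx512m</value>
    </property>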