I don't have any experience with MongoDB, but here are my 2 cents.
Your code is not efficient: it uses "+=" on String, and the Text object in your mapper could have been reused, since Text is a mutable class designed to be reused, instead of being created again and again with "new Text()". My guess is that BSONWritable is a similar mutable class, if it is meant to be used like the other Hadoop Writable classes.
But even so, that should only make your mapper run slower, since a lot of objects need to be GC'ed; it should not cause an OOM by itself.
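To make the "+=" point concrete, here is a minimal sketch of building the composite key with a StringBuilder instead of String concatenation. This is plain Java outside of Hadoop; the class name, separator, and sample values are made up for illustration:

```java
// Sketch: build a composite key with StringBuilder instead of String "+=".
// SEPARATOR and the property values are hypothetical stand-ins.
public class KeyBuilderSketch {
    static final String SEPARATOR = "|"; // assumed separator

    static String buildKey(Object... propertyValues) {
        StringBuilder sb = new StringBuilder();
        for (Object v : propertyValues) {
            // appends into one buffer; no intermediate String per iteration
            sb.append(v).append(SEPARATOR);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildKey("car", 2014, "AT")); // prints car|2014|AT|
    }
}
```

In the real mapper you would additionally keep one Text field (e.g. a private final Text outKey = new Text()) and call outKey.set(...) per record instead of "new Text(id)".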
When you claim 96G of RAM, I am not sure what you mean. From what you said, it failed in the mapper stage, so let's focus on the mapper. What max heap size did you give to the mapper task? I don't think 96G is the setting you meant to give each mapper task. Otherwise, the only cause I can think of is that millions of Strings are being appended in one record by "+=", which causes the OOM.
You need to answer the following questions for yourself:
1) Are any mappers successful?
2) Does the OOM always happen on the same block? If so, you need to dig into the source data for that block and think about why it causes an OOM.
3) Did you give the mapper a reasonable heap size? What is it?
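For question 3, the per-mapper heap is usually what matters, not the total RAM on the box. As a reference point only, this is roughly how it is set on Hadoop 2 / YARN (property names assumed for that version; the values here are examples, not recommendations):

```xml
<!-- Sketch: per-map-task memory settings (Hadoop 2 / YARN property names). -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value> <!-- container size for each map task, in MB -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value> <!-- JVM heap, typically ~80% of the container -->
</property>
```

The total cluster RAM divided by the number of concurrent tasks gives the realistic ceiling per task.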
Yong
From: blanca.hernan...@willhaben.at
To: user@hadoop.apache.org
Subject: Extremely amount of memory and DB connections by MR Job
Date: Mon, 29 Sep 2014 12:57:41 +0000
Hi, 
 
I am using a hadoop map reduce job + mongoDb.

It runs against a database 252 GB big. During the job the number of connections goes over 8000, and we already gave it 9 GB of RAM. The job is still crashing with an OutOfMemoryError with only 8% of the mapping done.
Are these numbers normal? Or did we miss something regarding configuration?
I attach my code, just in case the problem is with it.

 
Mapper:
 
public class AveragePriceMapper extends Mapper<Object, BasicDBObject, Text, BSONWritable> {
    @Override
    public void map(final Object key, final BasicDBObject val, final Context context) throws IOException, InterruptedException {
        String id = "";
        for (String propertyId : currentId.split(AveragePriceGlobal.SEPARATOR)) {
            id += val.get(propertyId) + AveragePriceGlobal.SEPARATOR;
        }
        BSONWritable bsonWritable = new BSONWritable(val);
        context.write(new Text(id), bsonWritable);
    }
}
 
 
Reducer:
public class AveragePriceReducer extends Reducer<Text, BSONWritable, Text, Text> {
    public void reduce(final Text pKey, final Iterable<BSONWritable> pValues, final Context pContext) throws IOException, InterruptedException {
        while (pValues.iterator().hasNext() && continueLoop) {
            BSONWritable next = pValues.iterator().next();
            // Make some calculations
        }
        pContext.write(new Text(currentId), new Text(new MyClass(currentId, AveragePriceGlobal.COMMENT, 0, 0).toString()));
    }
}
 
The configuration includes a query which filters the number of objects to 
analyze (not the 252Gb will be analyzed).
 
Many thanks. Best regards, 
Blanca