I assume you know the tradeoff here: if you depend on the mapper slot # in
your implementation to speed it up, you lose code portability in the
long term.
That said, one way to achieve this is to use the JobConf API:
int partition = jobConf.getInt(JobContext.TASK_PARTITION, -1);
I am assuming you have looked at this already:
https://issues.apache.org/jira/browse/MAPREDUCE-5186
You do have a workaround here: increase the *mapreduce.job.max.split.locations*
value in the Hive configuration. Or do we need more than that here?
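As a sketch, this workaround can be applied per Hive session rather than cluster-wide; the property name comes from the MAPREDUCE-5186 discussion above, and the value 4000 below is purely illustrative:

```sql
-- Hive CLI, before running the failing query
SET mapreduce.job.max.split.locations=4000;
```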
-Rahul
On Thu, Sep 19, 2013 at 11:00 AM, Murtaza
of data, then a different day I want to run against 30 days?
On Thu, Sep 19, 2013 at 3:11 PM, Rahul Jain rja...@gmail.com wrote:
I am assuming you have looked at this already:
https://issues.apache.org/jira/browse/MAPREDUCE-5186
You do have a workaround here to increase *mapreduce.job.max.split.locations*
Which version of Hadoop are you using? MRv1 or MRv2 (YARN)?
For MRv2 (YARN), you can pretty much achieve this using:
yarn.nodemanager.resource.memory-mb (system-wide setting)
and
mapreduce.map.memory.mb (job-level setting)
e.g. if yarn.nodemanager.resource.memory-mb=100 and
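To make the example concrete, here is a sketch of the two settings as config fragments; the values 8192 and 2048 are illustrative, not recommendations. With these numbers, at most 8192 / 2048 = 4 map containers can run on one node at a time:

```xml
<!-- yarn-site.xml: total memory one NodeManager may allocate to containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>

<!-- mapred-site.xml: memory requested by each map container -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
```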
Check your node manager logs to understand the bottleneck first. When we
had a similar issue on a recent version of Hadoop (one that includes the fix
for MAPREDUCE-4068), we rearranged our job jar file to reduce the time the
node manager(s) spent expanding it.
-Rahul
On Sun, Jan 20, 2013
The inability to look at map-reduce logs for failed jobs is due to a number
of open issues in YARN; see my recent comment here:
https://issues.apache.org/jira/browse/MAPREDUCE-4428?focusedCommentId=13412995&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13412995
I am assuming you read through:
https://cwiki.apache.org/Hive/hiveserver.html
The server comes up on port 10000 by default; did you verify that it is
actually listening on that port? You can also connect to the Hive server from
a web browser to confirm its status.
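A quick way to verify the listening port from code is a plain TCP probe; the sketch below is not Hive API, just a generic socket check (host, port, and timeout are assumptions to adjust):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hedged sketch: probe whether anything is listening on the HiveServer
// default port (10000). "localhost" and the 2s timeout are illustrative.
public class HivePortCheck {
    static boolean isListening(String host, int port) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), 2000); // 2s timeout
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("hive server listening: " + isListening("localhost", 10000));
    }
}
```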
-Rahul
On Mon, Apr 16, 2012 at 1:53
You might be suffering from HADOOP-7822; I'd suggest you verify your pid
files and fix the problem by hand if it is the same issue.
-Rahul
On Fri, Dec 16, 2011 at 2:40 PM, Joey Krabacher jkrabac...@gmail.com wrote:
Turns out my tasktrackers (on the datanodes) are not starting properly so I
The easy way to debug such problems, in our experience, is to use 'jmap' to
take a few heap snapshots of one of the tasktracker child tasks and analyze
them under a profiler tool such as JProfiler, YourKit, etc. This should give
you a pretty good indication of which objects are using up most of the heap.
Your latter statement is correct:
the output of the Map1 phase (or Reduce phase) is immediately fed into the
Map2 phase (or Map3 phase) within the same node, without any
redistribution.
ChainMappers / ChainReducers are just convenience classes to allow reuse of
mapper code whether executing as
If you google for such memory failures, you'll find the mapreduce tunable
that'll help you:
mapred.job.shuffle.input.buffer.percent. It is well known that the default
values in the Hadoop config don't work well for large data systems.
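As a sketch, the tunable goes in mapred-site.xml; the default is 0.70, and the 0.50 below is only an illustrative reduction for memory-constrained reducers:

```xml
<!-- mapred-site.xml: fraction of the reducer heap used to buffer map
     outputs during shuffle (default 0.70; 0.50 is illustrative) -->
<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <value>0.50</value>
</property>
```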
-Rahul
On Wed, Feb 16, 2011 at 10:36 AM, James Seigel
Also make sure you have enough input files for the next stage's mappers to
work with...
Read through the input splits part of the tutorial:
http://wiki.apache.org/hadoop/HadoopMapReduce
If the last stage had only 4 reducers running, they'd generate 4 output
files. This will limit the # of mappers started.
In case the producer / consumer don't require sorting to happen, take a look
at ChainMapper:
http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/lib/ChainMapper.html
If you do want this to happen after sorting, take a look at:
Hadoop does not prevent you from writing a key-value pair multiple times in
the same map iteration, if that is your roadblock.
You can call collector.collect() multiple times with the same or distinct
key-value pairs within a single map iteration.
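To illustrate, here is a minimal, self-contained sketch; the Collector interface below is a stand-in for Hadoop's OutputCollector (not the real API), just to show a single map() call emitting more than one pair:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for Hadoop's OutputCollector: a map() invocation
// may call collect() any number of times, with the same or distinct keys.
public class MultiEmit {
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // Emits two pairs per input record: a count, then the word's length.
    static void map(String word, Collector<String, Integer> out) {
        out.collect(word, 1);             // first emit
        out.collect(word, word.length()); // second emit, same key
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        map("hadoop", (k, v) -> emitted.add(k + "=" + v));
        System.out.println(emitted); // two pairs from one map() call
    }
}
```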
-Rahul
On Thu, Jul 29, 2010 at 8:10 AM,
I am not sure why you are using the getFileClassPaths() API to access
files... here is what works for us:
Add the file(s) to distributed cache using:
DistributedCache.addCacheFile(p.toUri(), conf);
Read the files on the mapper using:
URI[] uris = DistributedCache.getCacheFiles(conf);
// access one, e.g. the first cached file
Path my_path = new Path(uris[0].getPath());
if (hdfs.exists(my_path)) {
    FSDataInputStream fs = hdfs.open(my_path);
    String str;
    while ((str = fs.readLine()) != null)
        System.out.println(str);
}
Thanks
From: Rahul Jain rja
There are two issues which were fixed in 0.21.0 and can cause the job
tracker to run out of memory:
https://issues.apache.org/jira/browse/MAPREDUCE-1316
and
https://issues.apache.org/jira/browse/MAPREDUCE-841
We've been hit by MAPREDUCE-841 (large jobConf objects with a large number of tasks,