Trying to figure out how Hadoop actually achieves its speed. Assuming that
data locality is central to Hadoop's efficiency, how does the magic
actually happen, given that data still gets moved all over the network to
reach the reducers?
For example, if I have 1 GB of logs spread across 10 data
You can increase the map/reduce slots only via the
"mapred.tasktracker.map.tasks.maximum" and
"mapred.tasktracker.reduce.tasks.maximum" properties.
There can be the following cases:
1. Your changes are not taking effect. You need to restart the cluster after
making changes in the conf XML.
You can check your cluster's (Map/Reduce) capacity at
h
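For reference, on a 0.19/0.20-era cluster those per-TaskTracker slot counts go in the conf XML (hadoop-site.xml on 0.19, mapred-site.xml on 0.20). A minimal sketch, where 4 is an arbitrary example value:

```xml
<!-- Sketch: per-TaskTracker slot limits; 4 is an example value, not a recommendation. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```

As noted above, the TaskTrackers must be restarted before the new values take effect.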
Try using setNumMapTasks in JobConf.
Though it is only a hint to the framework and doesn't guarantee the number of
tasks.
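In the old mapred API that call looks roughly like the sketch below (MyJob and the value 10 are hypothetical examples; as noted, setNumMapTasks is only a hint, since the actual map count is driven by the InputFormat's splits):

```java
import org.apache.hadoop.mapred.JobConf;

public class MyJob {  // hypothetical driver class
    public static void main(String[] args) {
        JobConf conf = new JobConf(MyJob.class);
        // Hint only: the real number of map tasks is determined by
        // the number of input splits computed by the InputFormat.
        conf.setNumMapTasks(10);
        // By contrast, the reduce count is honored as given.
        conf.setNumReduceTasks(2);
    }
}
```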
Morpheus: Do you believe in fate, Neo?
Neo: No.
Morpheus: Why Not?
Neo: Because I don't like the idea that I'm not in control of my life.
Check that your NN and JT are running and that you can ssh to localhost
without a password.
On Mon, Dec 21, 2009 at 10:34 AM, Mohan Agarwal
wrote:
Hi,
I have installed hadoop-0.19.2 on my system in pseudo-distributed
mode and I am using Hive to access data. When I try to access data by
executing a query (e.g. select * from "table_name") in Hive, it gives me
the following error:
FAILED: Unknown exception : java.net.Conn
> fuse_dfs TRACE - readdir /
> unique: 4, success, outsize: 200
> unique: 5, opcode: RELEASEDIR (29), nodeid: 1, insize: 64
> unique: 5, success, outsize: 16
>
> Does it seem OK?
Hm, seems like it's not finding any directory entries. Mind putting a
printf in dfs_readdir after hdfsListDirectori
Hi,
I am currently using Hadoop 0.19.2 for large data processing jobs. I
noticed that when a job is launched, only two map/reduce tasks are running
at the very beginning; after one heartbeat (5 sec), another two
map/reduce tasks are started. How can I increase the number of map/reduce
slots?
Can someone please shed light on this issue.
Thanks a lot,
--
Ahmad
On Thu, Dec 17, 2009 at 2:39 PM, Ahmad Ali Iqbal
wrote:
> Hi Mike,
>
> My understanding is that in Hadoop, job scheduling is done implicitly; as you
> said, it spreads the load as much as possible. However, I want to control task
> assig
Hi,
I have spent several days on a customized file input format in Hadoop.
Basically, we need to split one giant square image (.tif) into four
smaller square images. Where does the split really happen? Should I
implement it in the "getSplits" function or in the "next" function? It's quite
Hi Doopa,
In large multi-rack clusters, the network can become saturated by jobs like
sort. Hadoop does a few things to try to ameliorate the issue:
- The reducers start copying map output data before the map tasks complete.
Thus, the time spent copying is concurrent with the time spent processin
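One related knob controls how early that copy phase begins (assuming your version supports it; I believe it exists in later releases of the 0.20 line as "mapred.reduce.slowstart.completed.maps"): reducers are launched once a given fraction of the maps have finished. A sketch, using the commonly cited default of 0.05:

```xml
<!-- Sketch: launch reducers (and thus the copy phase) once 5% of maps are done. -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.05</value>
</property>
```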