how does hadoop work?

2009-12-20 Thread Doopah Shaf
Trying to figure out how Hadoop actually achieves its speed. Assuming that data locality is central to Hadoop's efficiency, how does the magic actually happen, given that data still gets moved all over the network to reach the reducers? For example, if I have 1 GB of logs spread across 10 data

Re: Why I can only run 2 map/reduce task at a time?

2009-12-20 Thread Chandraprakash Bhagtani
You can increase the map/reduce slots only by using the "mapred.tasktracker.map(reduce).tasks.maximum" property. There can be the following cases: 1. Your changes are not taking effect; you need to restart the cluster after making changes in the conf XML. You can check your cluster's (Map/Reduce) capacity at h
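A minimal sketch (class name and defaults assumed, not from the thread) of reading back the effective per-TaskTracker slot limits through the old JobConf API; the values themselves must be set in each TaskTracker's mapred-site.xml and only take effect after the TaskTracker is restarted:

import org.apache.hadoop.mapred.JobConf;

public class SlotCheck {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Per-TaskTracker limits; both default to 2, which is why only two
    // map and two reduce tasks run per node out of the box.
    int mapSlots = conf.getInt("mapred.tasktracker.map.tasks.maximum", 2);
    int reduceSlots = conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2);
    System.out.println("map slots per TaskTracker: " + mapSlots);
    System.out.println("reduce slots per TaskTracker: " + reduceSlots);
  }
}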

Re: Why I can only run 2 map/reduce task at a time?

2009-12-20 Thread himanshu chandola
Try using setNumMapTasks in JobConf, though it is only a hint to the framework and doesn't guarantee the number of map tasks.
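A minimal sketch of that hint, assuming the old org.apache.hadoop.mapred API (class name hypothetical); the actual number of map tasks is driven by the InputFormat's splits, so the framework may ignore the value:

import org.apache.hadoop.mapred.JobConf;

public class HintExample {
  public static void main(String[] args) {
    JobConf job = new JobConf();
    // Only a hint: the real map task count comes from the input splits.
    job.setNumMapTasks(10);
    // By contrast, the number of reduce tasks is honored as set.
    job.setNumReduceTasks(4);
  }
}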

Re: Show error while accessing data using hive

2009-12-20 Thread Chandraprakash Bhagtani
Check that your NN and JT are running and that you can ssh to localhost without a password. On Mon, Dec 21, 2009 at 10:34 AM, Mohan Agarwal wrote: > Hi, > I have installed hadoop-0.19.2 on my system in pseudo-distributed > mode and I am using Hive to access data. When I am trying to access data b

Show error while accessing data using hive

2009-12-20 Thread Mohan Agarwal
Hi, I have installed hadoop-0.19.2 on my system in pseudo-distributed mode and I am using Hive to access data. When I try to access data by executing a query (e.g. select * from "table_name") in Hive, it gives me the following error: FAILED: Unknown exception : java.net.Conn

Re: Help with fuse-dfs

2009-12-20 Thread Eli Collins
> fuse_dfs TRACE - readdir / >   unique: 4, success, outsize: 200 > unique: 5, opcode: RELEASEDIR (29), nodeid: 1, insize: 64 >   unique: 5, success, outsize: 16 > > Does it seem OK? Hm, seems like it's not finding any directory entries. Mind putting a printf in dfs_readdir after hdfsListDirectori

Why I can only run 2 map/reduce task at a time?

2009-12-20 Thread Starry SHI
Hi, I am currently using hadoop 0.19.2 to run large data processing. But I noticed that when the job is launched, there are only two map/reduce tasks running at the very beginning. After one heartbeat (5 sec), another two map/reduce tasks are started. I want to ask how I can increase the map/reduce slots.

Re: More access to nodes in a distributed cache

2009-12-20 Thread Ahmad Ali Iqbal
Can someone please shed light on this issue? Thanks a lot, -- Ahmad On Thu, Dec 17, 2009 at 2:39 PM, Ahmad Ali Iqbal wrote: > Hi Mike, > > My understanding is that in Hadoop, job scheduling is done implicitly and, as you > said, it spreads load as much as possible. However, I want to control task > assig

File Split

2009-12-20 Thread Cao Kang
Hi, I have spent several days on a customized file input format in Hadoop. Basically, we need to split one giant square-shaped image (.tif) into four smaller square-shaped images. Where does the real split happen? Should I implement it in the "getSplits" function or in the "next" function? It's quite
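For what it's worth, a minimal sketch of how the two pieces usually divide the work, assuming the Hadoop 0.19 "old" mapred API and entirely hypothetical class names: getSplits() only describes the four quadrants, while the actual reading happens in the RecordReader returned by getRecordReader(), i.e. in its next() method:

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.*;

public class QuadrantInputFormat extends FileInputFormat<IntWritable, BytesWritable> {

  @Override
  public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
    Path[] inputs = FileInputFormat.getInputPaths(job);
    // One logical split per quadrant of the (single) input image.
    // The "start" field is abused here to carry the quadrant index 0..3;
    // a real implementation would use a custom InputSplit subclass.
    InputSplit[] splits = new InputSplit[4];
    for (int q = 0; q < 4; q++) {
      splits[q] = new FileSplit(inputs[0], q, 1, (String[]) null);
    }
    return splits;
  }

  @Override
  public RecordReader<IntWritable, BytesWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    final FileSplit fileSplit = (FileSplit) split;
    return new RecordReader<IntWritable, BytesWritable>() {
      private boolean done = false;

      public boolean next(IntWritable key, BytesWritable value) throws IOException {
        if (done) return false;
        int quadrant = (int) fileSplit.getStart();
        key.set(quadrant);
        // Real code would open fileSplit.getPath(), decode the TIFF, and copy
        // only this quadrant's pixels into 'value'.
        value.set(new byte[0], 0, 0);
        done = true;
        return true;
      }
      public IntWritable createKey() { return new IntWritable(); }
      public BytesWritable createValue() { return new BytesWritable(); }
      public long getPos() { return done ? 1 : 0; }
      public float getProgress() { return done ? 1.0f : 0.0f; }
      public void close() { }
    };
  }
}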

Re: general question - how hadoop works

2009-12-20 Thread Todd Lipcon
Hi Doopa, In large multi-rack clusters, the network can become saturated in jobs like sort. Hadoop does a few things to try to ameliorate the issue: - The reducers start copying map output data before the map tasks complete. Thus, the time spent copying is concurrent with the time spent processin
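For reference, the early-copy overlap described above is tunable in later Hadoop releases through the mapred.reduce.slowstart.completed.maps property (the name may not exist in 0.19); a minimal sketch of setting it per job:

import org.apache.hadoop.mapred.JobConf;

public class SlowstartExample {
  public static void main(String[] args) {
    JobConf job = new JobConf();
    // Fraction of map tasks that must finish before reducers are scheduled
    // and begin copying map output (default is a small fraction, so copying
    // overlaps with most of the map phase).
    job.setFloat("mapred.reduce.slowstart.completed.maps", 0.05f);
  }
}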

general question - how hadoop works

2009-12-20 Thread doopha shaf
Trying to figure out how Hadoop actually achieves its speed. Assuming that data locality is central to Hadoop's efficiency, how does the magic actually happen, given that data still gets moved all over the network to reach the reducers? For example, if I have 1 GB of logs spread across 10 da