too-many fetch failures

2008-10-01 Thread chandra
Hi all, I'm using Hadoop 0.17.2 and am trying to set up a 3-node cluster. When I run applications, the map tasks finish but the reduce tasks just halt, and I get this "Too many fetch failures": [EMAIL PROTECTED] hadoop-0.17.2.1]# bin/hadoop jar word/word.jar org.myorg.WordCount input output2 08/10/01
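
For reference, a minimal sketch of an org.myorg.WordCount driver like the one invoked above, written against the 0.17-era mapred API (this is the classic tutorial version, not necessarily chandra's exact code; path-setting calls changed slightly between releases):

    package org.myorg;

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class WordCount {

      // Emits (word, 1) for every token in a line of input.
      public static class Map extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, one);
          }
        }
      }

      // Sums the counts for each word.
      public static class Reduce extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
          int sum = 0;
          while (values.hasNext()) {
            sum += values.next().get();
          }
          output.collect(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        conf.setInputPath(new Path(args[0]));   // "input"
        conf.setOutputPath(new Path(args[1]));  // "output2"
        JobClient.runJob(conf);
      }
    }

The "Too many fetch failures" error itself usually means the reducers cannot pull map output from the other nodes, which more often points at hostname//etc/hosts or firewall misconfiguration between the three machines than at the job code.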

architecture diagram

2008-10-01 Thread Terrence A. Pietrondi
I am trying to plan out my map-reduce implementation and I have some questions about where computation should be split in order to take advantage of the distributed nodes. Looking at the architecture diagram (http://hadoop.apache.org/core/images/architecture.gif), are the map boxes the major

Re: architecture diagram

2008-10-01 Thread Alex Loddengaard
Hi Terrence, It really depends on your job, I think. The reduce step can often be the bottleneck if you want a single output file (one reducer). Hope this helps. Alex On Wed, Oct 1, 2008 at 10:17 AM, Terrence A. Pietrondi [EMAIL PROTECTED] wrote: I am trying to plan out my map-reduce
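
As a hedged illustration of the one-reducer trade-off Alex mentions: the reducer count is a per-job setting in the old mapred API (MyJob below is a placeholder driver class, not from the thread):

    // Inside the job driver, before JobClient.runJob(conf):
    JobConf conf = new JobConf(MyJob.class);  // MyJob: hypothetical driver
    conf.setNumReduceTasks(1);   // single output file (part-00000), but the
                                 // whole reduce phase funnels through one task
    // conf.setNumReduceTasks(8); // parallel reduces; output split across files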

Re: architecture diagram

2008-10-01 Thread Tim Wintle
I normally find the intermediate stage of copying data from the mappers to the reducers to be a significant step - but that's not over the best-quality switches... The mappers and reducers run on the same boxes, close to the data. On Wed, 2008-10-01 at 10:59 -0700, Alex Loddengaard wrote:
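
A hedged sketch of two knobs from this era that bear on the map-to-reduce copy (shuffle) step Tim describes; the defaults shown are from the 0.17/0.18 documentation, so verify against your release:

    // Inside the job driver (or the hadoop-site.xml equivalents):
    JobConf conf = new JobConf();
    conf.setInt("mapred.reduce.parallel.copies", 10); // fetch threads per reducer (default 5)
    conf.setInt("tasktracker.http.threads", 80);      // TaskTracker threads serving map output (default 40)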

Re: architecture diagram

2008-10-01 Thread Arun C Murthy
On Oct 1, 2008, at 10:17 AM, Terrence A. Pietrondi wrote: I am trying to plan out my map-reduce implementation and I have some questions about where computation should be split in order to take advantage of the distributed nodes. Looking at the architecture diagram

Re: Merging of the local FS files threw an exception

2008-10-01 Thread Arun C Murthy
On Oct 1, 2008, at 11:07 AM, Per Jacobsson wrote: I ran a job last night with Hadoop 0.18.0 on EC2, using the standard small AMI. The job was producing gzipped output; otherwise I haven't changed the configuration. The final reduce steps failed with this error that I haven't seen before:
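
For context, a hedged sketch of how gzipped job output is typically switched on in 0.18-era Hadoop (Per's exact configuration isn't quoted, so this is an assumption about what "gzipped output" means here):

    // Inside the job driver, before JobClient.runJob(conf):
    JobConf conf = new JobConf();
    conf.setBoolean("mapred.output.compress", true);
    conf.set("mapred.output.compression.codec",
             "org.apache.hadoop.io.compress.GzipCodec");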

Re: Merging of the local FS files threw an exception

2008-10-01 Thread Per Jacobsson
I've collected the syslogs from the failed reduce jobs. What's the best way to get them to you? Let me know if you need anything else; I'll have to shut down these instances some time later today. Overall, I've run this same job before with no problems. The only change is the added gzip of the

Re: LZO and native hadoop libraries

2008-10-01 Thread Nathan Marz
Yes, this is exactly what I'm seeing. To be honest, I don't know which LZO native library it should be looking for. The LZO install dropped liblzo2.la and liblzo2.a in my /usr/local/lib directory, but not a file with a .so extension. Hardcoding would be fine as a temporary solution, but I
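
The missing .so matters because the JVM can only load shared libraries; the static liblzo2.a and the libtool liblzo2.la are invisible to JNI. A minimal, hedged probe of what the JVM can actually find (LzoProbe is a made-up helper, not part of Hadoop):

    public class LzoProbe {
      public static void main(String[] args) {
        // Where the JVM will look for native libraries:
        System.out.println("java.library.path = "
            + System.getProperty("java.library.path"));
        try {
          System.loadLibrary("lzo2");  // resolves to liblzo2.so on Linux
          System.out.println("liblzo2 loaded OK");
        } catch (UnsatisfiedLinkError e) {
          System.out.println("liblzo2 not loadable: " + e.getMessage());
        }
      }
    }

Rebuilding LZO with shared libraries enabled, so that a liblzo2.so lands next to the .a, is usually the fix.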

Re: architecture diagram

2008-10-01 Thread Terrence A. Pietrondi
So, to be distributed in a sense, you would want to do your computation on the disconnected parts of the data in the map phase, I would guess? Terrence A. Pietrondi http://del.icio.us/tepietrondi --- On Wed, 10/1/08, Arun C Murthy [EMAIL PROTECTED] wrote: From: Arun C Murthy [EMAIL PROTECTED]

Re: LZO and native hadoop libraries

2008-10-01 Thread Arun C Murthy
On Oct 1, 2008, at 12:54 PM, Nathan Marz wrote: Yes, this is exactly what I'm seeing. To be honest, I don't know which LZO native library it should be looking for. The LZO install dropped liblzo2.la and liblzo2.a in my /usr/local/lib directory, but not a file with a .so extension.

Re: Merging of the local FS files threw an exception

2008-10-01 Thread Arun C Murthy
On Oct 1, 2008, at 12:04 PM, Per Jacobsson wrote: I've collected the syslogs from the failed reduce jobs. What's the best way to get them to you? Let me know if you need anything else, I'll have to shut down these instances some time later today. Could you please attach them to the

Re: Merging of the local FS files threw an exception

2008-10-01 Thread Per Jacobsson
Attached to the ticket. Hope this helps. / Per On Wed, Oct 1, 2008 at 1:33 PM, Arun C Murthy [EMAIL PROTECTED] wrote: On Oct 1, 2008, at 12:04 PM, Per Jacobsson wrote: I've collected the syslogs from the failed reduce jobs. What's the best way to get them to you? Let me know if you need

Re: architecture diagram

2008-10-01 Thread Alex Loddengaard
I'm not sure what you mean by disconnected parts of data, but Hadoop is implemented to try to run map tasks on the machines that hold their input data. This lowers the amount of network traffic, making the entire job run faster. Hadoop does all of this for you under the hood. From a user's
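
A hedged sketch of where that under-the-hood locality information lives: each InputSplit reports the hosts holding its data, and the JobTracker tries to schedule the corresponding map task on one of them. SplitLocations is a made-up utility against the old mapred API; exact calls vary slightly by release:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.*;

    public class SplitLocations {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SplitLocations.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        TextInputFormat in = new TextInputFormat();
        in.configure(conf);
        // Print each split and the datanodes holding its block(s);
        // the scheduler prefers a TaskTracker on one of these hosts.
        for (InputSplit split : in.getSplits(conf, 1)) {
          System.out.println(split + " -> "
              + java.util.Arrays.toString(split.getLocations()));
        }
      }
    }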

Is there a way to pause a running hadoop job?

2008-10-01 Thread Steve Gao
I have 5 running jobs, each with 2 reducers. Because I set the max number of reducers to 10, any incoming job will be held until some of the 5 jobs finish and release reducer quota. Now the problem is that an incoming job has a higher priority, so I want to pause some of the 5 jobs and let the new
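
There is no true pause in this era's Hadoop, but a hedged sketch of the priority route (the property name is from the old mapred API; the CLI form assumes your release includes the -set-priority option):

    // In the incoming job's driver: grab slots first as they free up.
    JobConf conf = new JobConf();
    conf.set("mapred.job.priority", "VERY_HIGH");
    // Or for an already-submitted job, from the shell:
    //   bin/hadoop job -set-priority <job-id> VERY_HIGH

Note that priority only decides which pending tasks get free slots; it does not preempt reduce tasks that are already running.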

RE: Hive questions about the meta db

2008-10-01 Thread Ashish Thusoo
Hi Edward, You can have multiple instances of Hive by pointing the Hive CLI to different configs (this is very similar to the Hadoop model). Take a look at hive-default.xml in your Hive instance. You can create different copies of this file and change the following properties:

Re: Hive questions about the meta db

2008-10-01 Thread Prasad Chakka
Hi Edward, By default, the embedded version of the Apache Derby database is used as the metadb. You can run multiple queries against the same metadb by providing a JDBC connection (to where the metadata is located) to MySQL, Derby, or any other relational database in the options

default splits

2008-10-01 Thread chandra
Hi all, I have a doubt: if we don't specify numSplits in getSplits(), then what is the default number of splits taken? -- Best Regards, S.Chandravadana
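
From memory of the 0.17-era FileInputFormat (hedged; check your version's source), the numSplits argument is only a hint: the framework passes mapred.map.tasks (default 2) as that hint, and the actual split size is clamped by the configured minimum split size and the DFS block size:

    // Approximate logic inside FileInputFormat.getSplits(job, numSplits):
    long goalSize  = totalSize / (numSplits == 0 ? 1 : numSplits);
    long minSize   = job.getLong("mapred.min.split.size", 1);
    long blockSize = fs.getDefaultBlockSize();
    // Roughly one split per splitSize bytes of each input file:
    long splitSize = Math.max(minSize, Math.min(goalSize, blockSize));

In practice, for inputs much larger than the hint allows, this works out to about one map task per DFS block.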

Questions on dfs.datanode.du.reserved

2008-10-01 Thread Taeho Kang
Dear all, I have a few questions on the dfs.datanode.du.reserved property in the hadoop-site.xml configuration... Assume that I have dfs.datanode.du.reserved = 10GB and the partition assigned to HDFS has already been filled up to its capacity (in this case, total disk size minus 10GB). What
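
For reference, a hedged sketch of the setting being asked about, as it would appear in hadoop-site.xml (the value is in bytes; 10 GB shown):

    <property>
      <name>dfs.datanode.du.reserved</name>
      <!-- reserved per volume for non-HDFS use: 10 * 1024^3 bytes -->
      <value>10737418240</value>
    </property>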