how to use two reduce functions?

2008-02-22 Thread ma qiang
Hi all, I have a program that needs to use two reduce functions. Who can tell me how? Thank you! Qiang

Re: How to split the hdfs in different subgroups

2008-02-22 Thread Raghu Angadi
[EMAIL PROTECTED] wrote: I read the docs about rack awareness but my issue is how the client can pick some specific datanodes, which are located in some specific rack, to write the block there. The idea is that the client is able to write the block in two separated groups of datanodes in the same

Re: Problems with NFS share in dfs.name.dir

2008-02-22 Thread Raghu Angadi
Check the namenode log. It is possible that your NFS mount has problems and the NameNode is stuck trying to write to it. If the log is not useful, you can attach jstack output for the NameNode when it seems to be stuck. Raghu. Nathan Wang wrote: Hi, We're having problems when trying to deal with t

Problems with NFS share in dfs.name.dir

2008-02-22 Thread Nathan Wang
Hi, We're having problems when trying to deal with the namenode failover, by following the wiki http://wiki.apache.org/hadoop/NameNodeFailover If we point dfs.name.dir to 2 local directories, it works fine. But, if one of the directories is NFS mounted, we're having these problems: 1) "hadoo

Re: Problems running a HOD test cluster

2008-02-22 Thread Allen Wittenauer
On 2/22/08 3:58 PM, "Jason Venner" <[EMAIL PROTECTED]> wrote: > We have been unable to get torque up and running. The magic value in the > server_name file seems to elude us. The server_name should be the real hostname of the machine running pbs_server. > We have tried localhost, 127.0.0.1, m

RE: Calculations involve large datasets

2008-02-22 Thread Runping Qi
There is a package for joining data from multiple sources: contrib/data-join. It implements the basic joining logic and allows the user to provide application specific logic for filtering/projecting and combining multiple records into one. Runping > -Original Message- > From: Ted Dun

RE: Hadoop summit / workshop at Yahoo!

2008-02-22 Thread xavier.quintuna
I agree, I love to be part of this but the rooms are full. Xavier -Original Message- From: Stefan Groschupf [mailto:[EMAIL PROTECTED] Sent: Friday, February 22, 2008 11:04 AM To: core-user@hadoop.apache.org Subject: Re: Hadoop summit / workshop at Yahoo! Puhh, 2 days and it is full? Does

RE: How to split the hdfs in different subgroups

2008-02-22 Thread xavier.quintuna
I read the docs about rack awareness but my issue is how the client can pick some specific datanodes, which are located in some specific rack, to write the block there. The idea is that the client is able to write the block in two separated groups of datanodes in the same hdfs. For instance: bin/ha

Re: Problems running a HOD test cluster

2008-02-22 Thread Jason Venner
We have been unable to get torque up and running. The magic value in the server_name file seems to elude us. We have tried localhost, 127.0.0.1, machine name, machine ip, fq machine name. Depending on what we use, we either get Unauthorized request or invalid entry qmgr obj= svr=default: Bad ACL

Re: Calculations involve large datasets

2008-02-22 Thread Ted Dunning
Joins are easy. Just reduce on a key composed of the stuff you want to join on. If the data you are joining is disparate, leave some kind of hint about what kind of record you have. The reducer will be iterating through sets of records that have the same key. This is similar to the results of
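The reduce-side join described in this message can be sketched in plain Python, without Hadoop. This is only an illustration of the idea: the source tags ("customer", "order"), the "id" join field, and all function names are hypothetical, and the in-memory shuffle stands in for what the framework does between map and reduce.

```python
from collections import defaultdict


def map_records(source, records, key_field):
    """Emit (join_key, (source_tag, record)); the tag is the 'hint' about record kind."""
    for record in records:
        yield record[key_field], (source, record)


def shuffle(pairs):
    """Group values by key and order the groups by key, as happens before reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_join(key, tagged_records):
    """All records here share one key; pair each customer with each order."""
    customers = [r for tag, r in tagged_records if tag == "customer"]
    orders = [r for tag, r in tagged_records if tag == "order"]
    for customer in customers:
        for order in orders:
            yield {**customer, **order}


def join(customers, orders):
    """Run the map, shuffle, and reduce steps in memory (inner join on 'id')."""
    pairs = list(map_records("customer", customers, "id"))
    pairs += list(map_records("order", orders, "id"))
    joined = []
    for key, tagged in shuffle(pairs):
        joined.extend(reduce_join(key, tagged))
    return joined
```

Keys that appear in only one source produce no output here, so the sketch behaves like an inner join; an outer join would emit the unmatched records in `reduce_join` instead of dropping them.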

RE: Namenode fails to re-start after cluster shutdown

2008-02-22 Thread dhruba Borthakur
If your file system metadata is in /tmp, then you are likely to see these kinds of problems. It would be nice if you can move the location of your metadata files away from /tmp. If you still see the problem, can you please send us the logs from the log directory? Thanks a bunch, Dhruba -Original

Re: Namenode fails to re-start after cluster shutdown

2008-02-22 Thread Steve Sapovits
Raghu Angadi wrote: Please report such problems if you think it was because of HDFS, as opposed to some hardware or disk failures. Will do. I suspect it's something else. I'm testing on a notebook in pseudo-distributed mode (per the quick start guide). My IP changes when I take that box be

Re: Namenode fails to re-start after cluster shutdown

2008-02-22 Thread Raghu Angadi
Please report such problems if you think it was because of HDFS, as opposed to some hardware or disk failures. Raghu. Steve Sapovits wrote: dhruba Borthakur wrote: Reformatting should never be necessary if you are using released version of hadoop. Hadoop-2783 refers to a bug that got intro

Re: Namenode fails to re-start after cluster shutdown

2008-02-22 Thread Steve Sapovits
dhruba Borthakur wrote: Reformatting should never be necessary if you are using released version of hadoop. Hadoop-2783 refers to a bug that got introduced into trunk (not in any released versions). Interesting. We're running only released versions. We have cases where the name node won't co

RE: Namenode fails to re-start after cluster shutdown

2008-02-22 Thread dhruba Borthakur
Reformatting should never be necessary if you are using released version of hadoop. Hadoop-2783 refers to a bug that got introduced into trunk (not in any released versions). Thanks, Dhruba -Original Message- From: Steve Sapovits [mailto:[EMAIL PROTECTED] Sent: Friday, February 22, 2008

Re: Namenode fails to re-start after cluster shutdown

2008-02-22 Thread Steve Sapovits
What are the situations that make reformatting necessary? Testing, we seem to hit a lot of cases where we have to reformat. We're wondering how much of a real production issue this is. -- Steve Sapovits Invite Media - http://www.invitemedia.com [EMAIL PROTECTED]

Re: Namenode fails to re-start after cluster shutdown

2008-02-22 Thread Konstantin Shvachko
André, You can try to rollback. You did use upgrade when you switched to the new trunk, right? --Konstantin Raghu Angadi wrote: André Martin wrote: Hi Raghu, done: https://issues.apache.org/jira/browse/HADOOP-2873 Subsequent tries did not succeed - so it looks like I need to re-format the clu

Re: Hadoop summit / workshop at Yahoo!

2008-02-22 Thread Bradford Stephens
Yes, I was really looking forward to attending. :) On Fri, Feb 22, 2008 at 11:04 AM, Stefan Groschupf <[EMAIL PROTECTED]> wrote: > Puhh, 2 days and it is full? > Does Yahoo have no bigger rooms than just for a 100 people? > > > > > > On Feb 20, 2008, at 12:10 PM, Ajay Anand wrote: > > > The reg

Re: Hadoop summit / workshop at Yahoo!

2008-02-22 Thread Stefan Groschupf
Puhh, 2 days and it is full? Does Yahoo have no bigger rooms than just for a 100 people? On Feb 20, 2008, at 12:10 PM, Ajay Anand wrote: The registration page for the Hadoop summit is now up: http://developer.yahoo.com/hadoop/summit/ Space is limited, so please sign up early if you are inter

Re: How to split the hdfs in different subgroups

2008-02-22 Thread Raghu Angadi
You could probably treat these two groups as different "racks". You can read about rack awareness in http://hadoop.apache.org/core/docs/r0.16.0/hdfs_user_guide.html , and follow the links from there for more information regarding how to configure it, etc. Raghu. [EMAIL PROTECTED] wrote: Hi There, I

Re: Sorting output data on value

2008-02-22 Thread Doug Cutting
Tarandeep Singh wrote: but isn't the output of the reduce step sorted? No, the input of reduce is sorted by key. The output of reduce is generally produced as the input arrives, so is generally also sorted by key, but reducers can output whatever they like. Doug
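Since only the reduce input is sorted, and only by key, one common way to get output ordered by value is a second pass that swaps key and value so the framework's key sort does the work. The sketch below simulates this in plain Python, assuming a single reducer; `run_job` and the mapper/reducer names are hypothetical stand-ins, not Hadoop APIs.

```python
from itertools import groupby
from operator import itemgetter


def run_job(records, mapper, reducer):
    """Simulate one job: map, sort reduce input by key, then reduce per key."""
    mapped = [pair for record in records for pair in mapper(record)]
    mapped.sort(key=itemgetter(0))  # the framework sorts reduce input by key only
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(key, [value for _, value in group]))
    return output


def count_mapper(word):
    yield word, 1


def count_reducer(word, counts):
    yield word, sum(counts)


def swap_mapper(pair):
    word, count = pair
    yield count, word  # the value becomes the key, so the sort orders by count


def identity_reducer(count, words):
    for word in words:
        yield count, word


def word_counts_sorted_by_count(words):
    """First job counts words; second job re-sorts the counts by value."""
    counts = run_job(words, count_mapper, count_reducer)
    return run_job(counts, swap_mapper, identity_reducer)
```

With more than one reducer, each reducer's output would be sorted only within itself, so a globally ordered result would also need a partitioner that sends ordered key ranges to successive reducers.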

Re: Calculations involve large datasets

2008-02-22 Thread Tim Wintle
Have you seen PIG: http://incubator.apache.org/pig/ It generates hadoop code and is more query like, and (as far as I remember) includes union, join, etc. Tim On Fri, 2008-02-22 at 09:13 -0800, Chuck Lan wrote: > Hi, > > I'm currently looking into how to better scale the performance of our > ca

Re: Calculations involve large datasets

2008-02-22 Thread Amar Kamat
See http://incubator.apache.org/pig/. Hope that helps. Not sure how joins could be done in Hadoop. Amar On Fri, 22 Feb 2008, Chuck Lan wrote: Hi, I'm currently looking into how to better scale the performance of our calculations involving large sets of financial data. It is currently using a

Re: Add your project or company to the powered by page?

2008-02-22 Thread Doug Cutting
I added this to the wiki. Doug Jimmy Lin wrote: University of Maryland http://www.umiacs.umd.edu/~jimmylin/cloud-computing/index.html We are one of six universities participating in IBM/Google's academic cloud computing initiative. Ongoing research and teaching efforts include projects in ma

Re: Add your project or company to the powered by page?

2008-02-22 Thread Jimmy Lin
University of Maryland http://www.umiacs.umd.edu/~jimmylin/cloud-computing/index.html We are one of six universities participating in IBM/Google's academic cloud computing initiative. Ongoing research and teaching efforts include projects in machine translation, language modeling, bioinformatic

Re: Problem with LibHDFS

2008-02-22 Thread Arun C Murthy
On Feb 21, 2008, at 3:29 AM, Raghavendra K wrote: Hi, I am able to get Hadoop running and also able to compile the libhdfs. But when I run the hdfs_test program it is giving Segmentation Fault. Unfortunately the documentation for using libhdfs is sparse, our apologies. You'll need to

Re: Problems running a HOD test cluster

2008-02-22 Thread Allen Wittenauer
On 2/21/08 10:52 AM, "Luca" <[EMAIL PROTECTED]> wrote: > A few questions: > - is Java6 ok for HOD? That's what we use. > - I have an externally running HDFS cluster, as specified in > [gridservice-hdfs]: how do I find out the fs_port of my cluster? IS it > something specified in the hadoop-si

RE: Questions regarding configuration parameters...

2008-02-22 Thread C G
Guys: Thanks for the information...I've gotten some pretty good results twiddling some parameters. I've also reminded myself about the pitfalls of oversubscribing resources (like number of reducers). Here's what I learned, written up here to hopefully help somebody later... I set u

Re: Namenode fails to re-start after cluster shutdown

2008-02-22 Thread Raghu Angadi
From the jira: (for users similarly affected): Andre, as a temporary hack, you can just comment out the FSImage.java:749 and your restart should work, since these are last entries read from the image file. Raghu. Raghu Angadi wrote: André Martin wrote: Hi Raghu, done: https://is

Calculations involve large datasets

2008-02-22 Thread Chuck Lan
Hi, I'm currently looking into how to better scale the performance of our calculations involving large sets of financial data. It is currently using a series of Oracle SQL statements to perform the calculations. It seems to me that the MapReduce algorithm may work in this scenario. However, I b

Re: What to use instead of globPaths (deprecated)?

2008-02-22 Thread Josh Snyder
Sorry, I'm an idiot. Following the law that says one figures it out immediately on pestering others -- globStatus will do it. Thanks, Josh On 2/22/08, Josh Snyder <[EMAIL PROTECTED]> wrote: > Hi, > > In the current API documentation, FileSystem.globPaths is marked as > deprecated. However, I c

What to use instead of globPaths (deprecated)?

2008-02-22 Thread Josh Snyder
Hi, In the current API documentation, FileSystem.globPaths is marked as deprecated. However, I couldn't figure out what I could use in its place. What is the preferred alternative to globPaths? I'm new to this list and to Hadoop, so I apologize if this is obvious -- but grepping + skimming didn't

Re: Namenode fails to re-start after cluster shutdown

2008-02-22 Thread Raghu Angadi
André Martin wrote: Hi Raghu, done: https://issues.apache.org/jira/browse/HADOOP-2873 Subsequent tries did not succeed - so it looks like I need to re-format the cluster :-( Please back up the log files and name node image files if you can before re-format. Raghu.

Re: Sorting output data on value

2008-02-22 Thread Tarandeep Singh
On Fri, Feb 22, 2008 at 5:46 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote: > > On Feb 21, 2008, at 11:01 PM, Ted Dunning wrote: > > > > > But this only guarantees that the results will be sorted within each > > reducers input. Thus, this won't result in getting the results > > sorted by > > t

Re: Namenode fails to re-start after cluster shutdown

2008-02-22 Thread André Martin
Hi Raghu, done: https://issues.apache.org/jira/browse/HADOOP-2873 Subsequent tries did not succeed - so it looks like I need to re-format the cluster :-( Cu on the 'net, Bye - bye, < André èrbnA > Raghu Angadi wrote: P

Re: Sorting output data on value

2008-02-22 Thread Owen O'Malley
On Feb 21, 2008, at 11:01 PM, Ted Dunning wrote: But this only guarantees that the results will be sorted within each reducers input. Thus, this won't result in getting the results sorted by the reducers output value. I thought the question was how to get the values sorted within a call

RE: Questions regarding configuration parameters...

2008-02-22 Thread Tim Wintle
I have had exactly the same problem with using the command line to cat files - they can take ages, although I don't know why. Network utilisation does not seem to be the bottleneck, though. (Running 0.15.3) Is the slow part of the reduce while you are waiting for the map data to copy over to