I have had exactly the same problem using the command line to cat
files - it can take ages, although I don't know why. Network
utilisation does not seem to be the bottleneck, though.
(Running 0.15.3)
Is the slow part of the reduce while you are waiting for the map data to
copy over to
On Feb 21, 2008, at 11:01 PM, Ted Dunning wrote:
But this only guarantees that the results will be sorted within each
reducer's input. Thus, this won't result in getting the results
sorted by
the reducer's output value.
I thought the question was how to get the values sorted within a call
Hi Raghu,
done: https://issues.apache.org/jira/browse/HADOOP-2873
Subsequent tries did not succeed - so it looks like I need to re-format
the cluster :-(
Cu on the 'net,
Bye - bye,
André èrbnA
Raghu Angadi wrote:
Please file a jira
On Fri, Feb 22, 2008 at 5:46 AM, Owen O'Malley [EMAIL PROTECTED] wrote:
On Feb 21, 2008, at 11:01 PM, Ted Dunning wrote:
But this only guarantees that the results will be sorted within each
reducer's input. Thus, this won't result in getting the results
sorted by
the reducer's
Hi,
In the current API documentation, FileSystem.globPaths is marked as
deprecated. However, I couldn't figure out what I could use in its
place. What is the preferred alternative to globPaths?
I'm new to this list and to Hadoop, so I apologize if this is obvious
-- but grepping + skimming
Sorry, I'm an idiot. Following the law that says one figures it out
immediately on pestering others -- globStatus will do it.
Thanks,
Josh
On 2/22/08, Josh Snyder [EMAIL PROTECTED] wrote:
Hi,
In the current API documentation, FileSystem.globPaths is marked as
deprecated. However, I
Hi,
I'm currently looking into how to better scale the performance of our
calculations involving large sets of financial data. It is currently using
a series of Oracle SQL statements to perform the calculations. It seems to
me that the MapReduce algorithm may work in this scenario. However, I
Guys:
Thanks for the information...I've gotten some pretty good results twiddling
some parameters. I've also reminded myself about the pitfalls of
oversubscribing resources (like number of reducers). Here's what I learned,
written up here to hopefully help somebody later...
I set
On Feb 21, 2008, at 3:29 AM, Raghavendra K wrote:
Hi,
I am able to get Hadoop running and am also able to compile
libhdfs.
But when I run the hdfs_test program it gives a segmentation fault.
Unfortunately the documentation for using libhdfs is sparse, our
apologies.
You'll need
See http://incubator.apache.org/pig/. Hope that helps. I'm not sure how joins
could be done directly in Hadoop.
Amar
On Fri, 22 Feb 2008, Chuck Lan wrote:
Hi,
I'm currently looking into how to better scale the performance of our
calculations involving large sets of financial data. It is currently using
a
Have you seen PIG:
http://incubator.apache.org/pig/
It generates Hadoop code and is more query-like, and (as far as I
remember) includes union, join, etc.
Tim
On Fri, 2008-02-22 at 09:13 -0800, Chuck Lan wrote:
Hi,
I'm currently looking into how to better scale the performance of our
Tarandeep Singh wrote:
But isn't the output of the reduce step sorted?
No, the input of reduce is sorted by key. The output of reduce is
generally produced as the input arrives, so is generally also sorted by
key, but reducers can output whatever they like.
Doug
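Doug's point can be shown with a small, framework-free sketch (illustrative Python only, not Hadoop code): the framework sorts map output by key before the reducer sees it, and a reducer that emits as input arrives will happen to produce key-sorted output, but nothing forces it to.

```python
# Minimal sketch of the reduce phase: input pairs are sorted and grouped
# by key before the reducer sees them; the reducer may emit anything.
from itertools import groupby
from operator import itemgetter

def run_reduce(pairs, reducer):
    out = []
    # The framework sorts map output by key, then groups values per key.
    for key, group in groupby(sorted(pairs, key=itemgetter(0)),
                              key=itemgetter(0)):
        out.extend(reducer(key, [v for _, v in group]))
    return out

def sum_reducer(key, values):
    # Emits one pair per key as input arrives, so its output happens to
    # follow key order - but that's a side effect, not a guarantee.
    yield (key, sum(values))

print(run_reduce([("b", 2), ("a", 1), ("a", 3)], sum_reducer))
# [('a', 4), ('b', 2)]
```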
You could probably treat these two groups as different racks. You can
read about rackawareness in
http://hadoop.apache.org/core/docs/r0.16.0/hdfs_user_guide.html , and
follow the links from there for more information regarding how to configure it, etc.
Raghu.
[EMAIL PROTECTED] wrote:
Hi There,
I
Puhh, 2 days and it is full?
Does Yahoo have no bigger rooms than just for 100 people?
On Feb 20, 2008, at 12:10 PM, Ajay Anand wrote:
The registration page for the Hadoop summit is now up:
http://developer.yahoo.com/hadoop/summit/
Space is limited, so please sign up early if you are
André,
You can try to rollback.
You did use upgrade when you switched to the new trunk, right?
--Konstantin
Raghu Angadi wrote:
André Martin wrote:
Hi Raghu,
done: https://issues.apache.org/jira/browse/HADOOP-2873
Subsequent tries did not succeed - so it looks like I need to
re-format the
Raghu Angadi wrote:
Please report such problems if you think it was because of HDFS, as
opposed to some hardware or disk failures.
Will do. I suspect it's something else. I'm testing on a notebook in
pseudo-distributed
mode (per the quick start guide). My IP changes when I take that box
If your file system metadata is in /tmp, then you are likely to see
these kinds of problems. It would be nice if you could move the location
of your metadata files away from /tmp. If you still see the problem, can
you please send us the logs from the log directory?
Thanks a bunch,
Dhruba
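Moving the metadata out of /tmp comes down to setting dfs.name.dir explicitly in hadoop-site.xml; a minimal sketch (the /var/hadoop path below is just an example, not a recommendation):

```xml
<!-- hadoop-site.xml: keep namenode metadata out of /tmp.
     The path shown is illustrative; pick a durable local directory. -->
<property>
  <name>dfs.name.dir</name>
  <value>/var/hadoop/dfs/name</value>
</property>
```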
Joins are easy.
Just reduce on a key composed of the stuff you want to join on. If the data
you are joining is disparate, leave some kind of hint about what kind of
record you have.
The reducer will be iterating through sets of records that have the same
key. This is similar to the results
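The scheme above - reduce on the join key and tag each record with its source - can be sketched in framework-free Python (illustrative only; record and field names like cust_id are made up for the example, and this is not the contrib/data-join code):

```python
# Sketch of a reduce-side join: tag each record with its source, group
# by the join key, and pair up records from the two sides per key.
from collections import defaultdict

def map_side(records, tag, key_field):
    # Emit (join_key, (source_tag, record)) - the "hint" about record kind.
    for rec in records:
        yield (rec[key_field], (tag, rec))

def join_reducer(key, tagged_records):
    # All records sharing one key arrive together; split them by tag.
    left = [r for t, r in tagged_records if t == "orders"]
    right = [r for t, r in tagged_records if t == "customers"]
    for l in left:
        for r in right:
            yield {**l, **r}

def run_join(orders, customers):
    grouped = defaultdict(list)
    pairs = list(map_side(orders, "orders", "cust_id")) + \
            list(map_side(customers, "customers", "cust_id"))
    for k, v in pairs:
        grouped[k].append(v)
    out = []
    for k in sorted(grouped):
        out.extend(join_reducer(k, grouped[k]))
    return out
```

Keys with records from only one side produce no output here, i.e. an inner join; emitting unmatched records instead would give an outer join.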
We have been unable to get torque up and running. The magic value in the
server_name file seems to elude us.
We have tried localhost, 127.0.0.1, the machine name, the machine IP, and
the fully qualified machine name. Depending on what we use, we either get
Unauthorized request or invalid entry
qmgr obj= svr=default: Bad
I read the docs about rack awareness but my issue is how the client can
pick some specific datanodes, which are located in some specific rack,
to write the block there. The idea is that the client is able to write
the block in two separated groups of datanodes in the same hdfs. For
instance:
I agree, I love to be part of this but the rooms are full.
Xavier
-Original Message-
From: Stefan Groschupf [mailto:[EMAIL PROTECTED]
Sent: Friday, February 22, 2008 11:04 AM
To: core-user@hadoop.apache.org
Subject: Re: Hadoop summit / workshop at Yahoo!
Puhh, 2 days and it is full?
There is a package for joining data from multiple sources:
contrib/data-join.
It implements the basic joining logic and allows the user to provide
application-specific logic for filtering/projecting and combining
multiple records into one.
Runping
-Original Message-
From: Ted
Hi,
We're having problems when trying to deal with the namenode failover, by
following the wiki
http://wiki.apache.org/hadoop/NameNodeFailover
If we point dfs.name.dir to 2 local directories, it works fine.
But, if one of the directories is NFS mounted, we're having these problems:
1)
[EMAIL PROTECTED] wrote:
I read the docs about rack awareness but my issue is how the client can
pick some specific datanodes, which are located in some specific rack,
to write the block there. The idea is that the client is able to write
the block in two separated groups of datanodes in the
Hi all,
I have a program that needs to use two reduce functions - can anyone tell me how to do this?
Thank you!
Qiang