HBase is a project that uses DFS.
If you want to learn how to use DFS directly, the bin/hadoop script may
be a good starting point.
For example:
bin/hadoop dfs -cat ***
where *** is a file name in your DFS.
Following this command, you can see how to access DFS directly.
Hope this helps.
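As a pointer, a few other common bin/hadoop dfs subcommands work the same way (a sketch of usage only, not output from a real cluster; the paths below are placeholders):

```
bin/hadoop dfs -ls /user/yourname                  # list a DFS directory
bin/hadoop dfs -put localfile /user/yourname/      # copy a local file into DFS
bin/hadoop dfs -get /user/yourname/file localfile  # copy a DFS file out
bin/hadoop dfs -cat /user/yourname/file            # print a DFS file to stdout
```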
On Thu, Jul 24, 2008 at 8:03 AM, Amber [EMAIL PROTECTED] wrote:
Yes, I think this is the simplest method, but there are problems too:
1. The reduce stage won't begin until the map stage ends, by which time we
have already scanned both tables, and the comparison will take almost the
same time,
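For context, the approach being discussed is a reduce-side join: map tasks tag each record with its source table, and the actual join happens only in the reducer - which is why nothing can be joined until both tables have been scanned. A minimal sketch in Python (the table tags and record values here are made up for illustration, not from the thread):

```python
def map_record(table, key, value):
    # Tag each record with its source table; the framework
    # then groups records from both tables under the same key.
    return key, (table, value)

def reduce_join(key, tagged_values):
    # Cross-join the two sides for this key, as a reduce-side join does.
    left = [v for table, v in tagged_values if table == "A"]
    right = [v for table, v in tagged_values if table == "B"]
    return [(key, l, r) for l in left for r in right]

if __name__ == "__main__":
    tagged = [map_record("A", 1, "x"), map_record("B", 1, "y")]
    # Pretend the shuffle grouped both records under key 1.
    print(reduce_join(1, [t for _, t in tagged]))  # [(1, 'x', 'y')]
```

The cost the poster describes falls out of this structure: the reducer sees nothing until every mapper has emitted its tagged records.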
I was using BSF to avoid Java 6 issues. However, I'm having similar
issues with both systems. Basically, I can't load the scripting
engine from within Hadoop. I have successfully compiled and run some
stand-alone test examples but am having trouble getting anything to
work from Hadoop. One
Hi,
Besides knowing the numbers of data-local and rack-local map tasks, I am
interested in the amount of data transferred over the network, e.g.,
the size of intermediate map output that is transferred (not consumed
locally). I wonder if there is such a counter. Thank you.
Best,
-Kevin
This is a bit scattered, but I wanted to post this in case it might
help someone...
Here's a little more detail on the loading problems I've been having.
For now, I'm just trying to call some Ruby from the reduce method of
my map/reduce job. I want to move to a more general setup, like the
one
On Friday 25 July 2008 15:18:24 James Moore wrote:
On Thu, Jul 24, 2008 at 10:48 PM, Venkat Seeth [EMAIL PROTECTED] wrote:
Why don't you use Hadoop streaming?
I think that's a broader question - why doesn't everyone use
streaming?
There's no real difference between doing Hadoop in
Just as an aside - there is probably a general perception that streaming
is really slow (at least I had it).
The last time I did some profiling (in 0.15), the primary overheads from
streaming came from the scripting language (Python is slow). For
an insanely fast script (bin/cat), I saw
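To make the streaming model concrete - a minimal sketch of a streaming-style word count in Python, written as plain functions over lines rather than stdin/stdout so it can run stand-alone (the function names are illustrative, not part of Hadoop's API):

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    # Emit (word, 1) for every word, as a streaming mapper
    # would via tab-separated lines on stdout.
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    # Streaming delivers the mapper output sorted by key;
    # sum the counts within each key group.
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    counts = dict(reducer(mapper(["the quick fox", "the fox"])))
    print(counts)  # {'fox': 2, 'quick': 1, 'the': 2}
```

The per-record formatting and parsing at this boundary is exactly where a slow scripting language costs you, which is the overhead being profiled above.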
On Jul 25, 2008, at 3:53 PM, Joydeep Sen Sarma wrote:
Turns out, it does cause problems later on.
I think the problem is that the slaves have, in their hosts files:
127.0.0.1 localhost.localdomain localhost
127.0.0.1 machinename.cse.sc.edu machinename
The reduce phase fails because the reducer cannot get data from the
mappers as it tries to open a
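For reference, mapping the machine's own hostname to 127.0.0.1 means the TaskTracker advertises an address other nodes cannot reach, so reducers fetching map output connect back to themselves. A sketch of a corrected hosts file, assuming an example LAN address of 192.168.1.10 (substitute the slave's real address):

```
127.0.0.1      localhost.localdomain localhost
192.168.1.10   machinename.cse.sc.edu machinename
```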