Re: Hadoop DFS

2008-07-25 Thread hong
HBase is a project that uses DFS. If you want to know how to use DFS directly, the bin/hadoop script may be a good entry point. For example: bin/hadoop dfs -cat ***, where *** is a file name in your DFS. By following this command through, you can find out how to access DFS directly. Hope this helps.
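As a rough illustration of the command mentioned above, here is a minimal Python sketch that wraps bin/hadoop dfs -cat in a subprocess call. The helper names and the example path are hypothetical, and the sketch assumes it is run from the Hadoop install directory so that bin/hadoop resolves.

```python
import subprocess


def dfs_cat_command(dfs_path, hadoop_script="bin/hadoop"):
    """Build the argument list for `bin/hadoop dfs -cat <path>`.

    `hadoop_script` is an assumption: it points at the launcher script
    relative to the current working directory.
    """
    return [hadoop_script, "dfs", "-cat", dfs_path]


def dfs_cat(dfs_path, hadoop_script="bin/hadoop"):
    """Run the command and return the file contents as bytes.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        dfs_cat_command(dfs_path, hadoop_script),
        capture_output=True,
        check=True,
    )
    return result.stdout


# Hypothetical path, shown only to illustrate the argument list.
print(dfs_cat_command("/user/example/part-00000"))
```

Only dfs_cat_command is exercised here; dfs_cat needs a working Hadoop installation to return anything.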

Re: Using MapReduce to do table comparing.

2008-07-25 Thread James Moore
On Thu, Jul 24, 2008 at 8:03 AM, Amber [EMAIL PROTECTED] wrote: Yes, I think this is the simplest method, but there are problems too: 1. The reduce stage wouldn't begin until the map stage ends, by which time we have already scanned both tables, and the comparison will take almost the same time,

Re: Bean Scripting Framework?

2008-07-25 Thread Lincoln Ritter
I was using BSF to avoid Java 6 issues. However, I'm having similar issues using both systems. Basically, I can't load the scripting engine from within Hadoop. I have successfully compiled and run some stand-alone test examples but am having trouble getting anything to work from Hadoop. One

Is there a network communication counter for mapred?

2008-07-25 Thread Kevin
Hi, Besides knowing the number of data-local and rack-local map tasks, I am interested in the amount of data transferred over the network, e.g., the size of the intermediate map output that is transferred (not handled locally). I wonder if there is such a counter. Thank you. Best, -Kevin

Re: Bean Scripting Framework?

2008-07-25 Thread Lincoln Ritter
This is a bit scattered but I wanted to post this in case it might help someone... Here's a little more detail on the loading problems I've been having. For now, I'm just trying to call some ruby from the reduce method of my map/reduce job. I want to move to a more general setup, like the one

Re: Bean Scripting Framework?

2008-07-25 Thread Andreas Kostyrka
On Friday 25 July 2008 15:18:24 James Moore wrote: On Thu, Jul 24, 2008 at 10:48 PM, Venkat Seeth [EMAIL PROTECTED] wrote: Why don't you use Hadoop streaming? I think that's more of a broader question - why doesn't everyone use streaming? There's no real difference between doing Hadoop in

RE: Bean Scripting Framework?

2008-07-25 Thread Joydeep Sen Sarma
Just as an aside - there is probably a general perception that streaming is really slow (at least I had it). The last I did some profiling (in 0.15) - the primary overheads from streaming came from the scripting language (python is sssw). For an insanely fast script (bin/cat), I saw

Re: Bean Scripting Framework?

2008-07-25 Thread Arun C Murthy
On Jul 25, 2008, at 3:53 PM, Joydeep Sen Sarma wrote: Just as an aside - there is probably a general perception that streaming is really slow (at least I had it). The last I did some profiling (in 0.15) - the primary overheads from streaming came from the scripting language (python is

Re: newbie install

2008-07-25 Thread Jose Vidal
Turns out, it does cause problems later on. I think the problem is that the slaves have, in their hosts files:

127.0.0.1 localhost.localdomain localhost
127.0.0.1 machinename.cse.sc.edu machinename

The reduce phase fails because the reducer cannot get data from the mappers as it tries to open a
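The hosts entries quoted above map the machine's own name to 127.0.0.1. A minimal Python sketch of a check for that pattern follows; the function name and the list of ignored aliases are assumptions made for illustration, and the sample text simply repeats the two entries from the message.

```python
import ipaddress


def loopback_hostnames(hosts_text, ignore=("localhost", "localhost.localdomain")):
    """Return names mapped to a loopback address, excluding the usual
    localhost aliases listed in `ignore` (an assumed default)."""
    flagged = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # skip lines whose first field is not a plain IP address
        if is_loopback:
            flagged.extend(name for name in names if name not in ignore)
    return flagged


sample = """\
127.0.0.1 localhost.localdomain localhost
127.0.0.1 machinename.cse.sc.edu machinename
"""
print(loopback_hostnames(sample))
# prints ['machinename.cse.sc.edu', 'machinename']
```

Any name this reports would resolve to the local machine itself on that host, which matches the kind of setup described in the message.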