Re: Has anyone tried to build Hadoop?

2008-04-11 Thread Khalil Honsali
I now understand your problem; I replicated it. If you load the build.xml from Eclipse and go to Properties > Build Path > Libraries, you'll find a JRE_LIB entry; remove that one and add the JRE System Library instead. Hope it solves it. On 12/04/2008, Khalil Honsali <[EMAIL PROTECTED]> wrote: > > my guess it's an

Re: Has anyone tried to build Hadoop?

2008-04-11 Thread Khalil Honsali
My guess is it's an import problem. How about changing step 2 to compiler version 6? On 12/04/2008, krishna prasanna <[EMAIL PROTECTED]> wrote: > > Java version > java version "1.6.0_05" > Java(TM) SE Runtime Environment (build 1.6.0_05-b13) > Java HotSpot(TM) Client VM (build 10.0-b19, mixe

Re: "could only be replicated to 0 nodes, instead of 1"

2008-04-11 Thread Raghu Angadi
jerrro wrote: I couldn't find much information about this error, but I did manage to see somewhere it might mean that there are no datanodes running. But as I said, start-all does not give any errors. Any ideas what could be the problem? start-all returning does not mean the datanodes are OK. Did you che

Re: Hadoop performance on EC2?

2008-04-11 Thread Chris K Wensel
What does Ganglia show for load and network? You should also be able to see GC stats (count and time); those might help as well. FYI, running > hadoop-ec2 proxy will both set up a SOCKS tunnel and list available URLs you can cut/paste into your browser. One of the URLs is for the Ganglia interfa

Re: Hadoop performance on EC2?

2008-04-11 Thread Nate Carlson
On Wed, 9 Apr 2008, Chris K Wensel wrote: make sure all nodes are running in the same 'availability zone', http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1347 check! and that you are using the new xen kernels. http://developer.amazonwebservices.com/connect/entry.jspa?ext

RE: Mapper OutOfMemoryError Revisited!!

2008-04-11 Thread Devaraj Das
Which Hadoop version are you on? > -Original Message- > From: bhupesh bansal [mailto:[EMAIL PROTECTED] > Sent: Friday, April 11, 2008 11:21 PM > To: [EMAIL PROTECTED] > Subject: Mapper OutOfMemoryError Revisited!! > > > Hi Guys, I need to restart discussion around > http://www.nabble

Re: RE: Problem with key aggregation when number of reduce tasks is more than 1

2008-04-11 Thread Pete Wyckoff
Yes, and as such we've found better load balancing when the number of reduces is a prime number, although String.hashCode isn't great for short strings. On 4/11/08 4:16 AM, "Zhang, jian" <[EMAIL PROTECTED]> wrote: > Hi, > > Please read this; you need to implement a partitioner. > It controls which key
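
A minimal sketch of the tuning suggested here, assuming the stock hash partitioning scheme (key.hashCode() modulo the number of reduces): with a weak hash such as String.hashCode on short strings, a prime modulus spreads the residues more evenly. The class name and the particular prime are illustrative only.

    import org.apache.hadoop.mapred.JobConf;

    // Sketch: choose a prime number of reduce tasks so that
    // hashCode % numReduceTasks spreads clustered hash values
    // (e.g. from short strings) more evenly across reducers.
    public class PrimeReduceCount {
        public static void configure(JobConf conf) {
            conf.setNumReduceTasks(23); // 23 is an arbitrary prime
        }
    }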

Re: Using NFS without HDFS

2008-04-11 Thread slitz
Thank you for the file:/// tip; I was not including it in the paths. I'm running the example with this line -> bin/hadoop jar hadoop-*-examples.jar grep file:///home/slitz/warehouse/input file:///home/slitz/warehouse/output 'dfs[a-z.]+' But I'm getting the same error as before: I'm getting org.ap

Re: [HADOOP-users] HowTo filter files for a Map/Reduce task over the same input folder

2008-04-11 Thread Ted Dunning
Just call addInputFile multiple times after filtering. (or is it addInputPath... Don't have documentation handy) On 4/11/08 6:33 AM, "Alfonso Olias Sanz" <[EMAIL PROTECTED]> wrote: > Hi > I have a general purpose input folder that it is used as input in a > Map/Reduce task. That folder contain
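
A minimal sketch of this suggestion (the method is addInputPath; on releases of this era it also lived directly on JobConf before moving to FileInputFormat). The listing call shown (listStatus) is from slightly later releases; older ones used fs.listPaths, so treat the exact calls as assumptions and adjust to your release.

    import java.io.IOException;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;

    // Sketch: list the input folder, keep only files whose name starts
    // with the wanted prefix, and add each survivor as an input path.
    public class FilteredInputs {
        public static void addFilteredInputs(JobConf conf, Path dir,
                String prefix) throws IOException {
            FileSystem fs = dir.getFileSystem(conf);
            for (FileStatus stat : fs.listStatus(dir)) {
                if (!stat.isDir()
                        && stat.getPath().getName().startsWith(prefix)) {
                    FileInputFormat.addInputPath(conf, stat.getPath());
                }
            }
        }
    }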

Re: Has anyone tried to build Hadoop?

2008-04-11 Thread krishna prasanna
Java version: java version "1.6.0_05" Java(TM) SE Runtime Environment (build 1.6.0_05-b13) Java HotSpot(TM) Client VM (build 10.0-b19, mixed mode, sharing) Steps that I did: 1) Opened a new Java project in Eclipse (from existing directory path). 2) Modified the Java compiler version to 5 in project

Mapper OutOfMemoryError Revisited!!

2008-04-11 Thread bhupesh bansal
Hi Guys, I need to restart discussion around http://www.nabble.com/Mapper-Out-of-Memory-td14200563.html I saw the same OOM error in my map-reduce job in the map phase. 1. I tried changing mapred.child.java.opts (bumped to 600M) 2. io.sort.mb was kept at 100MB. I see the same errors still. I
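
For reference, a minimal sketch of the two knobs mentioned above, using the property names of this era's hadoop-site.xml / JobConf. Note that io.sort.mb must fit comfortably inside the child heap, or the map-side sort alone can exhaust it.

    import org.apache.hadoop.mapred.JobConf;

    // Sketch of the two settings discussed above: the child JVM heap
    // and the map-side sort buffer. io.sort.mb must fit well inside
    // -Xmx, or the sort alone can exhaust the heap.
    public class MapMemoryConfig {
        public static void configure(JobConf conf) {
            conf.set("mapred.child.java.opts", "-Xmx600m"); // the 600M bump above
            conf.setInt("io.sort.mb", 100);                 // kept at 100MB above
        }
    }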

Re: [HADOOP-users] HowTo filter files for a Map/Reduce task over the same input folder

2008-04-11 Thread Arun C Murthy
On Apr 11, 2008, at 10:21 AM, Amar Kamat wrote: A simpler way is to use FileInputFormat.setInputPathFilter(JobConf, PathFilter). Look at org.apache.hadoop.fs.PathFilter for details on PathFilter interface. +1, although FileInputFormat.setInputPathFilter is available only in hadoop-0.17 a

Re: mailing list archive broken?

2008-04-11 Thread Nathan Fiedler
Yes, it's been like that for days. Hopefully someone at Apache can fix it. In the meantime, you can use the Nabble site: http://www.nabble.com/Hadoop-core-user-f30590.html n

Re: [HADOOP-users] HowTo filter files for a Map/Reduce task over the same input folder

2008-04-11 Thread Amar Kamat
A simpler way is to use FileInputFormat.setInputPathFilter(JobConf, PathFilter). Look at org.apache.hadoop.fs.PathFilter for details on the PathFilter interface. Amar Alfonso Olias Sanz wrote: Hi I have a general purpose input folder that is used as input in a Map/Reduce task. That folder contai
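
A minimal sketch of this approach (per the sibling reply, setInputPathFilter is available only from hadoop-0.17 on); the "Elementary" prefix is taken from the original question.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;

    // Sketch (hadoop-0.17+): accept only files whose name starts with
    // "Elementary", the example prefix from the question.
    public class PrefixFilterExample {
        public static class ElementaryFilter implements PathFilter {
            public boolean accept(Path path) {
                return path.getName().startsWith("Elementary");
            }
        }

        public static void configure(JobConf conf) {
            FileInputFormat.setInputPathFilter(conf, ElementaryFilter.class);
        }
    }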

Re: [HADOOP-users] HowTo filter files for a Map/Reduce task over the same input folder

2008-04-11 Thread Amar Kamat
One way to do this is to write your own (file) input format. See src/java/org/apache/hadoop/mapred/FileInputFormat.java. You need to override listPaths() in order to have selectivity amongst the files in the input folder. Amar Alfonso Olias Sanz wrote: Hi I have a general purpose input folder
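
A minimal sketch of this approach for pre-0.17 releases. The listPaths() signature shown is my recollection of the FileInputFormat of this era; check your release's source, since this hook was later replaced by listStatus().

    import java.io.IOException;
    import java.util.ArrayList;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    // Sketch: subclass an input format and override listPaths() to keep
    // only files whose name matches the wanted prefix.
    public class ElementaryInputFormat extends TextInputFormat {
        protected Path[] listPaths(JobConf job) throws IOException {
            ArrayList<Path> kept = new ArrayList<Path>();
            for (Path p : super.listPaths(job)) {
                if (p.getName().startsWith("Elementary")) {
                    kept.add(p);
                }
            }
            return kept.toArray(new Path[kept.size()]);
        }
    }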

MiniDFSCluster error on windows.

2008-04-11 Thread Edward J. Yoon
It occurs only on Windows (Cygwin). Does anyone have a solution? Testcase: testCosine took 0.708 sec Caused an ERROR Address family not supported by protocol family: bind java.net.SocketException: Address family not supported by protocol family: bind at sun.nio.ch.Net
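
One commonly suggested workaround, assuming this bind failure is the usual IPv6-vs-IPv4 mismatch on Windows JVMs (an assumption, not a confirmed diagnosis for this case): force the IPv4 stack before any sockets are bound, either with -Djava.net.preferIPv4Stack=true on the test JVM or programmatically.

    // Hypothetical workaround: prefer the IPv4 stack before any server
    // sockets are bound. Must run before MiniDFSCluster starts, e.g. in
    // a static initializer of the test class.
    public class PreferIPv4 {
        static {
            System.setProperty("java.net.preferIPv4Stack", "true");
        }
    }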

Re: Using NFS without HDFS

2008-04-11 Thread Luca
slitz wrote: I've read in the archive that it should be possible to use any distributed filesystem since the data is available to all nodes, so it should be possible to use NFS, right? I've also read somewhere in the archive that this should be possible... As far as I know, you can refer to any

Re: Hadoop performance on EC2?

2008-04-11 Thread Nate Carlson
On Thu, 10 Apr 2008, Ted Dziuba wrote: I have seen EC2 be slower than a comparable system in development, but not by the factors that you're experiencing. One thing about EC2 that has concerned me - you are not guaranteed that your "/mnt" disk is an uncontested spindle. Early on, this was the

Re: Using NFS without HDFS

2008-04-11 Thread Owen O'Malley
On Apr 11, 2008, at 7:43 AM, slitz wrote: I've read in the archive that it should be possible to use any distributed filesystem since the data is available to all nodes, so it should be possible to use NFS, right? I've also read somewhere in the archive that this should be possible... It is p
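
A minimal sketch of what running over NFS looks like in practice, consistent with the file:/// usage elsewhere in this thread: point the default file system at local paths (which are NFS mounts), assuming the share is mounted at the same location on every node.

    import org.apache.hadoop.mapred.JobConf;

    // Sketch: address the NFS mount as a local file system. Equivalent
    // to setting fs.default.name in hadoop-site.xml. Assumes the share
    // is mounted at the same path on every node.
    public class LocalFsOverNfs {
        public static void configure(JobConf conf) {
            conf.set("fs.default.name", "file:///");
            // Job input/output paths then use file:/// URIs, as in the
            // grep example elsewhere in this thread.
        }
    }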

Re: Using NFS without HDFS

2008-04-11 Thread slitz
I've read in the archive that it should be possible to use any distributed filesystem since the data is available to all nodes, so it should be possible to use NFS, right? I've also read somewhere in the archive that this should be possible... slitz On Fri, Apr 11, 2008 at 1:43 PM, Peeyush Bishno

[HADOOP-users] HowTo filter files for a Map/Reduce task over the same input folder

2008-04-11 Thread Alfonso Olias Sanz
Hi, I have a general purpose input folder that is used as input in a Map/Reduce task. That folder contains files grouped by name. I want to configure the JobConf in a way that I can filter the files that have to be processed in that pass (i.e., files whose name starts with Elementary, or Source, etc.)

Re: Has anyone tried to build Hadoop?

2008-04-11 Thread Khalil Honsali
What is your Java version? Also, please describe exactly what you've done. On 11/04/2008, krishna prasanna <[EMAIL PROTECTED]> wrote: > > I tried both ways and I am still getting some errors > > --- import org.apache.tools.ant.BuildException; (error: cannot be > resolved..) > --- public Socket c

RE: What's the proper way to use hadoop task side-effect files?

2008-04-11 Thread Runping Qi
Looks like you are using your reducer class as the combiner. The combiner will be called from mappers, potentially multiple times. If you want to create side files in the reducer, you cannot use that class as the combiner. Runping > -Original Message- > From: Zhang, jian [mailto:[EMAIL PROT
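
A minimal sketch of the wiring implied here, against the old org.apache.hadoop.mapred API; the class names are illustrative, not from the thread.

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // A combiner may run inside mappers, zero or more times, so it must
    // be a pure aggregation with no side effects. A reducer that writes
    // side files must be wired in ONLY as the reducer.
    public class CombinerWiring {

        // Safe as a combiner: pure aggregation, no side files.
        public static class SumCombiner extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values,
                    OutputCollector<Text, IntWritable> out, Reporter reporter)
                    throws IOException {
                int sum = 0;
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                out.collect(key, new IntWritable(sum));
            }
        }

        public static void configure(JobConf conf) {
            conf.setCombinerClass(SumCombiner.class);
            // conf.setReducerClass(...) names the side-file-writing
            // reducer; do NOT also pass that class to setCombinerClass.
        }
    }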

Re: Using NFS without HDFS

2008-04-11 Thread Peeyush Bishnoi
Hello, to execute a Hadoop Map-Reduce job, the input data should be on HDFS, not on NFS. Thanks --- Peeyush On Fri, 2008-04-11 at 12:40 +0100, slitz wrote: > Hello, > I'm trying to assemble a simple setup of 3 nodes using NFS as the distributed > filesystem. > > Box A: 192.168.2.3, this box is both

Hadoop performance in PC cluster

2008-04-11 Thread Yingyuan Cheng
Does anyone run Hadoop in a PC cluster? I just tested WordCount in a PC cluster, and my first impressions are as follows: Number of PCs: 7 (512MB RAM, 2.8GHz CPU, 100Mbps NIC, CentOS 5.0, Hadoop 0.16.1, Sun JRE 1.6) Mast

Using NFS without HDFS

2008-04-11 Thread slitz
Hello, I'm trying to assemble a simple setup of 3 nodes using NFS as the distributed filesystem. Box A: 192.168.2.3, this box is both the NFS server and a slave node. Box B: 192.168.2.30, this box is only the JobTracker. Box C: 192.168.2.31, this box is only a slave. Obviously all three nodes can

mailing list archive broken?

2008-04-11 Thread Adrian Woodhead
I've noticed that the mailing list archives seem to be broken here: http://hadoop.apache.org/mail/core-user/ (I get a 403 Forbidden). Any idea what's going on? Regards, Adrian

RE: Problem with key aggregation when number of reduce tasks is more than 1

2008-04-11 Thread Zhang, jian
Hi, Please read this; you need to implement a partitioner. It controls which key is sent to which reducer. If you want unique keys in the result, you need to implement a partitioner, and the compareTo function should work properly. [WIKI] Partitioner Partitioner partitions the key space. Partitioner
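
A minimal sketch of a custom partitioner against the old org.apache.hadoop.mapred API (on releases of this era the interface may be non-generic; adjust to your release).

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Sketch: route each record by a stable hash of the key alone, so
    // identical keys always reach the same reducer. The masking keeps
    // the modulo result non-negative.
    public class TextHashPartitioner implements Partitioner<Text, Writable> {
        public void configure(JobConf job) {
            // No per-job setup needed for this sketch.
        }

        public int getPartition(Text key, Writable value, int numPartitions) {
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

Register it with conf.setPartitionerClass(TextHashPartitioner.class); the key's compareTo() then governs only the sort order within each reducer.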

Problem with key aggregation when number of reduce tasks is more than 1

2008-04-11 Thread Harish Mallipeddi
Hi all, I wrote a custom key class (implements WritableComparable) and implemented the compareTo() method inside this class. Everything works fine when I run the m/r job with 1 reduce task (via setNumReduceTasks). Keys are sorted correctly in the output files. But when I increase the number of re
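
A common cause, consistent with the replies in this thread: the default hash partitioner routes by hashCode(), so a custom key must override hashCode() (and equals()) consistently with compareTo(), or keys that compare equal scatter across reducers once there is more than one. A minimal sketch follows; the field names are hypothetical, and the generic WritableComparable form is from slightly later releases (use the raw interface on older ones).

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;

    // Sketch: a custom key whose hashCode() agrees with compareTo(), so
    // the default hash partitioner sends equal keys to the same reducer.
    public class PairKey implements WritableComparable<PairKey> {
        private String name = "";
        private int id;

        public void write(DataOutput out) throws IOException {
            out.writeUTF(name);
            out.writeInt(id);
        }

        public void readFields(DataInput in) throws IOException {
            name = in.readUTF();
            id = in.readInt();
        }

        public int compareTo(PairKey other) {
            int cmp = name.compareTo(other.name);
            if (cmp != 0) return cmp;
            return id < other.id ? -1 : (id == other.id ? 0 : 1);
        }

        // hashCode/equals must agree with compareTo, or keys that compare
        // equal can land on different reducers when numReduceTasks > 1.
        public int hashCode() {
            return name.hashCode() * 163 + id;
        }

        public boolean equals(Object o) {
            if (!(o instanceof PairKey)) return false;
            PairKey p = (PairKey) o;
            return name.equals(p.name) && id == p.id;
        }
    }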