RE: one input file per map

2008-07-03 Thread Goel, Ankur
Nope. But if that is the intent, there are 2 ways of doing it: 1. Just extend the input format of your choice and override the isSplitable() method to return false. 2. Compress your text file using a compression format supported by Hadoop (e.g. gzip). This will ensure that one map task processes 1
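The first option can be sketched against the old org.apache.hadoop.mapred API that was current in Hadoop 0.17 (the class name below is illustrative, not from the thread):

```java
// Sketch only: give each input file exactly one split, hence one map task.
// Uses the pre-0.20 org.apache.hadoop.mapred API this thread is about;
// WholeFileTextInputFormat is an illustrative name, not a real Hadoop class.
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false;  // never split: one file -> one map task
    }
}
```

Set this as the job's input format and the framework will schedule one mapper per input file regardless of file size.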

Re: scaling issue, please help

2008-07-03 Thread Amar Kamat
Mori Bellamy wrote: I discovered that some of my code was causing out-of-bounds exceptions. I cleaned up that code and the map tasks seemed to work. That confuses me -- I'm pretty sure Hadoop is resilient to a few map tasks failing (5 out of 13k). Before this fix, my remaining 2% of tasks

Re: failed map tasks

2008-07-03 Thread Amar Kamat
jerrro wrote: Hello, I was wondering - could someone tell me what are the reasons that I could get failure with certain map tasks on a node? Well, that depends on the kind of errors you are seeing. Could you please post the logs/error messages? Amar Any idea that comes to mind would work (it

RE: topology.script.file.name

2008-07-03 Thread Devaraj Das
This is strange. If you don't mind, pls send the script to me. -Original Message- From: Yunhong Gu1 [mailto:[EMAIL PROTECTED] Sent: Thursday, July 03, 2008 9:49 AM To: core-user@hadoop.apache.org Subject: topology.script.file.name Hello, I have been trying to figure out
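For context, the script under discussion is wired up through a property in hadoop-site.xml. A minimal sketch (the path is an illustrative assumption, not from the thread):

```xml
<!-- hadoop-site.xml: point the rack-awareness hook at an executable
     that maps host names/IPs to rack paths. Path is illustrative. -->
<property>
  <name>topology.script.file.name</name>
  <value>/home/hadoop/topology.sh</value>
</property>
```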

Re: Combiner is optional though it is specified?

2008-07-03 Thread novice user
To my surprise, only one output value of the mapper is not reaching the combiner, and it is consistent when I repeat the experiment. The same value reaches the reducer directly without going through the combiner. I am surprised -- how can this happen? novice user wrote: Regarding the conclusion, I am

Difference between joining and reducing

2008-07-03 Thread Stuart Sierra
Hello all, After recent talk about joins, I have a (possibly) stupid question: What is the difference between the join operations in o.a.h.mapred.join and the standard merge step in a MapReduce job? I understand that doing a join in the Mapper would be much more efficient if you're lucky enough

Re: failed map tasks

2008-07-03 Thread jerrro
I am actually more interested in _theoretically_ what could cause a map task to fail or to take longer... I don't have a specific case. Thanks, Jerr Amar Kamat wrote: jerrro wrote: Hello, I was wondering - could someone tell me what are the reasons that I could get failure with

Re: Help! How to overcome a RemoteException:

2008-07-03 Thread boris starchev
I have installed Cygwin and hadoop-0.17.0 and have done 3 steps:
1) add JAVA_HOME in hadoop-env.sh
2) create hadoop-site.xml
3) execute commands:
cd /cygdrive/c/hadoop-0.17.0
bin/start-all.sh
bin/hadoop dfs -rmr input
bin/hadoop dfs -put conf input  (NOT WORKING)
bin/hadoop dfs -ls
bin/stop-all.sh

RE: Difference between joining and reducing

2008-07-03 Thread Ashish Thusoo
Hi Stuart, Join is a higher-level logical operation, while map/reduce is a technique that could be used to implement it. Specifically, in relational algebra, the join construct specifies how to form a single output row from 2 rows arising from two input streams. There are very many ways of
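Ashish's point, that a join forms one output row from two rows sharing a key and map/reduce is merely one way to implement it, can be illustrated with a toy, non-Hadoop sketch of an inner join between two keyed streams:

```java
import java.util.*;

// Toy illustration (not Hadoop code): a join forms one output row from two
// rows that share a key. A reducer, which sees all values for a key together,
// is one natural place to do exactly this.
public class ToyReduceJoin {
    // Inner-join two keyed streams on their common key.
    static List<String> join(Map<String, String> left, Map<String, String> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : left.entrySet()) {
            String match = right.get(e.getKey());   // same key in the other stream?
            if (match != null) out.add(e.getKey() + "," + e.getValue() + "," + match);
        }
        Collections.sort(out);                      // deterministic output order
        return out;
    }
    public static void main(String[] args) {
        Map<String, String> users = Map.of("u1", "alice", "u2", "bob");
        Map<String, String> orders = Map.of("u1", "book");
        System.out.println(join(users, orders));    // only u1 appears: inner join
    }
}
```

In a real reduce-side join each record would additionally be tagged with its source stream, since a reducer receives both streams' values mixed together under one key.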

Re: Inconsistency in namenode's and datanode's namespaceID

2008-07-03 Thread Konstantin Shvachko
Yes this is a known bug. http://issues.apache.org/jira/browse/HADOOP-1212 You should manually remove current directory from every data-node after reformatting the name-node and start the cluster again. I do not believe there is any other way. Thanks, --Konstantin Taeho Kang wrote: No, I don't
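The manual fix can be sketched as a shell session; the data directory below is an illustrative stand-in for whatever dfs.data.dir points to on each datanode:

```shell
# Sketch of the HADOOP-1212 workaround after reformatting the namenode.
# DATA_DIR is an illustrative stand-in for your real dfs.data.dir value;
# run the removal on every datanode, then restart the cluster.
DATA_DIR=/tmp/hadoop-data
mkdir -p "$DATA_DIR/current"        # (simulating an existing datanode dir)
rm -rf "$DATA_DIR/current"          # drop the directory holding the stale namespaceID
```

On restart the datanode recreates `current` with the namenode's new namespaceID.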

Re: XEN guest OS

2008-07-03 Thread Andreas Kostyrka
On Tuesday 01 July 2008 09:36:18 Ashok Varma wrote: Hi, I'm trying to install Fedora 8 as a guest OS in Xen on CentOS 5.2 64-bit. I always get a "failed to mount directory" error. I configured an NFS share, but the installation still failed in the middle. Slightly off-topic on a Hadoop mailing

Re: MapSide Join and left outer or right outer joins?

2008-07-03 Thread Chris Douglas
Forgive me if you already know this, but the correctness of the map-side join is very sensitive to partitioning; if your input is sorted but equal keys go to different partitions, your results may be incorrect. Is your input such that the default partitioning is sufficient? Have you

Re: Difference between joining and reducing

2008-07-03 Thread Chris Douglas
Ashish ably outlined the differences between a join and a merge, but might be confusing the o.a.h.mapred.join package and the contrib/ data_join framework. The former is used for map-side joins and has nothing to do with either the shuffle or the reduce; the latter effects joins in the
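A hedged sketch of how the o.a.h.mapred.join package Chris mentions is typically wired up in the old API (the input paths and the inner-join choice are illustrative assumptions, not from the thread):

```java
// Sketch only: configuring a map-side join with o.a.h.mapred.join.
// Both inputs must already be sorted and identically partitioned on the
// join key; the paths below are illustrative.
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

JobConf conf = new JobConf();
conf.setInputFormat(CompositeInputFormat.class);
conf.set("mapred.join.expr", CompositeInputFormat.compose(
    "inner", KeyValueTextInputFormat.class, "/data/a", "/data/b"));
```

The join then happens entirely in the map phase, with no shuffle or reduce involved, which is exactly the distinction drawn above.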

Re: MapSide Join and left outer or right outer joins?

2008-07-03 Thread Jason Venner
We are using the default partitioner. I am just about to start verifying my results, as it took quite a while to work my way through the non-obvious issues of hand-writing MapFiles, things like the key and value classes being extracted from the jobconf, output key/value. Question: I looked at the

Re: Getting stats of running job from within job

2008-07-03 Thread Doug Cutting
Nathan Marz wrote: Is there a way to get stats of the currently running job programmatically? This should probably be an FAQ. In your Mapper or Reducer's configure implementation, you can get a handle on the running job with: RunningJob running = new
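The snippet is cut off in the archive; a hedged reconstruction of the idea, assuming the 0.17-era JobClient API (do not treat this as Doug's exact code):

```java
// Hedged sketch: inside Mapper/Reducer.configure(JobConf), look up the
// running job by its id to read progress and counters. Old 0.17-era API.
import java.io.IOException;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public void configure(JobConf job) {
    try {
        RunningJob running = new JobClient(job).getJob(job.get("mapred.job.id"));
        // running.mapProgress(), running.reduceProgress(), etc. are now usable
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
```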

RE: topology.script.file.name

2008-07-03 Thread Yunhong Gu1
This is my script, which is actually a C++ program: #include &lt;iostream&gt; #include &lt;string&gt; using namespace std; int main(int argc, char** argv) { for (int i = 1; i &lt; argc; i++) { string dn = argv[i]; if (dn.substr(0, 5) == "rack1") cout &lt;&lt; "/rack1"; else if

Help: how to check the active datanodes?

2008-07-03 Thread Richard Zhang
Hi guys: I am running Hadoop on an 8-node cluster. I use start-all.sh to boot Hadoop and it shows that all 8 data nodes are started. However, when I use bin/hadoop dfsadmin -report to check the status of the data nodes, it shows only one data node (the one on the same host as the name node) is

ERROR dfs.NameNode - java.io.EOFException

2008-07-03 Thread Otis Gospodnetic
Hi, Using Hadoop 0.16.2, I am seeing the following in the NN log: 2008-07-03 19:46:26,715 ERROR dfs.NameNode - java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:180) at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106) at

Hudson Patch Verifier's Output

2008-07-03 Thread Abdul Qadeer
Hi, I submitted a patch using JIRA and the Hudson system reported "-1 contrib tests. The patch failed contrib unit tests." Looking at the console output, I noticed that it says the build was successful for the contrib tests, so I am confused about which failed contrib tests are referred to in the Hudson output. This

Help regarding LoginException - CreateProcess: whoami error=2

2008-07-03 Thread Rutuja Joshi
Hello, I am new to Hadoop and am trying to run the HadoopDfsReadWriteExample (http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample) from Eclipse on Windows XP. I have added the following files in the build path for the project:

Re: Hudson Patch Verifier's Output

2008-07-03 Thread Nigel Daley
A bug was introduced by HADOOP-3480. HADOOP-3653 will fix it. Nige On Jul 3, 2008, at 5:24 PM, Abdul Qadeer wrote: Hi, I submitted a patch using JIRA and the Hudson system told that -1 contrib tests. The patch failed contrib unit tests. Seeing the console output, I noticed that it says

Re: Volunteer recruitment for RDF store project on Hadoop

2008-07-03 Thread Edward J. Yoon
Thanks for all the interest. BTW, I can't handle too many people via private email, so please join this group: http://groups.google.com/group/hrdfstore Thanks, Edward On Wed, Jul 2, 2008 at 3:06 PM, Edward J. Yoon [EMAIL PROTECTED] wrote: Hello all, The HRdfStore team is looking for a couple more

Re: Help: how to check the active datanodes?

2008-07-03 Thread Mafish Liu
Hi, zhang: Once you start Hadoop with start-all.sh, a Hadoop status page can be accessed at http://namenode-ip:port/dfshealth. The port is specified by the dfs.http.address property in your hadoop-default.xml. If the datanodes' status is not as expected, you need to check the log files.

nested for loops

2008-07-03 Thread Alan Horowitz
I'm a newbie, so feel free to rtfm me if this is old hat: what's the best way to do a nested for loop in Hadoop? Specifically, let's say I've got a list of elements and I want to do an all-against-all comparison. The standard nested for loop would be: for i in 1..10: for j in i..10:
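As a plain-Java sketch of the pair space that loop walks (in MapReduce, one common approach is to key each emitted element by the pair it belongs to, so both members of a pair meet at one reducer):

```java
import java.util.*;

// Plain-Java sketch, not Hadoop code: enumerate the (i, j) pairs the nested
// loop visits. In MapReduce the usual trick is to emit each element once per
// pair key so that a reducer receives both members of a pair together.
public class AllPairs {
    static List<int[]> pairs(int n) {
        List<int[]> out = new ArrayList<>();
        for (int i = 1; i <= n; i++)
            for (int j = i; j <= n; j++)   // j starts at i, as in the post
                out.add(new int[]{i, j});
        return out;
    }
    public static void main(String[] args) {
        // n = 3 yields 6 pairs: (1,1)(1,2)(1,3)(2,2)(2,3)(3,3)
        System.out.println(pairs(3).size());
    }
}
```

For n elements this is n(n+1)/2 pairs, so for large inputs the replication cost of shipping each element to many pair keys dominates the job.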