Re: How to debug a MapReduce application

2009-01-19 Thread Pedro Vivancos
I am terribly sorry. I made a mistake. This is the output I get:
09/01/19 07:59:45 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
09/01/19 07:59:45 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool
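That WARN line is advisory: it goes away when the driver implements Tool and is launched through ToolRunner, which applies GenericOptionsParser to the arguments before run() is called. A minimal sketch against the 0.19-era API (the class name, job name, and the elided mapper/reducer setup are illustrative, not from this thread):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJobDriver extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), MyJobDriver.class);
        conf.setJobName("my-job");
        // ... set mapper/reducer classes and input/output paths here ...
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic options (-fs, -jt, -D ...) before calling run()
        System.exit(ToolRunner.run(new Configuration(), new MyJobDriver(), args));
    }
}
```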

Re: Hadoop 0.17.1: EOFException reading FSEdits file. What causes this? How to prevent it?

2009-01-19 Thread Rasit OZDAS
I would prefer catching the EOFException in my own code, assuming you are happy with the output before the exception occurs. Hope this helps, Rasit 2009/1/16 Konstantin Shvachko s...@yahoo-inc.com Joe, It looks like your edits file is corrupted or truncated. Most probably the last modification
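Catching the exception in your own code can look like the following plain-java sketch (no Hadoop types; the fixed-width int records and the class name are made up for illustration). DataInputStream throws EOFException when a record is cut short, and catching it preserves everything read up to the truncation point:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class EofTolerantReader {
    // Read as many complete int records as possible from a possibly
    // truncated stream; EOFException just marks the end of usable data.
    public static List<Integer> readInts(InputStream in) throws IOException {
        List<Integer> values = new ArrayList<>();
        try (DataInputStream din = new DataInputStream(in)) {
            while (true) {
                values.add(din.readInt());
            }
        } catch (EOFException eof) {
            // Truncated or finished: keep what was successfully read.
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // 9 bytes = two complete ints plus one dangling byte (simulated truncation)
        byte[] data = {0, 0, 0, 1, 0, 0, 0, 2, 9};
        System.out.println(readInts(new ByteArrayInputStream(data)));  // prints [1, 2]
    }
}
```

Run against a 9-byte stream holding two whole ints plus one dangling byte, this prints [1, 2]: the truncated tail is dropped and the good prefix survives.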

Re: Calling a mapreduce job from inside another

2009-01-19 Thread Sagar Naik
You can also play with the priority of the jobs to have the innermost job finish first -Sagar Devaraj Das wrote: You can chain job submissions at the client. Also, you can run more than one job in parallel (if you have enough task slots). An example of chaining jobs is there in
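Sketched with the old mapred API, chaining at the client is just sequential submission, since JobClient.runJob blocks until the job completes; the driver class, job names, and the priority call below are illustrative, not taken from the thread:

```java
JobConf first = new JobConf(MyDriver.class);
first.setJobName("step-1");
// ... set mapper/reducer and input/output paths for step 1 ...
JobClient.runJob(first);                   // blocks until step-1 finishes

JobConf second = new JobConf(MyDriver.class);
second.setJobName("step-2");
second.setJobPriority(JobPriority.HIGH);   // Sagar's point: raise the inner job's priority
// step-2 typically reads step-1's output directory as its input
JobClient.runJob(second);
```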

Hadoop Error Message

2009-01-19 Thread Deepak Diwakar
Hi friends, could somebody tell me what the following quoted message means? 3154.42user 76.09system 44:47.21elapsed 120%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (15major+6092226minor)pagefaults 0swaps The first part tells about system usage, but what is the rest? Is it because of

Re: Hadoop Error Message

2009-01-19 Thread Miles Osborne
that is a timing / space report Miles 2009/1/19 Deepak Diwakar ddeepa...@gmail.com: Hi friends, could somebody tell me what the following quoted message means? 3154.42user 76.09system 44:47.21elapsed 120%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs

Windows Support

2009-01-19 Thread Dan Diephouse
I recognize that Windows support is, um, limited :-) But any ideas what exactly would need to be changed to support Windows (without Cygwin) if someone such as myself were so motivated? The most immediate thing I ran into was UserGroupInformation, which would need a Windows implementation. I

Re: Hadoop Error Message

2009-01-19 Thread Deepak Diwakar
Thanks, friend. 2009/1/19 Miles Osborne mi...@inf.ed.ac.uk that is a timing / space report Miles 2009/1/19 Deepak Diwakar ddeepa...@gmail.com: Hi friends, could somebody tell me what the following quoted message means? 3154.42user 76.09system 44:47.21elapsed 120%CPU

Re: Windows Support

2009-01-19 Thread Dan Diephouse
On Mon, Jan 19, 2009 at 11:35 AM, Steve Loughran ste...@apache.org wrote: Dan Diephouse wrote: I recognize that Windows support is, um, limited :-) But, any ideas what exactly would need to be changed to support Windows (without cygwin) if someone such as myself were so motivated? The most

Re: Windows Support

2009-01-19 Thread Chris K Wensel
Hey Dan There is discussion/issue on this here: https://issues.apache.org/jira/browse/HADOOP-4998 ckw On Jan 19, 2009, at 8:55 AM, Dan Diephouse wrote: On Mon, Jan 19, 2009 at 11:35 AM, Steve Loughran ste...@apache.org wrote: Dan Diephouse wrote: I recognize that Windows support is, um,

Java RMI and Hadoop RecordIO

2009-01-19 Thread David Alves
Hi, I've been testing some different serialization techniques to use in a research project. I know the motivation behind Hadoop's serialization mechanism (e.g. Writable), and that the enhancement of this feature through Record I/O is not only performance but also control of the input/output. Still
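For readers following along, a hand-written Writable, the kind of class Record I/O can generate for you, looks roughly like this (the PointWritable name and its fields are invented for illustration):

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class PointWritable implements Writable {
    private int x;
    private int y;

    // Fields are written and read in the same fixed order,
    // with no per-object metadata, unlike default Java serialization.
    public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    public void readFields(DataInput in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }
}
```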

Re: Performance testing

2009-01-19 Thread Sandeep Dhawan
Hi, I am in the process of following your guidelines. I would like to know: 1. How can block size impact the performance of a mapred job? 2. Does the performance improve if I set up the NameNode and JobTracker on different machines? At present, I am running the NameNode and JobTracker on the same

Hadoop Exceptions

2009-01-19 Thread Sandeep Dhawan
Here are a few Hadoop exceptions that I am getting while running a mapred job on 700MB of data on a 3-node cluster on Windows (using Cygwin): 1. 2009-01-08 17:54:10,597 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_-4309088198093040326_1001 received exception java.io.IOException:

Re: Upgrading and patching

2009-01-19 Thread Philip
Thanks Brian, I have just one more question: when building my own release, where do I enter the version and compiled-by information? Thanks, Phil On Fri, Jan 16, 2009 at 6:23 PM, Brian Bockelman bbock...@cse.unl.edu wrote: Hey Philip, I've found it easier to download the release, apply

Re: Maven repo for Hadoop

2009-01-19 Thread Owen O'Malley
On Jan 17, 2009, at 5:53 PM, Chanwit Kaewkasi wrote: I would like to integrate Hadoop into my project using Ivy. Is there any Maven repository containing Hadoop jars that I can point my configuration to? Not yet, but soon. We recently introduced Ivy into Hadoop, so I believe we'll upload the

Re: Performance testing

2009-01-19 Thread Jothi Padmanabhan
Hi, see answers inline below. HTH, Jothi I would like to know: 1. How can block size impact the performance of a mapred job? From the M/R side, the filesystem block size of the input files is treated as an upper bound for input splits. Since each input split translates into one map, this
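As a concrete (hypothetical) knob: with the pre-0.21 property names, a job can ask for larger blocks or larger splits, either way ending up with fewer map tasks; the driver class and the 128 MB value are only examples:

```java
JobConf conf = new JobConf(MyDriver.class);
// Larger blocks -> fewer, larger splits -> fewer map tasks (for new files)
conf.setLong("dfs.block.size", 128L * 1024 * 1024);
// Alternatively, raise the minimum split size so each map gets more data
conf.setLong("mapred.min.split.size", 128L * 1024 * 1024);
```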

Distributed Key-Value Databases

2009-01-19 Thread Philip (flip) Kromer
Hey y'all, There've been a few questions about distributed database solutions (a partial list: HBase, Voldemort, Memcached, ThruDB, CouchDB, Ringo, Scalaris, Kai, Dynomite, Cassandra, Hypertable, as well as the closed Dynamo, BigTable, SimpleDB). For someone using Hadoop at scale, what problem

Hadoop balancing data

2009-01-19 Thread Billy Pearson
Why do we not use the remaining % in place of the used % when selecting a datanode for new data and when running the balancer? From what I can tell, we are using the used % and we do not factor in non-DFS used space at all. I see a datanode with only a 60GB hard drive fill up completely 100%
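Billy's scenario can be made concrete with some made-up numbers: rank two heterogeneous datanodes by DFS-used percentage and the small, nearly full disk looks like the better target, even though its absolute remaining space (capacity minus DFS-used minus non-DFS-used) is tiny:

```java
public class NodeUtil {
    // Hedged illustration of the post's point; all figures are invented.
    public static double usedPercent(long dfsUsed, long capacity) {
        return 100.0 * dfsUsed / capacity;
    }

    public static long remaining(long capacity, long dfsUsed, long nonDfsUsed) {
        return capacity - dfsUsed - nonDfsUsed;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // 60 GB disk: 10 GB of DFS data, but 45 GB of non-DFS usage
        long smallCap = 60 * gb, smallDfs = 10 * gb, smallNonDfs = 45 * gb;
        // 500 GB disk: 100 GB of DFS data, no non-DFS usage
        long bigCap = 500 * gb, bigDfs = 100 * gb, bigNonDfs = 0;

        // By DFS-used %, the small node looks emptier (16.7% vs 20%)...
        System.out.println(usedPercent(smallDfs, smallCap) < usedPercent(bigDfs, bigCap));   // prints true
        // ...but it actually has far less room left (5 GB vs 400 GB).
        System.out.println(remaining(smallCap, smallDfs, smallNonDfs)
                < remaining(bigCap, bigDfs, bigNonDfs));                                     // prints true
    }
}
```

By used %, the 60 GB node wins (16.7% vs 20%); by remaining space it loses badly (5 GB vs 400 GB), which is exactly the mismatch the post describes.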