Re: Using Hadoop with executables and binary data

2009-08-10 Thread Stefan Podkowinski
Jaliya, did you consider Hadoop Streaming for your case? http://wiki.apache.org/hadoop/HadoopStreaming On Wed, Jul 29, 2009 at 8:35 AM, Jaliya Ekanayake wrote: > Dear Hadoop devs, > > > > Please help me to figure out a way to program the following problem using > Hadoop. > > I have a program whi
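
For reference, a minimal streaming invocation along those lines might look like the following; the paths and the mapper binary name are hypothetical, and binary (non line-oriented) input typically also needs a custom InputFormat passed via -inputformat:

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
        -input /user/jaliya/input \
        -output /user/jaliya/output \
        -mapper ./my_binary \
        -file my_binary \
        -numReduceTasks 0

The -file option ships the executable to the task nodes, and -numReduceTasks 0 makes it a map-only job.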

re: status of hadoop 0.20.1?

2009-08-10 Thread 柳松
I also want to ask this question... -----Original Message----- From: common-user-return-16633-lamfeeling=126@hadoop.apache.org [mailto:common-user-return-16633-lamfeeling=126@hadoop.apache.org] On Behalf Of Kevin Weil Sent: August 10, 2009 11:59 To: common-user@hadoop.apache.org Subject: status of hadoop 0.20.1? What is

Re: changing logging

2009-08-10 Thread John Clarke
Thanks for the reply. I considered that, but I have a lot of threads in my application and it's very handy to have log4j output the thread name with the log message. It seems as though the log4j.properties file in the conf/ directory is not being used, as any changes I make have no effect! 2009/8/8
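
For context, a conversion pattern that includes the thread name (%t) would look something like this in log4j.properties; this is a generic log4j 1.x sketch, and it only takes effect if the file is actually on the application's classpath:

    log4j.rootLogger=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    # %t prints the name of the thread that issued the log call
    log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c: %m%n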

Re: Some tasks fail to report status between the end of the map and the beginning of the merge

2009-08-10 Thread Koji Noguchi
> but I didn't find a config option > that allows ignoring tasks that fail. > If 0.18, http://hadoop.apache.org/common/docs/r0.18.3/api/org/apache/hadoop/mapred/JobConf.html#setMaxMapTaskFailuresPercent(int) (mapred.max.map.failures.percent) http://hadoop.apache.org/common/docs/r0.18.3/api/org/
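
A minimal sketch of that setting against the old (org.apache.hadoop.mapred) API; the job class name is a placeholder:

    import org.apache.hadoop.mapred.JobConf;

    public class FailureTolerantJob {
      public static void main(String[] args) {
        JobConf conf = new JobConf(FailureTolerantJob.class);
        // Tolerate up to 10% of map tasks failing without failing the job
        // (mapred.max.map.failures.percent).
        conf.setMaxMapTaskFailuresPercent(10);
        // The reduce-side counterpart also exists:
        conf.setMaxReduceTaskFailuresPercent(10);
      }
    }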

InputSplits, Serializers in Hadoop 0.20

2009-08-10 Thread Saptarshi Guha
Hello, In my custom InputFormat written using the new Hadoop 0.20 API, I get the following error at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73) at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:899) at

Re: InputSplits, Serializers in Hadoop 0.20

2009-08-10 Thread Saptarshi Guha
Fixed. InputSplits in 0.20 should implement Writable. On Mon, Aug 10, 2009 at 11:49 AM, Saptarshi Guha wrote: > Hello, > In my custom InputFormat written using the new Hadoop 0.20 API, I get > the following error >        at > org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(Seri
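
A minimal sketch of such a split under the new (org.apache.hadoop.mapreduce) API; the field names are hypothetical, but the no-arg constructor and the Writable methods are the parts the SerializationFactory needs:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.InputSplit;

    public class OffsetSplit extends InputSplit implements Writable {
      private long start;
      private long length;

      public OffsetSplit() {}  // no-arg constructor required for deserialization

      public OffsetSplit(long start, long length) {
        this.start = start;
        this.length = length;
      }

      @Override
      public long getLength() throws IOException, InterruptedException {
        return length;
      }

      @Override
      public String[] getLocations() throws IOException, InterruptedException {
        return new String[0];  // no locality hints in this sketch
      }

      public void write(DataOutput out) throws IOException {
        out.writeLong(start);
        out.writeLong(length);
      }

      public void readFields(DataInput in) throws IOException {
        start = in.readLong();
        length = in.readLong();
      }
    }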

Re: ~ Replacement for MapReduceBase ~

2009-08-10 Thread Aaron Kimball
Naga, That's right. In the old API, Mapper and Reducer were just interfaces and didn't provide default implementations of their methods. Thus MapReduceBase. Now Mapper and Reducer are classes to extend, so MapReduceBase is no longer needed. - Aaron On Fri, Aug 7, 2009 at 8:26 AM, Naga Vijayapuram wrote:
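
To illustrate, a minimal mapper under the 0.20 API; extending Mapper directly inherits no-op setup()/cleanup() and an identity map() by default, which is what MapReduceBase used to supply:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class LineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        // Emit each line keyed by its text, valued by its byte offset
        context.write(line, offset);
      }
    }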

newbie question

2009-08-10 Thread hadooprcoks
Hi, I was looking at the FileSystem API and have a couple of quick questions for experts. In the FileSystem.create() call, two of the parameters are bufferSize and blockSize. I understand they correspond to the io.file.buffer.size and dfs.block.size properties in the config files. My question is - do we

Re: newbie question

2009-08-10 Thread Aaron Kimball
You can set it on a per-file basis if you'd like the control. The data structures associated with files allow these to be individually controlled. But there's also a create() call that only accepts the Path to open as an argument. This uses the configuration file defaults. This use case is conside
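
A sketch of both forms, assuming a vanilla Configuration; the paths and values are arbitrary:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CreateExample {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Per-file control: overwrite flag, buffer size, replication, block size
        FSDataOutputStream perFile = fs.create(new Path("/tmp/example"),
            true,                  // overwrite if it exists
            64 * 1024,             // io buffer, in bytes
            (short) 2,             // replication factor for this file only
            128L * 1024 * 1024);   // block size for this file only: 128 MB
        perFile.close();

        // Path-only form: falls back to io.file.buffer.size and dfs.block.size
        FSDataOutputStream withDefaults = fs.create(new Path("/tmp/example2"));
        withDefaults.close();
      }
    }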

Re: XML files in HDFS

2009-08-10 Thread Joerg Rieger
Hello, while flipping through the Cloud9 collections, I came across an XML InputFormat class: http://www.umiacs.umd.edu/~jimmylin/cloud9/docs/api/edu/umd/cloud9/collection/XMLInputFormat.html I haven't used it myself, but it might be worth a try. Joerg On 30.07.2009, at 14:16, Hyunsik Ch
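
A hedged usage sketch with the old API; the start/end tag config keys below are assumptions based on similar XML input formats, so verify them against the javadoc linked above:

    import org.apache.hadoop.mapred.JobConf;
    import edu.umd.cloud9.collection.XMLInputFormat;

    public class XmlJob {
      public static void main(String[] args) {
        JobConf conf = new JobConf(XmlJob.class);
        conf.setInputFormat(XMLInputFormat.class);
        // Assumed key names; each record is the byte range between these tags
        conf.set("xmlinput.start", "<page>");
        conf.set("xmlinput.end", "</page>");
      }
    }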

RE: OutputCommitter for rollbacks?

2009-08-10 Thread Deepika Khera
Thanks Amareshwari for your response. It seems like a good idea to use the map progress & reduce progress. My only concern is that in the web interface (jobdetails.jsp), we see some of our jobs show 100% map & 100% reduce while the reduce still seems to be running (Not sure, but maybe it's just a U
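
For what it's worth, a small sketch of polling the real progress counters through the old API rather than trusting the UI; it assumes conf is otherwise fully configured, and the five-second interval is arbitrary:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class ProgressWatcher {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ProgressWatcher.class);
        RunningJob job = new JobClient(conf).submitJob(conf);
        while (!job.isComplete()) {
          // mapProgress()/reduceProgress() return a fraction in [0, 1]
          System.out.printf("map %.0f%% reduce %.0f%%%n",
              job.mapProgress() * 100, job.reduceProgress() * 100);
          Thread.sleep(5000);
        }
        System.out.println(job.isSuccessful() ? "succeeded" : "failed");
      }
    }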

corrupt filesystem

2009-08-10 Thread Mayuran Yogarajah
Hello all, What can cause HDFS to become corrupt? I was running some jobs which were failing. When I checked the logs I saw that some files were corrupt, so I ran 'hadoop fsck /' which showed that a few files were corrupt: /user/data/2009-07-01/165_2009-07-01.log: CORRUPT block blk_169750933292

Re: corrupt filesystem

2009-08-10 Thread Raghu Angadi
> I had assumed that if a replica became corrupt it would be replaced > by a non-corrupt copy. > Is this not the case? Yes, it is. Usually some random block gets corrupted for various reasons and is replaced by another replica of the block. A block might stay in a corrupt state if
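
As a starting point for that kind of tracing, fsck can also report which datanodes hold each block of a suspect file; for example:

    hadoop fsck /user/data/2009-07-01 -files -blocks -locations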

Re: corrupt filesystem

2009-08-10 Thread Mayuran Yogarajah
Hello, If you are interested, you could try to trace one of these block ids in the NameNode log to see what happened to it. We are always eager to hear about irrecoverable errors. Please mention the Hadoop version you are using. I'm using Hadoop 0.18.3. I just checked the namenode log for one of the bad

Re: OutputCommitter for rollbacks?

2009-08-10 Thread Amareshwari Sriramadasu
Deepika Khera wrote: Thanks Amareshwari for your response. It seems like a good idea to use the map progress & reduce progress. My only concern is that in the web interface (jobdetails.jsp), we see some of our jobs show 100% map & 100% reduce while the reduce still seems to be running (Not sure

Question on file HDFS file system

2009-08-10 Thread ashish pareek
Hi Everybody, I would like to understand the HDFS source code. The basic question I want to ask is: what type of data structure does HDFS use to store INode information? I am interested in knowing which part of the HDFS code is responsible for the meta-data info of files. Please can some HDFS-Gu