File is closed but data is not visible

2009-08-11 Thread Pallavi Palleti
Hi all, We have an application where we pull logs from an external server (far apart from the hadoop cluster) to the hadoop cluster. Sometimes we see a huge delay (of 1 hour or more) before the data actually appears in HDFS, even though the file has been closed and the variable is set to null from the externa

what is the hadoop version used in both pig2.0 and nutch1.0

2009-08-11 Thread venkata ramanaiah anneboina
Hi, I am using pig 2.0 and nutch 1.0, but they don't have a common hadoop version. What is the common version? Which version of pig and which version of nutch do I need to use (so that they bring in a common hadoop version)? Please can anyone help on this; thanks ramana

Re: changing logging

2009-08-11 Thread Steve Loughran
John Clarke wrote: Thanks for the reply. I considered that, but I have a lot of threads in my application and it's very handy to have log4j output the thread name with the log message. It's like the log4j.properties file in the conf/ directory is not being used, as any changes I make seem to have no

Re: HADOOP-4539 question

2009-08-11 Thread Steve Loughran
Stas Oskin wrote: Hi. What is the recommended utility for this? Thanks. For those of us whose hosts are virtual and who have control over the infrastructure it's fairly simple: bring up a new VM on a different blade with the same base image and hostname. If you have a non-virtual cluster

Re: corrupt filesystem

2009-08-11 Thread Harish Mallipeddi
On Tue, Aug 11, 2009 at 4:45 AM, Mayuran Yogarajah < mayuran.yogara...@casalemedia.com> wrote: > Hello, > > If you are interested, you could try to trace one of these block ids in >> NameNode log to see what happened it. We are always eager to hear about >> irrecoverable errors. Please mention ha

Re: File is closed but data is not visible

2009-08-11 Thread Jason Venner
Please provide information on what version of hadoop you are using and the method of opening and closing the file. On Tue, Aug 11, 2009 at 12:48 AM, Pallavi Palleti < pallavi.pall...@corp.aol.com> wrote: > Hi all, > > We have an application where we pull logs from an external server(far apart >

problem setting mapred.child.java.opts

2009-08-11 Thread Yair Even-Zohar
I'm running a mapreduce job using an HBase table as input with a distributed cache file and all works well. However, when I set: c.set("mapred.child.java.opts", "-Xmx512m") in the java code and use the exact same input and exact same distributed cache, I'm getting the following: on the maste
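
For context, a minimal sketch of setting this property in a job driver on the classic JobConf API (the class name and argument handling are placeholders, not taken from the thread):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class ChildOptsExample {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(ChildOptsExample.class);
            conf.setJobName("child-opts-example");
            // JVM options passed to every map/reduce child process.
            conf.set("mapred.child.java.opts", "-Xmx512m");
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);  // identity mapper/reducer by default
        }
    }

Note that a stray hadoop-site.xml on the classpath can override or conflict with values set this way, which is what the follow-up below turned out to be.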

RE: problem setting mapred.child.java.opts

2009-08-11 Thread Yair Even-Zohar
Sorry to bug you guys again, but I found the problem: an old hadoop-site.xml that was in the classpath had "mapred.child.ulimit" limited to 50. Thanks -Yair -Original Message- From: Yair Even-Zohar [mailto:ya...@audiencescience.com] Sent: Tuesday, August 11, 2009 4:11 PM To: comm

Re: corrupt filesystem

2009-08-11 Thread Raghu Angadi
Note that there are multiple log files (one for each day). Make sure you searched all the relevant days. You can also check datanode log for this block. HDFS writes to all three datanodes at the time you write the data. It is possible that other two datanodes also encountered errors. This

RE: File is closed but data is not visible

2009-08-11 Thread Palleti, Pallavi
Hi Jason, Apologies for missing version information in my previous mail. I am using hadoop-0.18.3. I am getting FSDataOutputStream object using fs.create(new Path(some_file_name)), where fs is FileSystem object. And, I am closing the file using close(). Thanks Pallavi -Original Message-
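
For readers following the thread, a minimal sketch of the create/write/close sequence being described, against the old FileSystem API (the file name and payload are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CloseVisibilityExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FSDataOutputStream out = fs.create(new Path("some_file_name"));
            try {
                out.write("pulled log line\n".getBytes());
            } finally {
                // Once close() returns without throwing, the file and its data
                // are expected to be visible to other readers of the cluster.
                out.close();
            }
        }
    }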

Re: HADOOP-4539 question

2009-08-11 Thread Todd Lipcon
Hey Stas, You can also use a utility like Linux-HA (aka heartbeat) to handle IP address failover. It will even send gratuitous ARPs to make sure to get the new mac address registered after a failover. Check out this blog for info about a setup like this: http://www.cloudera.com/blog/2009/07/22/ha

FTP into HDFS

2009-08-11 Thread Turner Kunkel
Does anyone have any experience with using FTP with HDFS? I have all the config files setup correctly and have started the service. But, when I connect from a remote (Windows) machine: "Connection closed by remote host." And on the local (Ubuntu) machine: "412 Service not available... Permission d

Re: Question on file HDFS file system

2009-08-11 Thread Jakob Homan
Hey Ashish- In terms of the overall design architecture of HDFS, I would point you to the project documentation: http://hadoop.apache.org/common/docs/current/hdfs_design.html For specific data structures, your first stop should be the INode class and its extending classes, located in src/ja

NN + secondary got full, even though data nodes had plenty of space

2009-08-11 Thread Mayuran Yogarajah
I have a 6 node cluster running Hadoop 0.18.3. I'm trying to figure out how the data was spread out like this: node001 94.15% node002 94.16% node003 48.22% node004 47.85% node005 48.12% node006 43.18% Node 001 (NN) and node 002( secondary NN) bo

Class JobStatus

2009-08-11 Thread Mithila Nagendra
Hello Everyone I was trying to figure out how the JobStatus class can be used in Hadoop. Can someone guide me to an example? I want to put the method setRunState() to use. Thanks! Mithila

Re: File is closed but data is not visible

2009-08-11 Thread Raghu Angadi
Your assumption is correct. When you close the file, others can read the data. There is no delay expected before the data is visible. If there is an error, either write() or close() would throw an exception. When you say data is not visible, do you mean readers can not see the file or can not see

Need info about "mapred.input.format.skew"

2009-08-11 Thread CubicDesign
Hi. Can anybody point me to the Apache documentation page for "mapred.input.format.skew" ? I cannot find the documentation for this parameter. What does it mean? Thanks

Creating a job

2009-08-11 Thread Mithila Nagendra
Hello All How do I create a Job in Hadoop using the Job class? And how do I run it? Generally JobClient.runJob(conf) is used, but the parameter is not of the type Job. Also, how do I use the JobControl class? Can I create Threads in Hadoop (similar to multithreading in Java), where different Threads

Re: XML files in HDFS

2009-08-11 Thread Aaron Kimball
Wasim, RecordReader implementations should never require that elements not be spread across multiple blocks. The start and end offsets into a file in an InputSplit are taken as soft limits, not hard ones. The RecordReader implementations that come with Hadoop perform this way, and any that you aut

Re: Creating a job

2009-08-11 Thread Jakob Homan
Hey Mithila- I would point you to the WordCount example (http://hadoop.apache.org/common/docs/current/mapred_tutorial.html) for a basic example of how jobs are created by supplying a JobConf to the JobClient. This will submit your conf to the cluster which will create and run the job. Th

Extra 4 bytes at beginning of serialized file

2009-08-11 Thread Kris Jirapinyo
Hi all, I was wondering if anyone's encountered 4 extra bytes at the beginning of the serialized object file using MultipleOutputFormat. Basically, I am using BytesWritable to write the serialized byte arrays in the reducer phase. My writer is a generic one: public class GenericOutputFormat e

Re: Extra 4 bytes at beginning of serialized file

2009-08-11 Thread Todd Lipcon
BytesWritable serializes itself by first outputting the array length, and then outputting the array itself. The 4 bytes at the top of the file are the length of the value itself. Hope that helps -Todd On Tue, Aug 11, 2009 at 6:33 PM, Kris Jirapinyo wrote: > Hi all, > I was wondering if anyone'
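
In other words, assuming the file really is just the 4-byte length followed by the raw bytes as Todd describes, a consumer can read and discard the leading length word before the payload; a rough sketch with plain java.io (the method and file name are placeholders):

    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    public class SkipLengthPrefix {
        public static byte[] readPayload(String file) throws IOException {
            DataInputStream in = new DataInputStream(new FileInputStream(file));
            try {
                int len = in.readInt();   // the 4-byte length BytesWritable wrote first
                byte[] payload = new byte[len];
                in.readFully(payload);    // the serialized object itself
                return payload;
            } finally {
                in.close();
            }
        }
    }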

Re: Extra 4 bytes at beginning of serialized file

2009-08-11 Thread Kris Jirapinyo
Ah that explains it, thanks Todd. Is there a way to serialize an object without using BytesWritable, or some way I can have a "perfect" serialized file so I won't have to keep discarding the first 4 bytes of the files? -- Kris. On Tue, Aug 11, 2009 at 7:03 PM, Todd Lipcon wrote: > BytesWritabl

Re: Extra 4 bytes at beginning of serialized file

2009-08-11 Thread Todd Lipcon
If you know you'll only have one object in the file, you could write your own Writable implementation which doesn't write its length. The problem is that you'll never be able to *read* it, since writables only get an input stream and thus don't know the file size. If you choose to do this, just mo
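
Such a write-only Writable might look roughly like this (a hypothetical sketch, not an existing Hadoop class):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    public class RawBytesWritable implements Writable {
        private byte[] bytes = new byte[0];

        public void set(byte[] b) {
            this.bytes = b;
        }

        public void write(DataOutput out) throws IOException {
            out.write(bytes);  // payload only, no 4-byte length prefix
        }

        public void readFields(DataInput in) throws IOException {
            // As noted above, there is no way to know how many bytes to read back.
            throw new IOException("RawBytesWritable is write-only");
        }
    }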

Re: Question on file HDFS file system

2009-08-11 Thread ashish pareek
Hi Everybody, I am already tracing through source code and trying to figure out things. Any way thanks for all your suggestions. Regards, Ashish On Tue, Aug 11, 2009 at 11:32 PM, Jakob Homan wrote: > Hey Ashish- > In terms of how overall design architecture of HDFS, I woul

Re: File is closed but data is not visible

2009-08-11 Thread Pallavi Palleti
Hi Raghu, The file doesn't appear in the cluster when I look at it from the Namenode UI. Also, I have a monitor on the cluster side which checks whether the file is created and throws an exception when it is not, and it threw an exception saying "File not found". Thanks Pallavi - Original Message

Re: Creating a job

2009-08-11 Thread Mithila Nagendra
Hello Jakob Yes I have gone through the Job Submission strategy in Hadoop, that is helpful. But I was looking at interdependent jobs, I was trying to switch the state of a running job to waiting. I was looking at Jobcontrol for that reason. I have gone through the document you pointed out, was wo
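
For what it's worth, a rough sketch of chaining two interdependent jobs with JobControl on the old mapred API (the job confs are assumed to be fully configured elsewhere; names are placeholders):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    public class DependentJobsExample {
        public static void main(String[] args) throws Exception {
            JobConf firstConf = new JobConf();   // mapper/reducer/paths set elsewhere
            JobConf secondConf = new JobConf();

            Job first = new Job(firstConf);
            Job second = new Job(secondConf);
            second.addDependingJob(first);       // second stays waiting until first succeeds

            JobControl control = new JobControl("dependent-jobs");
            control.addJob(first);
            control.addJob(second);

            // JobControl is a Runnable that submits jobs as their dependencies finish.
            Thread runner = new Thread(control);
            runner.start();
            while (!control.allFinished()) {
                Thread.sleep(1000);
            }
            control.stop();
        }
    }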

Re: hdfs and small file size

2009-08-11 Thread jaideep dhok
There's an interview with one of the GFS engineers on ACM Queue that might be of interest to you. It's related to GFS, but I think the underlying issues are the same in HDFS. There is a lot of discussion on dealing with large numbers of files. Here's the link: http://queue.acm.org/detail.cfm?id=1594206

Re: what is the hadoop version used in both pig2.0 and nutch1.0

2009-08-11 Thread vikas
Hi, Please check out the below links, http://mail-archives.apache.org/mod_mbox/hadoop-pig-user/200904.mbox/%3c004f01c9c6f2$b1222500$13666f...@com%3e http://wiki.apache.org/nutch/Upgrading_Hadoop If you find any issues in upgrading Hadoop version with Nutch probably getting in touch with Nutch m

which versions of pig, nutch and hadoop are required to run at once

2009-08-11 Thread venkata ramanaiah anneboina
Hi, I am using pig 2.0 and nutch 1.0, but they don't have a common hadoop version. What is the common hadoop version for both pig and nutch? Please give the pig version, nutch version and hadoop version; can anyone help on this? thanks ramanaiah

What will we encounter if we add a lot of nodes into the current cluster?

2009-08-11 Thread yang song
Dear all, I'm sorry to disturb you. Our cluster has 200 nodes now. In order to improve its capacity, we hope to add 60 nodes to the current cluster. However, we don't know what will happen if we add so many nodes at the same time. Could you give me some tips and notes? During the proces

Re: What will we encounter if we add a lot of nodes into the current cluster?

2009-08-11 Thread Ted Dunning
If you add these nodes, data will be put on them as you add data to the cluster. Soon after adding the nodes you should rebalance the storage to avoid age related surprises in how files are arranged in your cluster. Other than that, your addition should cause little in the way of surprises. On T