Error: conf.Configuration: Failed to set setXIncludeAware(true) for parser

2010-09-17 Thread Adarsh Sharma
Dear all, I am trying to connect to Hive through my application but I am getting the following error: 12:03:10 ERROR conf.Configuration: Failed to set setXIncludeAware(true) for parser org.apache.xerces.jaxp.DocumentBuilderFactoryImpl@e6c: java.lang.UnsupportedOperationException: This parse
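
A minimal way to see which JAXP factory the JVM resolves, and whether it supports XInclude, is a sketch like the one below (my own, not from the thread). Hadoop's Configuration makes an equivalent setXIncludeAware(true) call while parsing its *-site.xml files, so an old Xerces DocumentBuilderFactoryImpl on the application classpath shadowing the JDK's built-in factory fails there with exactly this UnsupportedOperationException:

    import javax.xml.parsers.DocumentBuilderFactory;

    public class XIncludeCheck {
        public static void main(String[] args) {
            // Print which DocumentBuilderFactory implementation the JVM picks up.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            System.out.println("Factory implementation: " + factory.getClass().getName());
            // Hadoop's Configuration performs an equivalent call when loading
            // *-site.xml; an old Xerces factory throws UnsupportedOperationException here.
            factory.setXIncludeAware(true);
            System.out.println("XInclude aware: " + factory.isXIncludeAware());
        }
    }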

Re: How to recover a namenode?

2010-09-17 Thread ChingShen
I think I got it! Step by step, as follows: 1. Put two files into HDFS. 2. kill -9 ${namenode_pid}. 3. Delete the whole ${hadoop.tmp.dir}/dfs directory of the NN. 4. Copy the dfs directory of the BN to the NN. 5. Start up the NN. It works. Is it correct? Shen On Fri, Sep 17, 2010 at 2:58 PM, ChingShen

Re: Map Output

2010-09-17 Thread Amogh Vasekar
Hi, >> As far as I know, the map output is written to the local disk then shipped to the reducer via the network. Is this correct? Yes. Each reducer picks up its own partition from the map output once the map task completes. However, it's a little more complicated (and very interesting) on the map side.
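
As a small illustration of the partition step Amogh mentions (a sketch of my own, not from the thread), Hadoop's default HashPartitioner is what decides which reducer's partition a given map output record lands in:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

    public class PartitionDemo {
        public static void main(String[] args) {
            // Default behaviour: partition = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
            HashPartitioner<Text, Text> partitioner = new HashPartitioner<Text, Text>();
            int numReduceTasks = 4;
            Text key = new Text("example-key");  // hypothetical key
            int partition = partitioner.getPartition(key, new Text("value"), numReduceTasks);
            System.out.println("Key '" + key + "' goes to reduce partition " + partition);
        }
    }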

Re: How to run a job?

2010-09-17 Thread David Rosenstrauch
On 09/17/2010 12:53 AM, Mark Kerzner wrote: Hi, the documentation says I should do this: JobClient.*runJob*(JobConf
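
For reference, a minimal, hedged sketch of the call the documentation describes: the old mapred API's JobClient.runJob(JobConf) submits the job and blocks until it finishes. The paths are hypothetical, and with no mapper or reducer set the old API falls back to the identity classes:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class RunJobExample {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(RunJobExample.class);
            conf.setJobName("run-job-example");
            // No mapper/reducer set, so the old API uses the identity mapper and reducer.
            FileInputFormat.setInputPaths(conf, new Path("/tmp/input"));    // hypothetical path
            FileOutputFormat.setOutputPath(conf, new Path("/tmp/output"));  // hypothetical path
            // Submits the job and polls until completion, printing progress.
            JobClient.runJob(conf);
        }
    }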

Re: do you need to call super in Mapper.Context.setup()?

2010-09-17 Thread David Rosenstrauch
On 09/16/2010 11:38 PM, Mark Kerzner wrote: Hi, any need for this, protected void setup(Mapper.Context context) throws IOException, InterruptedException { super.setup(context); // TODO - does this need to be done? this.context = context; } Thank you, Mark "Use the source Lu

Re: How to run a job?

2010-09-17 Thread Mark Kerzner
Thank you. On Fri, Sep 17, 2010 at 9:27 AM, David Rosenstrauch wrote: > On 09/17/2010 12:53 AM, Mark Kerzner wrote: > >> Hi, >> >> the documentation< >> http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/JobClient.html#runJob(org.apache.hadoop.mapred.JobConf) >> >says >> I s

Re: do you need to call super in Mapper.Context.setup()?

2010-09-17 Thread Mark Kerzner
Thank you for the advice and for teaching me the common expression. Mark On Fri, Sep 17, 2010 at 9:29 AM, David Rosenstrauch wrote: > On 09/16/2010 11:38 PM, Mark Kerzner wrote: > >> Hi, >> >> any need for this, >> >> protected void setup(Mapper.Context context) throws IOException, >> InterruptedExc

Re: Number of Mappers Running Simultaneously

2010-09-17 Thread Allen Wittenauer
On Sep 16, 2010, at 10:54 PM, Amogh Vasekar wrote: > If for your job, and you want to control it on a per-node basis, one way is to allocate more memory to each of your mappers so it occupies more than one slot. If a slot is free, a task will be scheduled on it and that's more or less out

Re: do you need to call super in Mapper.Context.setup()?

2010-09-17 Thread Owen O'Malley
On Sep 17, 2010, at 7:29 AM, David Rosenstrauch wrote: On 09/16/2010 11:38 PM, Mark Kerzner wrote: Hi, any need for this, protected void setup(Mapper.Context context) throws IOException, InterruptedException { super.setup(context); // TODO - does this need to be done? this.co

Re: do you need to call super in Mapper.Context.setup()?

2010-09-17 Thread Mark Kerzner
In that case, you get an "unchecked type" warning. On Fri, Sep 17, 2010 at 10:39 AM, Owen O'Malley wrote: > > On Sep 17, 2010, at 7:29 AM, David Rosenstrauch wrote: > > On 09/16/2010 11:38 PM, Mark Kerzner wrote: >> >>> Hi, >>> >>> any need for this, >>> >>> protected void setup(Mapper.Context c
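
Pulling the thread together, here is a sketch (mine, not from the list) of a setup() override: the base Mapper.setup() is an empty method, so the super call is optional, and declaring the parameter as the subclass's own Context type, rather than the raw Mapper.Context, avoids the unchecked-type warning mentioned above:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SetupMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

        private Context context;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // Mapper.setup() in the base class does nothing, so this super call
            // is harmless but not required.
            super.setup(context);
            this.context = context;
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(value.toString()), new LongWritable(1L));
        }
    }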

Re: How to recover a namenode?

2010-09-17 Thread Allen Wittenauer
On Sep 17, 2010, at 1:36 AM, ChingShen wrote: > I think I got it! > > Step by step as following: > > 1. put two files into hdfs. > 2. kill -9 ${namenode_pid}. > 3. delete the whole ${hadoop.tmp.dir}/dfs directory of NN. > 4. copy the dfs directory of BN to NN. > 5. start-up NN. > > It works. I

Re: Number of Mappers Running Simultaneously

2010-09-17 Thread rahul
Hi Amogh, Thanks for the input. Basically, when I run a Pig job on Hadoop and I monitor the job through the tracker, I always see 4 mappers in the Running section. So I just wanted to know whether we have any parameter through which we can control the number of mappers in the running section to s

Appending to existing files in HDFS

2010-09-17 Thread Chittaranjan Hota
Hello, I am new to Hadoop and to this forum. Existing setup: Basically, we have an existing setup where data is collected from a JMS queue and written to hard disk without Hadoop. Typical I/O using log4j. Problem statement: Now, instead of writing it to hard disk, I would like to stream it to

Finding EOF with libhdfs

2010-09-17 Thread Poole, Samuel [USA]
Is there an easy way to find EOF or the length of a file using libhdfs from C++? Also, if anyone has an example of reading a binary file from HDFS until EOF, that would be greatly appreciated. Thanks, Sam

Tasks Failing : IOException in TaskRunner (Error Code :134)

2010-09-17 Thread C.V.Krishnakumar
Hi all, I am facing a problem with the TaskRunner. I have a small Hadoop cluster in fully distributed mode. However, when I submit a job, the job never seems to proceed beyond the "map 0% reduce 0%" stage. Soon after, I get this error: java.io.IOException: Task process exit with nonzero stat

Re: Number of Mappers Running Simultaneously

2010-09-17 Thread rahul
Hi Allen, Thank you for your input. Basically, I am using Hadoop 0.20.2 along with Pig 0.7.0, so is it possible in this scenario? Also, please let me know exactly how we can allocate more memory to each mapper. Thanks, Rahul On Sep 17, 2010, at 8:33 AM, Allen Wittenauer wrote: > > On Sep 16, 201
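
One hedged sketch of what "allocate more memory to each mapper" can look like in code, assuming the cluster runs the Capacity Scheduler with memory-based scheduling enabled (0.20-era property names; the per-node slot count itself, mapred.tasktracker.map.tasks.maximum, lives in each tasktracker's mapred-site.xml and cannot be changed from a job):

    import org.apache.hadoop.mapred.JobConf;

    public class HighMemoryJobConf {
        // Requests more memory per task so that each task occupies more than one
        // slot, reducing how many mappers run simultaneously on a node. Only
        // effective when the scheduler enforces memory-based scheduling.
        public static JobConf withHighMemoryTasks(JobConf conf) {
            conf.setLong("mapred.job.map.memory.mb", 4096);
            conf.setLong("mapred.job.reduce.memory.mb", 4096);
            return conf;
        }
    }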

Re: Appending to existing files in HDFS

2010-09-17 Thread Steve Hoffman
This is a "feature" of HDFS. Files are immutable. You have to create a new file. The file you are writing to isn't available in HDFS until you close it. Usually you'll have something buffering pieces and writing to HDFS. Then you can roll those smaller files into larger chunks using a nightly map
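
A bare-bones sketch of the write-a-new-file-per-batch pattern Steve describes (path and payload are invented for illustration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsBatchWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Each buffered batch goes into a brand-new file; a closed HDFS
            // file cannot be modified, only replaced or compacted later.
            Path out = new Path("/data/events/batch-0001.log");  // hypothetical path
            FSDataOutputStream stream = fs.create(out);
            stream.writeBytes("buffered records for this batch\n");
            // Readers only see the data reliably once the file is closed.
            stream.close();
            fs.close();
        }
    }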

jobtracker: Cannot assign requested address

2010-09-17 Thread Jing Tie
Dear all, I am getting this exception when starting the jobtracker, and I checked with netstat that the port is not in use before running. Could you please point out where the problem might be? Many thanks in advance! 2010-09-17 17:07:22,863 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindExcept

Re: Appending to existing files in HDFS

2010-09-17 Thread Chittaranjan Hota
Hi Steve, Thanks for the inputs. I understand by now that the files are "immutable"; I just wanted to confirm. However, I am a little confused as to what role the "append" methods play. I am now going to explore and see how it works out when I keep a stream open and write data to it and close on an
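
On the role of the append methods: FileSystem.append() is part of the public API, but on 0.20-era clusters my understanding is that it only succeeds when dfs.support.append is enabled cluster-side; otherwise the call fails with an IOException. A hedged sketch of the call (hypothetical path):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsAppendSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path existing = new Path("/data/events/batch-0001.log");  // hypothetical path
            // Throws IOException on clusters where append support is disabled.
            FSDataOutputStream stream = fs.append(existing);
            stream.writeBytes("one more record\n");
            stream.close();
            fs.close();
        }
    }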

Re: Appending to existing files in HDFS

2010-09-17 Thread Lance Norskog
Once you close it, the HDFS daemons own the file and make sure it's copied around. Allowing reopens at this point makes that distribution control that much more complex: asynchronous processes have to agree that the old file is now longer. Another thing to keep in mind is that HDFS has blocks