Re: Can I share datas for several map tasks?

2009-06-15 Thread Sharad Agarwal
snowloong wrote: Hi, I want to share some data structures among the map tasks on the same node (not through files). I mean, if one map task has already initialized some data structures (e.g. an array or a list), can other map tasks share this memory and access it directly, for I don't want

Re: problem getting map input filename

2009-06-03 Thread Sharad Agarwal
conf.get("map.input.file") should work. If not, then it is a bug in the new mapreduce API in 0.20. - Sharad

Re: streaming a binary processing file

2009-06-03 Thread Sharad Agarwal
Binary support has been added for 0.21. One option is to wait for 0.21 to get released, or you might try applying the patch from HADOOP-1722. - Sharad

Re: question about when shuffle/sort start working

2009-06-03 Thread Sharad Agarwal
Jianmin Woo wrote: Do you have some sample on the re-usage of static variables? You can define static variables in your Mapper/Reducer class. Static variables survive for as long as the JVM is alive, so multiple tasks of the same job running in a single JVM would be able to share them. - Sharad
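
A minimal sketch of the static-variable idea, in plain Java with no Hadoop dependency. SharedLookupTask is a hypothetical stand-in for a Mapper class, and the lookup contents are made up for illustration. It only shows that state held in a static field is initialized once and then seen by every task object created in the same JVM; in a real job this presumably only helps when tasks actually share a JVM (for example with JVM reuse enabled), which is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Mapper class; not a Hadoop API.
class SharedLookupTask {
    // Static state belongs to the class, so every task object
    // created in the same JVM sees the same list.
    private static List<String> lookup;
    private static int initCount = 0;

    // Lazily builds the shared structure on first use only.
    static synchronized List<String> getLookup() {
        if (lookup == null) {
            lookup = new ArrayList<>();
            lookup.add("alpha");
            lookup.add("beta");
            initCount++;
        }
        return lookup;
    }

    // Number of times the shared structure was actually built.
    static synchronized int getInitCount() {
        return initCount;
    }

    // What a task would do per record: consult the shared structure.
    boolean contains(String key) {
        return getLookup().contains(key);
    }
}

class SharedLookupDemo {
    public static void main(String[] args) {
        SharedLookupTask first = new SharedLookupTask();
        SharedLookupTask second = new SharedLookupTask();
        System.out.println(first.contains("alpha"));
        System.out.println(second.contains("beta"));
        // Both tasks used the list, but it was built only once.
        System.out.println(SharedLookupTask.getInitCount());
    }
}
```

The shared state disappears when the JVM exits, so each task still has to be able to build it from scratch.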

Re: Selective output based on keys

2009-05-13 Thread sharad agarwal
setOutputKeyClass lets you specify the output key type. To output only selected keys, you need to call OutputCollector#collect for only those keys. If using the new map reduce API, you need to call Context#write. - Sharad Asim wrote: Hi, I wish to output only selective records to the output files

Re: move tasks to another machine on the fly

2009-05-07 Thread Sharad Agarwal
Just one more question, does Hadoop handle reassigning failed tasks to different machines in some way? Yes. If a task fails then it is retried, preferably on a different machine. I saw that sometimes, usually at the end, when there are more processing units available than map() tasks to

Re: PIG and Hive

2009-05-06 Thread Sharad Agarwal
see core-user mail thread with subject HBase, Hive, Pig and other Hadoop based technologies - Sharad Ricky Ho wrote: Are they competing technologies of providing a higher level language for Map/Reduce programming ? Or are they complementary ? Any comparison between them ? Rgds,

Re: multi-line records and file splits

2009-05-06 Thread Sharad Agarwal
The split doesn't need to be at the record boundary. If a mapper gets a partial record, it will seek to another split to get the full record. - Sharad
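
A rough, self-contained simulation of how a line-oriented reader can cope with splits that cut through records. SplitLineReader and readSplit are hypothetical names, and the convention used (a split that does not start at offset 0 skips its first line, and every split keeps reading until it has finished the line that crosses its end) is an assumption about how such readers are typically written, not a copy of Hadoop's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; simulates reading newline-delimited records from one split.
class SplitLineReader {
    // Returns the lines owned by the split [start, end) of data.
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Skip the first (possibly partial) line: the reader of the
            // previous split reads past its own end to emit it.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        List<String> lines = new ArrayList<>();
        // Keep reading while the next line starts at or before the split end,
        // even if that line finishes inside the following split.
        while (pos < data.length && pos <= end) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            lines.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++; // step past the newline
        }
        return lines;
    }
}

class SplitLineReaderDemo {
    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\ndelta\n".getBytes(StandardCharsets.UTF_8);
        // Split boundaries at 9 and 18 fall in the middle of records.
        System.out.println(SplitLineReader.readSplit(data, 0, 9));
        System.out.println(SplitLineReader.readSplit(data, 9, 18));
        System.out.println(SplitLineReader.readSplit(data, 18, data.length));
    }
}
```

With this convention every record is read by exactly one split, wherever the boundaries fall. Records that span several lines would need a custom reader applying the same idea to its own record delimiter.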

Re: Sequence of Streaming Jobs

2009-05-04 Thread Sharad Agarwal
see http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/jobcontrol/JobControl.html - Sharad Dan Milstein wrote: If I've got a sequence of streaming jobs, each of which depends on the output of the previous one, is there a good way to launch that sequence? Meaning, I

Re: Implementing compareTo in user-written keys where one extends the other is error prone

2009-05-04 Thread Sharad Agarwal
Marshall Schor wrote: public class Super implements WritableComparable<Super> { . . . public int compareTo(Super o) { // sort on string value . . . } I implemented the 2nd key class (let's call it Sub) public class Sub extends Super { . . . public int compareTo(Sub o) {
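
A small plain-Java illustration of the pitfall named in the subject line. SuperKey and SubKey are hypothetical classes, and java.lang.Comparable stands in for WritableComparable so the sketch runs without Hadoop. The point it tries to show: a compareTo(SubKey) method in the subclass overloads compareTo(SuperKey) rather than overriding it, so code that sorts through the Comparable interface never calls it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical key classes; Comparable stands in for WritableComparable.
class SuperKey implements Comparable<SuperKey> {
    final String name;

    SuperKey(String name) {
        this.name = name;
    }

    @Override
    public int compareTo(SuperKey o) {
        return name.compareTo(o.name); // sort on string value
    }
}

class SubKey extends SuperKey {
    final int rank;

    SubKey(String name, int rank) {
        super(name);
        this.rank = rank;
    }

    // Overload, not override: the parameter type differs from
    // compareTo(SuperKey), so adding @Override here would not compile.
    public int compareTo(SubKey o) {
        return Integer.compare(rank, o.rank);
    }
}

class CompareToDemo {
    public static void main(String[] args) {
        SubKey a = new SubKey("b", 1);
        SubKey b = new SubKey("a", 2);
        // Static type SubKey: the overload is chosen, comparing by rank.
        System.out.println(a.compareTo(b) < 0);
        // Sorting goes through Comparable<SuperKey>, so names are compared.
        List<SubKey> keys = new ArrayList<>();
        keys.add(a);
        keys.add(b);
        Collections.sort(keys);
        System.out.println(keys.get(0).name);
    }
}
```

One way around it, under the same assumptions, is to keep the compareTo(SuperKey) signature in the subclass and cast inside the method, so the compiler can verify it with @Override.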

Re: Changing output file format and name

2009-05-04 Thread Sharad Agarwal
See MultipleOutputFormat. You may need to implement your own custom OutputFormat. - Sharad

Re: specifying command line args, but getting an NPE

2009-05-04 Thread Sharad Agarwal
But if conf.set(...) is called after instantiating job, it doesn't. Is this intended? Yes, the Configuration must be set up before instantiating the Job object. However, some job parameters can be changed (before the actual job submission) by calling set methods on the Job object. - Sharad

Re: Appropriate for Hadoop?

2009-04-29 Thread Sharad Agarwal
Adam Retter wrote: So I don't have to use HDFS at all when using Hadoop? The input URI list has to be stored in HDFS. Each mapper will work on a sublist of URIs, depending on the number of maps set in the job. - Sharad

Re: Appropriate for Hadoop?

2009-04-28 Thread Sharad Agarwal
Each document can be processed independently and in parallel, so that part could be done in a map reduce job. Now whether it suits this use case depends on the rate at which new URIs are discovered for processing and the acceptable delay in processing a document. The way I see it you can

Re: Question about the classpath setting for bin/hadoop jar

2009-04-17 Thread Sharad Agarwal
I noticed that the bin/hadoop jar command doesn't add the jar being executed to the classpath. Is this deliberate and what is the reasoning? The result is that resources in the jar are not accessible from the system class loader. Rather they are only available from the thread context class

Re: Sometimes no map tasks are run - X are complete and N-X are pending, none running

2009-04-17 Thread Sharad Agarwal
The last map task is forever in the pending queue - is this an issue with my setup/config or do others have the problem? Do you mean the leftover maps are not scheduled at all? What do you see in the jobtracker logs?

Re: Modeling WordCount in a different way

2009-04-13 Thread sharad agarwal
Pankil Doshi wrote: Hey, did you find any class or way of storing the results of Job1 map/reduce in memory and using that as an input to Job2 map/reduce? I am facing a situation where I need to do a similar thing. If anyone can help me out.. Normally you would write the job output to a file and

Re: java compile time warning while using MultipleOutputs

2009-04-13 Thread sharad agarwal
warning: [unchecked] unchecked call to collect(K,V) as a member of the raw type org.apache.hadoop.mapred.OutputCollector Yes, I can live with this warning, but it really makes me uneasy. Any suggestions to remove this warning? You can suppress the warning using annotation in your code:

Re: Polymorphic behavior of Maps in One Job?

2009-04-09 Thread Sharad Agarwal
MultipleInputs.addInputPath(JobConf conf, Path path, Class<? extends InputFormat> inputFormatClass, Class<? extends Mapper> mapperClass) to add the mappers and my I/P format. Right, and then you can use DelegatingInputFormat and DelegatingMapper. And use MultipleOutputs class to configure the

Re: Print execution time

2009-04-08 Thread Sharad Agarwal
Also available on jobtracker web ui. Farhan Husain wrote: The logs might help. On Tue, Apr 7, 2009 at 7:28 PM, Mithila Nagendra mnage...@asu.edu wrote: Hey all! Is there a way to print out the execution time of a map reduce task? An inbuilt function or option to be used with bin/hadoop

Re: Modeling WordCount in a different way

2009-04-07 Thread Sharad Agarwal
Suppose a batch of inputsplits arrive in the beginning to every map, and reduce gives the word, frequency for this batch of inputsplits. Now after this another batch of inputsplits arrive and the results from subsequent reduce are aggregated to the previous results(if the word that has

RE: Modeling WordCount in a different way

2009-04-07 Thread Sharad Agarwal
I am confused about how I would start the next job after finishing the first one; could you make it clear with a rough example? See the JobControl class to chain the jobs. You can specify dependencies as well. You can check out the TestJobControl class for example code. Also do I need to use

Re: Hadoop Internal Architecture writeup

2008-11-27 Thread Sharad Agarwal
Just glanced into it. Haven't read in detail. One correction about Secondary namenode - It is not a Hot Standby. see http://wiki.apache.org/hadoop/FAQ#7 Ricky Ho wrote: I put together an article describing the internal architecture of Hadoop (HDFS, MapRed). I'd love to get some feedback if

Re: Getting Reduce Output Bytes

2008-11-25 Thread Sharad Agarwal
Is there an easy way to get Reduce Output Bytes? Reduce output bytes are not available directly, but perhaps they can be inferred from the file system read/write bytes counters.

Re: FileSystem.append and FSDataOutputStream.seek

2008-11-19 Thread Sharad Agarwal
Wasim Bari wrote: Hello, does anyone know when the Hadoop team plans to implement FileSystem.append(Path) functionality and something seekable with FSDataOutputStream (meaning seek capability)? FileSystem.append(Path) is already implemented and slated to be released in 0.19 see

Re: Writing to multiple output channels

2008-11-13 Thread Sharad Agarwal
Sunil Jagadish wrote: Hi, I have a mapper which needs to write output into two different kinds of files (output.collect()). Check MultipleOutputFormat. That may help.

Re: Best way to handle namespace host failures

2008-11-10 Thread Sharad Agarwal
Goel, Ankur wrote: Hi Folks, I am looking for some advice on some of the ways / techniques that people are using to get around namenode failures (both disk and host). We have a small cluster with several jobs scheduled for periodic execution on the same host where the name server runs.

Re: Customized InputFormat Problem

2008-11-10 Thread Sharad Agarwal
But when I run it, it throws an exception in the DbRecordReader.next() method. Although I have added logging in it, I still can't see anything, and I don't know where I should check. Who can tell me where I can get the real execution status, so I can see where the error is? Thanks! Check the logs