Doing MapReduce over Har files

2009-06-23 Thread Roshan James
When I run a map reduce task over a har file as the input, I see that the input splits refer to 64 MB byte boundaries inside the part file. My mappers only know how to process the contents of each logical file inside the har file. Is there some way by which I can take the offset range specified by th
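
A minimal sketch of the mapping the question asks about: given a split's byte range inside a HAR part file, and an index of where each logical file lives in that part file, find the logical files the split should process. The `HarEntry` layout and the "process files that start inside the split" convention are assumptions for illustration; the real HAR `_index` file format is different.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory form of a HAR index entry: where one logical
// file sits inside the concatenated part file.
struct HarEntry {
  std::string name;   // logical file name inside the archive
  uint64_t offset;    // start offset within the part file
  uint64_t length;    // length of the logical file in bytes
};

// One common convention (as with line records) is that a split owns
// every logical file that *starts* inside [splitStart, splitEnd), so
// each file is processed by exactly one mapper even when it crosses
// a split boundary.
std::vector<HarEntry> filesInSplit(const std::vector<HarEntry>& index,
                                   uint64_t splitStart, uint64_t splitEnd) {
  std::vector<HarEntry> out;
  for (const HarEntry& e : index) {
    if (e.offset >= splitStart && e.offset < splitEnd) out.push_back(e);
  }
  return out;
}
```

With this rule, a mapper handed a split at a 64 MB boundary would look up the index once and then read only the whole logical files assigned to it.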

Can a hadoop pipes job be given multiple input directories?

2009-06-18 Thread Roshan James
In the documentation for Hadoop Streaming it says that the "-input" option can be specified multiple times for multiple input directories. The same does not seem to work with Pipes. Is there some way to specify multiple input directories for pipes jobs? Roshan ps. With multiple input dirs this i

Re: Data replication and moving computation

2009-06-18 Thread Roshan James
Further, look at the namenode file system browser for your cluster to see the chunking in action. http://wiki.apache.org/hadoop/WebApp%20URLs Roshan On Thu, Jun 18, 2009 at 6:28 AM, Harish Mallipeddi < harish.mallipe...@gmail.com> wrote: > On Thu, Jun 18, 2009 at 3:43 PM, rajeev gupta wrote: >

Re: JobControl for Pipes?

2009-06-18 Thread Roshan James
> with > either. I do not know how pipes interacts with either. > > On Wed, Jun 17, 2009 at 12:43 PM, Roshan James < > roshan.james.subscript...@gmail.com> wrote: > > > Hello, Is there any way to express dependencies between map-reduce jobs > > (such as in org.

Re: Pipes example wordcount-nopipe.cc failed when reading from input splits

2009-06-18 Thread Roshan James
I did get this working. InputSplit information is not returned clearly. You may want to look at this thread - http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200906.mbox/%3cee216d470906121602k7f914179u5d9555e7bb080...@mail.gmail.com%3e On Thu, Jun 18, 2009 at 12:49 AM, Jianmin Woo wrot

JobControl for Pipes?

2009-06-17 Thread Roshan James
Hello, Is there any way to express dependencies between map-reduce jobs (such as in org.apache.hadoop.mapred.jobcontrol) for pipes jobs? The provided header Pipes.hh does not seem to reflect any such capabilities. best, Roshan

Re: MapContext.getInputSplit() returns nothing

2009-06-17 Thread Roshan James
Thanks, it looks like I can write a line reader in C++ that roughly does what the Java version does. This also means that I can deserialise my own custom formats as well. Thanks! Roshan On Tue, Jun 16, 2009 at 12:22 PM, Owen O'Malley wrote: > Sorry, I forget how much isn't clear to people who a
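
A C++ line reader that "roughly does what the Java version does" would need the split-boundary handling from Hadoop's Java `LineRecordReader`: a reader starting at a non-zero offset skips its first (partial) line, and reads one line past the end of its split, so every line is consumed exactly once across adjacent splits. This is a minimal sketch over `std::istream`, not the actual Pipes reader:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read the lines belonging to the split [start, end] of a stream,
// following the LineRecordReader convention: if start != 0, discard the
// first line (it belongs to the previous split, which reads past its
// own end to finish it); then read lines while the current position is
// still <= end.
std::vector<std::string> readLinesInSplit(std::istream& in,
                                          std::streamoff start,
                                          std::streamoff end) {
  std::vector<std::string> lines;
  in.seekg(start);
  std::string line;
  if (start != 0) {
    std::getline(in, line);  // partial line owned by the previous split
  }
  while (static_cast<std::streamoff>(in.tellg()) <= end &&
         std::getline(in, line)) {
    lines.push_back(line);
  }
  return lines;
}
```

The same skeleton works for a custom binary format: replace `std::getline` with your own record deserialiser and keep the boundary rules unchanged.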

Re: MapContext.getInputSplit() returns nothing

2009-06-16 Thread Roshan James
for file "hdfs://nyc-qws-029/in-dir/words.txt" from offset 0 to 181420. That said, is there some reason why this is the format? I don't want the deserialiser I write to break from one version of Hadoop to the next. Roshan On Tue, Jun 16, 2009 at 9:41 AM, Roshan James < rosh

Re: MapContext.getInputSplit() returns nothing

2009-06-16 Thread Roshan James
Why don't we convert input split information into the same string format that is displayed in the web UI? Something like this - "hdfs://nyc-qws-029/in-dir/words86ac4a.txt:0+184185". It's a simple format and we can always parse such a string in C++. Is there some reason for the current binary format?
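
To illustrate the point that the web-UI string is trivially parseable in C++: a small parser for the assumed `path:offset+length` layout. The `SplitInfo` struct and `parseSplit` name are made up for this sketch; this is not a Hadoop Pipes API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Parsed form of a split string like
// "hdfs://nyc-qws-029/in-dir/words.txt:0+184185".
struct SplitInfo {
  std::string path;   // full HDFS path
  uint64_t offset;    // starting byte offset within the file
  uint64_t length;    // number of bytes in the split
};

// Parse from the right so the "hdfs://" colon in the path is not
// mistaken for the offset separator.
bool parseSplit(const std::string& s, SplitInfo& out) {
  std::string::size_type plus = s.rfind('+');
  if (plus == std::string::npos) return false;
  std::string::size_type colon = s.rfind(':', plus);
  if (colon == std::string::npos) return false;
  out.path = s.substr(0, colon);
  out.offset = std::stoull(s.substr(colon + 1, plus - colon - 1));
  out.length = std::stoull(s.substr(plus + 1));
  return true;
}
```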

Re: MapContext.getInputSplit() returns nothing

2009-06-15 Thread Roshan James
out. After a couple of quick glances at the pipes code, it looks like the Java InputSplit object is passed to the C++ wrapper as is, without any explicit conversion to string. Since I am new to Hadoop, I am not sure if this is a bug or something I am doing wrong. Please advise, Roshan On Fri, J

MapContext.getInputSplit() returns nothing

2009-06-12 Thread Roshan James
I am working with the wordcount example of Hadoop Pipes (0.20.0). I have a 7 machine cluster. When I look at MapContext.getInputSplit() in my map function, I see that it returns the empty string. I was expecting to see a filename and some sort of range specification of so. I am using the default j

Where in the WebUI do we see setStatus and stderr output?

2009-06-10 Thread Roshan James
Hi, I am new to Hadoop and am using Pipes and Hadoop ver 0.20.0. Can someone tell me where in the web UI we see status messages set by TaskContext::setStatus and the stderr? Also is stdout captured somewhere? Thanks in advance, Roshan

Chaining Pipes Tasks

2009-06-08 Thread Roshan James
Hi, I am trying to get started with Hadoop Pipes. Is there an example of chaining tasks (with Pipes) somewhere? If not, can someone tell me how I can specify the input and output directories for the second task. I was expecting to be able to set these values in JobConf, but Pipes seems to provide