When I run map reduce task over a har file as the input, I see that the
input splits refer to 64 MB boundaries inside the part file.
My mappers only know how to process the contents of each logical file inside
the har file. Is there some way by which I can take the offset range
specified by th
In the documentation for Hadoop Streaming it says that the "-input" option
can be specified multiple times for multiple input directories. The same
does not seem to work with Pipes.
Is there some way to specify multiple input directories for pipes jobs?
Roshan
ps. With multiple input dirs this i
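One workaround that may be worth trying, assuming the pipes submitter honours
Hadoop's generic -D options and that the mapred.input.dir property takes a
comma-separated list of paths the way FileInputFormat does in 0.20 (the paths
and program name below are made up):

    hadoop pipes \
      -D mapred.input.dir=/user/roshan/in1,/user/roshan/in2 \
      -output /user/roshan/out \
      -program bin/mytask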
Further, look at the namenode file system browser for your cluster to see
the chunking in action.
http://wiki.apache.org/hadoop/WebApp%20URLs
Roshan
On Thu, Jun 18, 2009 at 6:28 AM, Harish Mallipeddi <
harish.mallipe...@gmail.com> wrote:
> On Thu, Jun 18, 2009 at 3:43 PM, rajeev gupta wrote:
>
> with
> either. I do not know how pipes interacts with either.
>
> On Wed, Jun 17, 2009 at 12:43 PM, Roshan James <
> roshan.james.subscript...@gmail.com> wrote:
>
> > Hello, Is there any way to express dependencies between map-reduce jobs
> > (such as in org.
I did get this working, though the InputSplit information is not returned in a clean form. You
may want to look at this thread -
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200906.mbox/%3cee216d470906121602k7f914179u5d9555e7bb080...@mail.gmail.com%3e
On Thu, Jun 18, 2009 at 12:49 AM, Jianmin Woo wrote:
Hello, Is there any way to express dependencies between map-reduce jobs
(such as in org.apache.hadoop.mapred.jobcontrol) for pipes jobs? The
provided header Pipes.hh does not seem to expose any such capability.
best,
Roshan
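JobControl itself is a Java-side facility, so for pipes jobs one rough
workaround is to express the dependency in the submitting shell script
instead, assuming hadoop pipes blocks until the job completes and exits
non-zero on failure (programs and paths here are hypothetical):

    # job2 runs only if job1 succeeded
    hadoop pipes -input /data/in -output /data/job1 -program bin/job1 \
      && hadoop pipes -input /data/job1 -output /data/job2 -program bin/job2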
Thanks, it looks like I can write a line reader in C++ that roughly does
what the Java version does. This also means that I can deserialise my own
custom formats as well. Thanks!
Roshan
On Tue, Jun 16, 2009 at 12:22 PM, Owen O'Malley wrote:
> Sorry, I forget how much isn't clear to people who a
for file "hdfs://nyc-qws-029/in-dir/words.txt"
from offset 0 to 181420.
That said, is there some reason why this is the format? I don't want the
deserialiser I write to break from one version of Hadoop to the next.
Roshan
On Tue, Jun 16, 2009 at 9:41 AM, Roshan James <
rosh
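For reference, a rough sketch of such a deserialiser. It assumes the split
bytes handed to getInputSplit() follow the 0.20
org.apache.hadoop.mapred.FileSplit layout as read off the Java source - a
2-byte big-endian length, the UTF-8 file name, then two 8-byte big-endian
longs for start offset and length - which is exactly the unstable contract
being asked about:

    #include <cstdint>
    #include <stdexcept>
    #include <string>

    struct FileSplitInfo {
      std::string filename;
      int64_t offset;   // start of the split within the file
      int64_t length;   // number of bytes in the split
    };

    // Read 'bytes' big-endian bytes starting at 'pos'.
    static uint64_t readBE(const std::string& s, size_t pos, size_t bytes) {
      uint64_t v = 0;
      for (size_t i = 0; i < bytes; ++i)
        v = (v << 8) | static_cast<unsigned char>(s[pos + i]);
      return v;
    }

    // Parse the raw bytes returned by MapContext::getInputSplit().
    FileSplitInfo parseFileSplit(const std::string& raw) {
      if (raw.size() < 2) throw std::runtime_error("split too short");
      size_t nameLen = readBE(raw, 0, 2);
      if (raw.size() < 2 + nameLen + 16) throw std::runtime_error("split truncated");
      FileSplitInfo info;
      info.filename = raw.substr(2, nameLen);
      info.offset = static_cast<int64_t>(readBE(raw, 2 + nameLen, 8));
      info.length = static_cast<int64_t>(readBE(raw, 2 + nameLen + 8, 8));
      return info;
    }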
Why don't we convert the input split information into the same string format that
is displayed in the web UI? Something like this -
"hdfs://nyc-qws-029/in-dir/words86ac4a.txt:0+184185". It's a simple format
and we can always parse such a string in C++.
Is there some reason for the current binary format?
After a couple of quick glances at the pipes code, it looks like the Java
InputSplit object is passed to the C++ wrapper as is, without any explicit
conversion to string.
Since I am new to Hadoop, I am not sure if this is a bug or something I am
doing wrong.
Please advise,
Roshan
On Fri, J
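If the split were handed over in that string form, parsing it in C++ would
indeed be simple; a sketch, assuming the offset begins after the last ':'
since HDFS URIs can themselves contain ':':

    #include <cstdint>
    #include <cstdlib>
    #include <string>

    // Parse "path:offset+length", e.g.
    // "hdfs://nyc-qws-029/in-dir/words86ac4a.txt:0+184185".
    bool parseSplitString(const std::string& s, std::string& path,
                          int64_t& offset, int64_t& length) {
      size_t colon = s.rfind(':');
      size_t plus = s.rfind('+');
      if (colon == std::string::npos || plus == std::string::npos || plus < colon)
        return false;
      path = s.substr(0, colon);
      offset = strtoll(s.c_str() + colon + 1, NULL, 10);
      length = strtoll(s.c_str() + plus + 1, NULL, 10);
      return true;
    }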
I am working with the wordcount example of Hadoop Pipes (0.20.0). I have a
7-machine cluster.
When I look at MapContext.getInputSplit() in my map function, I see that it
returns the empty string. I was expecting to see a filename and some sort of
range specification. I am using the default j
Hi,
I am new to Hadoop and am using Pipes with Hadoop 0.20.0. Can someone
tell me where in the web UI we can see the status messages set by
TaskContext::setStatus, and where stderr goes? Also, is stdout captured somewhere?
Thanks in advance,
Roshan
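For context, the two channels being asked about, sketched in a minimal Pipes
mapper (the class and key handling are made up; the setStatus and stderr
usage follow the wordcount example):

    #include <cstdio>
    #include <string>
    #include "hadoop/Pipes.hh"

    class StatusMapper : public HadoopPipes::Mapper {
    public:
      StatusMapper(HadoopPipes::TaskContext& context) {}
      void map(HadoopPipes::MapContext& context) {
        // Shows up as the task's status string in the job tracker UI.
        context.setStatus("saw key: " + context.getInputKey());
        // Goes to the task's log files alongside stdout.
        fprintf(stderr, "debug: value length %zu\n",
                context.getInputValue().size());
        context.emit(context.getInputKey(), context.getInputValue());
      }
    };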
Hi,
I am trying to get started with Hadoop Pipes. Is there an example of
chaining tasks (with Pipes) somewhere?
If not, can someone tell me how I can specify the input and output
directories for the second task? I was expecting to be able to set these
values in JobConf, but Pipes seems to provide
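In case it helps, the usual pattern seems to be wiring the directories
together in the submitting script rather than in JobConf: point the second
job's -input at the first job's -output (directory names and programs below
are invented):

    hadoop pipes -input /user/roshan/in     -output /user/roshan/stage1 -program bin/task1
    hadoop pipes -input /user/roshan/stage1 -output /user/roshan/out    -program bin/task2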