How can then I produce an output/file per mapper not map-task?
Thank you,
Maha
On Feb 20, 2011, at 10:22 PM, Ted Dunning wrote:
> This is the most important thing that you have said. The map function
> is called once per unit of input but the mapper object persists for
> many input units of input.
> [...] NLineInputFormat is already doing for you.
>
> -Original Message-
> From: maha [mailto:m...@umail.ucsb.edu]
> Sent: Sunday, February 20, 2011 2:00 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Quick question
>
> Actually the following solved my problem ...
Sent: Sunday, February 20, 2011 2:00 PM
To: common-user@hadoop.apache.org
Subject: Re: Quick question
Actually the following solved my problem ... but I'm a little suspicious of the
side effects of doing the following instead of using my own InputSplit of 5
lines.
conf.setI
This is the most important thing that you have said. The map function
is called once per unit of input but the mapper object persists for
many input units of input.
You have a little bit of control over how many mapper objects there
are and how many machines they are created on and how many pieces
of input they are given.
>>> That's right. The TextInputFormat handles situations where records cross
>>> split boundaries. What your mapper will see is "whole" records.
>>>
>>> -Original Message-
>>> From: maha [mailto:m...@umail.ucsb.edu]
>>> Sent: Friday, February 18, 2011 1:14 PM
>> From: maha [mailto:m...@umail.ucsb.edu]
>> Sent: Friday, February 18, 2011 1:14 PM
>> To: common-user
>> Subject: Quick question
>>
>> Hi all,
>>
>> I want to check if the following statement is right:
>>
>> If I use TextInputFormat to process a text file with 2000 lines (each ending
>> with \n) with 20 mappers, then each map will have a sequence of COMPLETE LINES.
> That's right. The TextInputFormat handles situations where records cross
> split boundaries. What your mapper will see is "whole" records.
>
> -Original Message-
> From: maha [mailto:m...@umail.ucsb.edu]
> Sent: Friday, February 18, 2011 1:14 PM
> To: common-user
> From: maha [mailto:m...@umail.ucsb.edu]
> Sent: Friday, February 18, 2011 1:14 PM
> To: common-user
> Subject: Quick question
>
> Hi all,
>
> I want to check if the following statement is right:
>
> If I use TextInputFormat to process a text file with 2000 lines (each ending
> with \n) with 20 mappers, then each map will have a sequence of COMPLETE LINES.
That's right. The TextInputFormat handles situations where records cross split
boundaries. What your mapper will see is "whole" records.
-Original Message-
From: maha [mailto:m...@umail.ucsb.edu]
Sent: Friday, February 18, 2011 1:14 PM
To: common-user
Subject: Quick question
The input is effectively split by lines, but under the covers, the actual
splits are by byte. Each mapper will cleverly scan from the specified start
to the next line after the start point. At the end, it will over-read to
the end of the line that is at or after the end of its specified region. This
way, every line is read exactly once, by exactly one mapper.
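A toy model of that rule may make it concrete. This is plain Java, not Hadoop code (the `readSplit` helper and record contents are invented for illustration): splits are byte ranges, a reader that does not start at offset 0 skips the partial first line, and it over-reads past its end offset to finish the last line it started, so each line belongs to exactly one split.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy model (plain Java, not Hadoop code) of byte-range splits reading
// whole lines, as described in the message above.
public class SplitModel {

    // Return the lines that the byte range [start, end) is responsible for.
    static List<String> readSplit(byte[] data, long start, long end) {
        List<String> lines = new ArrayList<>();
        int pos = (int) start;
        // Skip the tail of a line that the previous split will over-read.
        if (start > 0) {
            while (pos < data.length && data[pos - 1] != '\n') pos++;
        }
        // A line whose first byte lies before 'end' belongs to this split,
        // even if it extends past 'end' (the over-read described above).
        while (pos < end && pos < data.length) {
            int eol = pos;
            while (eol < data.length && data[eol] != '\n') eol++;
            lines.add(new String(data, pos, eol - pos, StandardCharsets.UTF_8));
            pos = eol + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) sb.append("record-").append(i).append('\n');
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);

        // Three arbitrary byte ranges that cut records mid-line.
        long[][] ranges = {{0, 25}, {25, 60}, {60, data.length}};
        List<String> all = new ArrayList<>();
        for (long[] r : ranges) all.addAll(readSplit(data, r[0], r[1]));

        // Despite the byte-wise cuts, each record shows up exactly once.
        if (all.size() != 10) throw new AssertionError(all);
        for (int i = 0; i < 10; i++) {
            if (!all.get(i).equals("record-" + i)) throw new AssertionError(all);
        }
        System.out.println("all " + all.size() + " records seen exactly once");
    }
}
```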
Hi all,
I want to check if the following statement is right:
If I use TextInputFormat to process a text file with 2000 lines (each ending
with \n) with 20 mappers, then each map will have a sequence of COMPLETE LINES.
In other words, the input is not split byte-wise but by lines.
Is this right?
Thanks Ted. Then I have to write my own InputFormat to read a block-of-lines
per mapper.
NLineInputFormat didn't work for me; any working example of it would be
appreciated.
Thanks again,
Maha
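Since a working example was asked for: a minimal, untested configuration sketch with the old `mapred` API, using the class and property names as they were around Hadoop 0.20 (the job name and input/output paths here are made up, and the mapper/reducer settings are omitted):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class FiveLinesPerMap {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(FiveLinesPerMap.class);
        conf.setJobName("five-lines-per-map");

        // Hand each map task N consecutive lines instead of a whole block.
        // Keys will be LongWritable offsets, values the Text lines.
        conf.setInputFormat(NLineInputFormat.class);
        conf.setInt("mapred.line.input.format.linespermap", 5);

        // Hypothetical paths for illustration only.
        FileInputFormat.setInputPaths(conf, new Path("input"));
        FileOutputFormat.setOutputPath(conf, new Path("output"));

        JobClient.runJob(conf);
    }
}
```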
On Feb 7, 2011, at 6:32 PM, Mark Kerzner wrote:
> Thanks!
> Mark
>
> On Mon, Feb 7, 2011 at 8:28 PM, Ted Dunning wrote:
Thanks!
Mark
On Mon, Feb 7, 2011 at 8:28 PM, Ted Dunning wrote:
> That is quite doable. One way to do it is to make the max split size quite
> small.
>
> On Mon, Feb 7, 2011 at 6:14 PM, Mark Kerzner
> wrote:
>
> > Ted,
> >
> > I am also interested in this answer.
> >
> > I put the name of a zip file on a line in an input file, and I want one
> > mapper to read this line, and start working on it.
That is quite doable. One way to do it is to make the max split size quite
small.
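A sketch of that suggestion against the new `mapreduce` API, assuming `FileInputFormat.setMaxInputSplitSize` is available in your Hadoop version (the paths, job name, and the 1 KB cap are made-up values; mapper/reducer settings are omitted):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TinySplits {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "tiny-splits");
        job.setInputFormatClass(TextInputFormat.class);

        // Cap each split at 1 KB: a file of short lines (e.g. zip-file
        // paths) then fans out into many small splits, so roughly one
        // mapper per line or per few lines.
        FileInputFormat.setMaxInputSplitSize(job, 1024L);

        FileInputFormat.addInputPath(job, new Path("file-list"));
        FileOutputFormat.setOutputPath(job, new Path("out"));
        job.waitForCompletion(true);
    }
}
```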
On Mon, Feb 7, 2011 at 6:14 PM, Mark Kerzner wrote:
> Ted,
>
> I am also interested in this answer.
>
> I put the name of a zip file on a line in an input file, and I want one
> mapper to read this line, and start working on it (since it now knows the
> path in HDFS). Are you saying it's not doable?
Ted,
I am also interested in this answer.
I put the name of a zip file on a line in an input file, and I want one
mapper to read this line, and start working on it (since it now knows the
path in HDFS). Are you saying it's not doable?
Thank you,
Mark
On Mon, Feb 7, 2011 at 8:10 PM, Ted Dunning
Option (1) isn't the way that things normally work. Besides, the map function
is called many times for each mapper object that is constructed.
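A toy illustration of that lifecycle (plain Java, not Hadoop code; the class and method names are invented): the framework builds one mapper object per input split, then calls map() once for every record in that split.

```java
import java.util.List;

// Toy model of the mapper lifecycle described above: ONE object per
// split, one map() invocation per record.
public class MapperLifecycle {

    static class CountingMapper {
        static int constructions = 0;   // one per split
        int calls = 0;                  // one per record
        CountingMapper() { constructions++; }
        void map(String record) { calls++; }
    }

    // Drive one mapper instance over a whole split; return its map() count.
    static int runSplit(List<String> split) {
        CountingMapper m = new CountingMapper();
        for (String record : split) m.map(record);
        return m.calls;
    }

    public static void main(String[] args) {
        int calls = runSplit(List.of("line1", "line2", "line3"))
                  + runSplit(List.of("line4", "line5"));
        // 2 constructions but 5 map() calls; per-line mappers (option 1)
        // would instead pay one object construction per line.
        System.out.println(CountingMapper.constructions
                + " mapper objects, " + calls + " map() calls");
    }
}
```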
On Mon, Feb 7, 2011 at 3:38 PM, maha wrote:
> Hi,
>
> I would appreciate it if you could give me your thoughts on whether there
> is an effect on efficiency if:
>
> 1) Mappers were per line in a document, or
> 2) Mappers were per block of lines in a document.
Hi,
I would appreciate it if you could give me your thoughts on whether there is
an effect on efficiency if:
1) Mappers were per line in a document
or
2) Mappers were per block of lines in a document.
The obvious difference I can see is that (1) has more mappers. Does
that mean (1) would be less efficient?
Well, I went to check. Now I'm using the school machine, and the UI
from the Quick Start worked fine, i.e. http://localhost:50070. Looking at my
file system, the temporary directory created has the system directory in it. Is
that why?
/cs/student/maha/tmp/mapred/system
I'm able to use the "localhost"
Hi Maha ,
I don't think hadoop.tmp.dir relates to the web UI problem. The
web UI is bound to 0.0.0.0:50070.
And on the client side, localhost is mapped to 127.0.0.1, i.e. your home machine.
On Thu, Oct 7, 2010 at 4:22 AM, Maha A. Alabduljalil
wrote:
> Sorry I'm confused. The story is:
>
> I ssh
Sorry I'm confused. The story is:
I ssh into my school-account using my home-computer and installed
hadoop in school directory. I used to open the browser of
Hadoop-Quick-Start (http://localhost:50070) from my home computer and
it showed me the file-system. Yesterday, however, I only wrot
Hi
The tmp directory is local to the machine running the hadoop system,
so if your hadoop is on a remote machine, the tmp directory has to be on
that machine.
Your question is not clear to me, e.g., what do you want to do?
asif
On Oct 6, 2010, at 9:55 PM, Maha A. Alabduljalil wrote:
Hi again,
I guess my questions are easy..
Since I'm installing hadoop on my school machine, I have to view the
namenode online via http://host-name:50070 instead of the default link
provided by the Hadoop Quick Start (i.e. http://localhost:50070).
Do you think I should set my hadoop.tmp.dir to t
Thanks Asif it worked ! :)
Maha
Quoting Asif Jan :
Hi
check if the ports are open outside the school network; otherwise
you will have to use ssh tunneling if you want to access the ports
serving the webpages (as it is likely that these are not open
by default)
try something like
ssh -L50030:hadoop-host-address:50030 ur-usern...@cluster-head-node
Hi
check if the ports are open outside the school network; otherwise
you will have to use ssh tunneling if you want to access the ports serving
the webpages (as it is likely that these are not open by default)
try something like
ssh -L50030:hadoop-host-address:50030 ur-usern...@cluster-head-node
Hi everyone,
I've started up hadoop (hdfs data and name nodes, JobTracker and
TaskTrackers) using the quick start guidance. The web views of the
filesystem and jobtracker suddenly started to give a "can't be found"
error in Safari.
Note that I'm actually accessing hadoop via ssh to my school account.
Thanks Philip, that worked.
Regards,
Prakhar
On Thu, Dec 10, 2009 at 12:25 AM, Philip Zeyliger wrote:
> I believe "class" would be something like
> "org.apache.hadoop.mapred.TextInputFormat" or whatever. I haven't had a
> chance to try it to make sure, however.
>
> -- Philip
>
> On Wed, Dec 9,
I believe "class" would be something like
"org.apache.hadoop.mapred.TextInputFormat" or whatever. I haven't had a
chance to try it to make sure, however.
-- Philip
On Wed, Dec 9, 2009 at 9:15 PM, Prakhar Sharma wrote:
> Hi all,
> In the Pipes CLI:
> bin/hadoop pipes \
> [-inputformat class] \
Hi all,
In the Pipes CLI:
bin/hadoop pipes \
[-inputformat class] \
[-map class] \
[-partitioner class] \
[-reduce class] \
[-writer class] \
does "class" in "-inputformat class" mean a Java class, i.e., a path to a .class file?
(I am bit of a novice in Java)
Thanks,
Prakhar