Re: Quick question

2011-02-21 Thread maha
How then can I produce one output file per mapper, not per map task? Thank you, Maha On Feb 20, 2011, at 10:22 PM, Ted Dunning wrote: > This is the most important thing that you have said. The map function > is called once per unit of input but the mapper object persists for > many input units of input

Re: Quick question

2011-02-21 Thread maha
NLineInputFormat is already doing for you. > > -Original Message- > From: maha [mailto:m...@umail.ucsb.edu] > Sent: Sunday, February 20, 2011 2:00 PM > To: common-user@hadoop.apache.org > Subject: Re: Quick question > > Actually the following solved my problem ...

RE: Quick question

2011-02-21 Thread Jim Falgout
Sent: Sunday, February 20, 2011 2:00 PM To: common-user@hadoop.apache.org Subject: Re: Quick question Actually the following solved my problem ... but I'm a little suspicious of the side effect of doing the following instead of using my own InputSplit to be 5 lines. conf.setI

Re: Quick question

2011-02-20 Thread Ted Dunning
This is the most important thing that you have said. The map function is called once per unit of input but the mapper object persists for many input units of input. You have a little bit of control over how many mapper objects there are and how many machines they are created on and how many pieces
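Ted's point about mapper lifetime can be illustrated outside Hadoop entirely. The toy "framework" below is a sketch with hypothetical names (not Hadoop API): one mapper object is constructed per split, map() is invoked once per record, so per-object state survives across records and close() runs once per mapper object.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration (not Hadoop code): one mapper object per split,
// map() invoked once per input record.
class ToyMapper {
    int recordsSeen = 0;                        // state persists across map() calls
    final List<String> output = new ArrayList<>();

    void map(String record) {                   // called once per unit of input
        recordsSeen++;
    }

    void close() {                              // called once, when the split is done
        output.add("records=" + recordsSeen);
    }
}

class ToyFramework {
    // Simulates a task: a single mapper instance processes a whole split.
    static List<String> runSplit(List<String> split) {
        ToyMapper m = new ToyMapper();          // constructed once per split
        for (String rec : split) {
            m.map(rec);                         // many calls per construction
        }
        m.close();                              // per-mapper, not per-record
        return m.output;
    }
}
```

Emitting from close() (cleanup() in the newer Hadoop API) is the usual way to get one output per mapper object rather than one per record.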

Re: Quick question

2011-02-20 Thread maha
TextInputFormat handles situations where records cross >>> split boundaries. What your mapper will see is "whole" records. >>> >>> -Original Message- >>> From: maha [mailto:m...@umail.ucsb.edu] >>> Sent: Friday, February 18, 2011 1:14

Re: Quick question

2011-02-20 Thread maha
>> From: maha [mailto:m...@umail.ucsb.edu] >> Sent: Friday, February 18, 2011 1:14 PM >> To: common-user >> Subject: Quick question >> >> Hi all, >> >> I want to check if the following statement is right: >> >> If I use TextInputFormat t

Re: Quick question

2011-02-20 Thread maha
: > That's right. The TextInputFormat handles situations where records cross > split boundaries. What your mapper will see is "whole" records. > > -Original Message- > From: maha [mailto:m...@umail.ucsb.edu] > Sent: Friday, February 18, 2011 1:14 PM > To: c

Re: Quick question

2011-02-18 Thread maha
ha [mailto:m...@umail.ucsb.edu] > Sent: Friday, February 18, 2011 1:14 PM > To: common-user > Subject: Quick question > > Hi all, > > I want to check if the following statement is right: > > If I use TextInputFormat to process a text file with 2000 lines (each endi

RE: Quick question

2011-02-18 Thread Jim Falgout
That's right. The TextInputFormat handles situations where records cross split boundaries. What your mapper will see is "whole" records. -Original Message- From: maha [mailto:m...@umail.ucsb.edu] Sent: Friday, February 18, 2011 1:14 PM To: common-user Subject: Quick q

Re: Quick question

2011-02-18 Thread Ted Dunning
The input is effectively split by lines, but under the covers, the actual splits are by byte. Each mapper will cleverly scan from the specified start to the next line after the start point. At the end, it will over-read to the end of line that is at or after the end of its specified region. Thi
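That scanning rule can be sketched in plain Java (a hypothetical helper, not the real LineRecordReader): a reader whose byte range starts mid-file skips forward past the partial line it lands in, and always finishes the line that straddles its end offset.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of byte-range line reading: skip a leading partial line,
// over-read past `end` to finish the last line that starts in range.
class SplitLineReader {
    static List<String> readLines(byte[] data, int start, int end) {
        int pos = start;
        // A split that does not begin at byte 0 discards the partial
        // line it lands in; the previous split already consumed it.
        if (start != 0) {
            while (pos < data.length && data[pos - 1] != '\n') {
                pos++;
            }
        }
        List<String> lines = new ArrayList<>();
        while (pos < end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;                  // may run past `end`: the over-read
            }
            lines.add(new String(data, lineStart, pos - lineStart));
            pos++;                      // step over the newline
        }
        return lines;
    }
}
```

With "aa\nbb\ncc\ndd\n" and splits [0,5) and [5,12), the first reader over-reads to finish "bb" and the second skips to "cc": no line is seen twice and none is lost.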

Quick question

2011-02-18 Thread maha
Hi all, I want to check if the following statement is right: If I use TextInputFormat to process a text file with 2000 lines (each ending with \n) with 20 mappers, then each map will have a sequence of COMPLETE LINES. In other words, the input is not split byte-wise but by lines. Is th

Re: Quick Question: LineSplit or BlockSplit

2011-02-07 Thread maha
Thanks Ted. Then I have to write my own InputFormat to read a block of lines per mapper. NLineInputFormat didn't work for me; any working example of it would be appreciated. Thanks again, Maha On Feb 7, 2011, at 6:32 PM, Mark Kerzner wrote: > Thanks! > Mark > > On Mon, Feb 7, 2011 at
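Since a working example was requested, here is a hedged driver-side sketch on the 0.20-era mapred API. The class location and the linespermap property are as I recall them for that line of releases; verify both against your version's javadoc before relying on them.

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

// Sketch: configure a job so each map task receives N whole lines.
public class NLineDriver {
    static JobConf configure(Class<?> jobClass) {
        JobConf conf = new JobConf(jobClass);
        // Hand each map task a fixed number of input lines.
        conf.setInputFormat(NLineInputFormat.class);
        // 5 lines per split; the last split may hold fewer.
        conf.setInt("mapred.line.input.format.linespermap", 5);
        return conf;
    }
}
```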

Re: Quick Question: LineSplit or BlockSplit

2011-02-07 Thread Mark Kerzner
Thanks! Mark On Mon, Feb 7, 2011 at 8:28 PM, Ted Dunning wrote: > That is quite doable. One way to do it is to make the max split size quite > small. > > On Mon, Feb 7, 2011 at 6:14 PM, Mark Kerzner > wrote: > > > Ted, > > > > I am also interested in this answer. > > > > I put the name of a zi

Re: Quick Question: LineSplit or BlockSplit

2011-02-07 Thread Ted Dunning
That is quite doable. One way to do it is to make the max split size quite small. On Mon, Feb 7, 2011 at 6:14 PM, Mark Kerzner wrote: > Ted, > > I am also interested in this answer. > > I put the name of a zip file on a line in an input file, and I want one > mapper to read this line, and start
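Ted's suggestion, as a driver config fragment. The property name is from the 0.20-era API (newer releases spell it mapreduce.input.fileinputformat.split.maxsize), so treat this as a sketch to check against your release:

```java
import org.apache.hadoop.mapred.JobConf;

// Sketch: force many small splits so each mapper sees only a line
// or two. Property name is from the 0.20-era API.
public class SmallSplitDriver {
    static JobConf configure() {
        JobConf conf = new JobConf();
        // Cap split size at a small byte count; tune to your line length.
        conf.setLong("mapred.max.split.size", 64L);
        return conf;
    }
}
```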

Re: Quick Question: LineSplit or BlockSplit

2011-02-07 Thread Mark Kerzner
Ted, I am also interested in this answer. I put the name of a zip file on a line in an input file, and I want one mapper to read this line, and start working on it (since it now knows the path in HDFS). Are you saying it's not doable? Thank you, Mark On Mon, Feb 7, 2011 at 8:10 PM, Ted Dunning

Re: Quick Question: LineSplit or BlockSplit

2011-02-07 Thread Ted Dunning
Option (1) isn't the way that things normally work. Besides, the map function is called many times for each construction of a mapper. On Mon, Feb 7, 2011 at 3:38 PM, maha wrote: > Hi, > > I would appreciate it if you could give me your thoughts if there is > affect on efficiency if: > > 1) Mappers we

Quick Question: LineSplit or BlockSplit

2011-02-07 Thread maha
Hi, I would appreciate your thoughts on whether there is an effect on efficiency if: 1) Mappers were per line in a document, or 2) Mappers were per block of lines in a document. The obvious difference I can see is that (1) has more mappers. Does that mean (1) w

Re: another quick question

2010-10-06 Thread Maha A. Alabduljalil
Well, I went to check. Now I'm using the school machine, and the UI from the Quick Start worked fine, i.e. hdfs://localhost:50070. Looking at my file system, the temporary directory created has a system file in it; is that why? /cs/student/maha/tmp/mapred/system I'm able to use the "localhost"

Re: another quick question

2010-10-06 Thread Jeff Zhang
Hi Maha, I don't think hadoop.tmp.dir relates to the web UI problem. The web UI is bound to 0.0.0.0:50070, and localhost is mapped to 127.0.0.1 of your home machine on your client side. On Thu, Oct 7, 2010 at 4:22 AM, Maha A. Alabduljalil wrote: > Sorry I'm confused. The story is: > >  I ssh

Re: another quick question

2010-10-06 Thread Maha A. Alabduljalil
Sorry, I'm confused. The story is: I ssh into my school account from my home computer and installed Hadoop in a school directory. I used to open the browser at the Hadoop Quick Start address (hdfs://localhost:50070) from my home computer and it showed me the file system. Yesterday, however, I only wrot

Re: another quick question

2010-10-06 Thread Asif Jan
Hi, The tmp directory is local to the machine running the Hadoop system, so if your Hadoop is on a remote machine, the tmp directory has to be on that machine. Your question is not clear to me, e.g. what do you want to do? asif On Oct 6, 2010, at 9:55 PM, Maha A. Alabduljalil wrote: Hi again,

another quick question

2010-10-06 Thread Maha A. Alabduljalil
Hi again, I guess my questions are easy. Since I'm installing Hadoop on my school machine, I have to view the namenode online via hdfs://host-name:50070 instead of the default link provided by the Hadoop Quick Start (i.e. hdfs://localhost:50070). Do you think I should set my hadoop.tmp.dir to t

Re: Quick question

2010-10-06 Thread Maha A. Alabduljalil
Thanks Asif, it worked! :) Maha Quoting Asif Jan: Hi check if the ports are open outside school network else you will have to use ssh tunneling if you want to access ports serving the webpages (as it is more likely that these are not open by default) try something like ssh -L50030:h

Re: Quick question

2010-10-06 Thread Asif Jan
Hi, check if the ports are open outside the school network; otherwise you will have to use ssh tunneling if you want to access the ports serving the webpages (as it is more likely that these are not open by default). Try something like: ssh -L50030:hadoop-host-address:50030 ur-usern...@cluster-head-node T

Quick question

2010-10-06 Thread Maha A. Alabduljalil
Hi everyone, I've started up Hadoop (HDFS data and name nodes, JobTracker and TaskTrackers) using the Quick Start guidance. The web view of the filesystem and jobtracker suddenly started giving a "can't be found" error in Safari. Notice I'm actually accessing Hadoop via ssh to my school accoun

Re: quick question about Pipes CLI

2009-12-09 Thread Prakhar Sharma
Thanks Philip, that worked. Regards, Prakhar On Thu, Dec 10, 2009 at 12:25 AM, Philip Zeyliger wrote: > I believe "class" would be something like > "org.apache.hadoop.mapred.TextInputFormat" or whatever.  I haven't had a > chance to try it to make sure, however. > > -- Philip > > On Wed, Dec 9,

Re: quick question about Pipes CLI

2009-12-09 Thread Philip Zeyliger
I believe "class" would be something like "org.apache.hadoop.mapred.TextInputFormat" or whatever. I haven't had a chance to try it to make sure, however. -- Philip On Wed, Dec 9, 2009 at 9:15 PM, Prakhar Sharma wrote: > Hi all, > In the Pipes CLI: > bin/hadoop pipes \ > [-inputformat class] \

quick question about Pipes CLI

2009-12-09 Thread Prakhar Sharma
Hi all, In the Pipes CLI: bin/hadoop pipes \ [-inputformat class] \ [-map class] \ [-partitioner class] \ [-reduce class] \ [-writer class] \ does "class" in "-inputformat class" mean a Java class, i.e., a path to a .class file? (I am a bit of a novice in Java) Thanks, Prakhar