Some basic questions on replication

2010-11-04 Thread Hari Sreekumar
Hi, I have some pretty basic questions on replication that I am not very clear about, even after reading the online docs. 1. My understanding is that a replication factor of x means any block of data in HDFS will be available, given enough time, at x different nodes. I am confused about whether it is

Re: Some basic questions on replication

2010-11-04 Thread Harsh J
Hi, The following inline reply is from what I know so far. On Thu, Nov 4, 2010 at 11:24 PM, Hari Sreekumar hsreeku...@clickable.com wrote: Hi, I have some pretty basic stuff on replication that I am not very clear about, even after reading the online docs. 1. My understanding is that

Re: Some basic questions on replication

2010-11-04 Thread Hari Sreekumar
Hi Harsh, Thanks for the reply. So if I have a 2048 MB file with 64 MB block size (32 blocks) with replication 3, then I'll have 96 blocks of the file on HDFS, with no two similar blocks being on the same datanode. Also, if I change the dfs.replication property, does it affect files
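
The arithmetic in the message above (2048 MB / 64 MB = 32 blocks, times replication 3 = 96 stored block replicas) can be checked with a small sketch. It does not use the Hadoop API; the class and method names are hypothetical, and it only assumes that a file is split into fixed-size blocks with a possibly shorter last block.

```java
// Hypothetical helper, not part of Hadoop: block and replica counts
// for a file stored in fixed-size blocks.
final class ReplicationMath {
    private ReplicationMath() {}

    /** Number of blocks needed for the file; the last block may be partially filled. */
    static long blockCount(long fileSizeMb, long blockSizeMb) {
        if (fileSizeMb < 0 || blockSizeMb <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and block size positive");
        }
        // Ceiling division, so a trailing partial block still counts as one block.
        return (fileSizeMb + blockSizeMb - 1) / blockSizeMb;
    }

    /** Total block replicas stored across the cluster for the given replication factor. */
    static long totalReplicas(long fileSizeMb, long blockSizeMb, int replication) {
        if (replication < 1) {
            throw new IllegalArgumentException("replication must be at least 1");
        }
        return blockCount(fileSizeMb, blockSizeMb) * replication;
    }

    public static void main(String[] args) {
        System.out.println(blockCount(2048, 64));       // 32
        System.out.println(totalReplicas(2048, 64, 3)); // 96
    }
}
```

A file that is not an exact multiple of the block size still rounds up: blockCount(2049, 64) gives 33.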

San Francisco Hadoop meetup

2010-11-04 Thread Aaron Kimball
Hello Hadoop fans, The Bay Area Hadoop User Group meetups are a long hike for those of us who come from San Francisco. I'd like to gauge interest in SF-centric Hadoop gatherings. In contrast to the presentation-based format of the usual HUG meetings, I'm interested in holding events that are

Is there any reliable way for a reducer to determine the number of mappers used in a job

2010-11-04 Thread Steve Lewis
Short of having every mapper increment a counter -- Steven M. Lewis PhD 4221 105th Ave Ne Kirkland, WA 98033 206-384-1340 (cell) Institute for Systems Biology Seattle WA

Re: Some basic questions on replication

2010-11-04 Thread Harsh J
Hello again, On Fri, Nov 5, 2010 at 12:52 AM, Hari Sreekumar hsreeku...@clickable.com wrote: Hi Harsh, Thanks for the reply. So if I have a 2048 MB file with 64 MB block size (32 blocks) with replication 3, then I'll have 96 blocks of the file on HDFS, with no two similar blocks

Re: Is there any reliable way for a reducer to determine the number of mappers used in a job

2010-11-04 Thread Harsh J
Hello, On Fri, Nov 5, 2010 at 1:58 AM, Steve Lewis lordjoe2...@gmail.com wrote: Short of having every mapper increment a counter Sure, since mapred.map.tasks is computed and set before jobs are 'submitted' for running (before setup itself), it can be read via the Configuration object in the
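
As a rough illustration of the lookup suggested in the reply above, the sketch below reads the mapred.map.tasks property from a key/value configuration. To stay self-contained it uses a plain java.util.Map as a stand-in for Hadoop's Configuration object; the class name, method name, and fallback value are hypothetical. In an actual job the value would come from the job's Configuration object, as the reply describes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a Map models the job configuration so the sketch
// runs without Hadoop on the classpath.
final class MapTaskCountLookup {
    /** Property name taken from the reply above. */
    static final String MAP_TASKS_KEY = "mapred.map.tasks";

    private MapTaskCountLookup() {}

    /** Returns the configured map task count, or defaultValue if absent or unparsable. */
    static int mapTaskCount(Map<String, String> conf, int defaultValue) {
        String raw = conf.get(MAP_TASKS_KEY);
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(MAP_TASKS_KEY, "12");
        System.out.println(mapTaskCount(conf, -1)); // 12
    }
}
```

Returning a sentinel such as -1 when the property is missing lets the caller tell "not set" apart from a real count.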

Re: Two questions.

2010-11-04 Thread Allen Wittenauer
On Nov 3, 2010, at 7:27 PM, David B. Ritch wrote: The parameter mapred.hosts.exclude has existed in documentation for many versions, but I do not believe it has ever been implemented in the actual code. mradmin -refreshNodes was added in 0.21. So code definitely exists for it. That said,