Re: Doubt regarding use of HTTP during Shuffle phase

2012-06-27 Thread Pavan Kulkarni
Oh. Thanks a lot, Owen. I'll have a look into it. On Wed, Jun 27, 2012 at 10:21 AM, Owen O'Malley wrote: > Pavan, > This is a very big project. Look at the users of IFile.java. IFile is the > format for storing the shuffle outputs. > > -- Owen -- With Regards, Pavan Kulkarni

Re: Doubt regarding use of HTTP during Shuffle phase

2012-06-27 Thread Owen O'Malley
Pavan, This is a very big project. Look at the users of IFile.java. IFile is the format for storing the shuffle outputs. -- Owen
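Owen points at IFile as the on-disk format for shuffle outputs. As a rough illustration of what such a format looks like, here is a simplified Python sketch of an IFile-style segment: each record is written as a variable-length key length, a variable-length value length, then the raw key and value bytes, and a pair of -1 lengths marks the end of the segment. The variable-length encoding is modeled on Hadoop's `WritableUtils.writeVLong`; the checksum and compression that the real IFile applies are deliberately left out, and the function names (`write_vlong`, `write_segment`, etc.) are hypothetical, not Hadoop APIs.

```python
import io
from typing import BinaryIO, Iterable, Iterator, Tuple

# Hypothetical EOF marker: written twice (as key and value length) to end a segment.
EOF_MARKER = -1


def write_vlong(out: BinaryIO, i: int) -> None:
    """Variable-length signed integer, modeled on Hadoop's WritableUtils.writeVLong."""
    if -112 <= i <= 127:
        out.write((i & 0xFF).to_bytes(1, "big"))  # small values fit in one byte
        return
    length = -112
    if i < 0:
        i ^= -1  # one's complement, so the stored magnitude is non-negative
        length = -120
    tmp = i
    while tmp != 0:  # count how many bytes the magnitude needs
        tmp >>= 8
        length -= 1
    out.write((length & 0xFF).to_bytes(1, "big"))  # first byte encodes size and sign
    nbytes = -(length + 120) if length < -120 else -(length + 112)
    out.write(i.to_bytes(nbytes, "big"))


def read_vlong(inp: BinaryIO) -> int:
    """Inverse of write_vlong."""
    b = inp.read(1)
    if not b:
        raise EOFError("unexpected end of segment")
    first = int.from_bytes(b, "big", signed=True)
    if first >= -112:
        return first
    nbytes = (-120 - first) if first < -120 else (-112 - first)
    i = int.from_bytes(inp.read(nbytes), "big")
    return i ^ -1 if first < -120 else i


def write_segment(records: Iterable[Tuple[bytes, bytes]], out: BinaryIO) -> None:
    """Write <keyLen><valLen><key><value> records followed by the EOF marker pair."""
    for key, value in records:
        write_vlong(out, len(key))
        write_vlong(out, len(value))
        out.write(key)
        out.write(value)
    write_vlong(out, EOF_MARKER)
    write_vlong(out, EOF_MARKER)


def read_segment(inp: BinaryIO) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (key, value) pairs until the EOF marker pair is reached."""
    while True:
        klen = read_vlong(inp)
        vlen = read_vlong(inp)
        if klen == EOF_MARKER and vlen == EOF_MARKER:
            return
        yield inp.read(klen), inp.read(vlen)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_segment([(b"apple", b"1"), (b"banana", b"2")], buf)
    buf.seek(0)
    print(list(read_segment(buf)))
```

Because a reduce reads such segments sequentially, the length prefixes are all it needs to walk the records without any per-file index.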

Re: Doubt regarding use of HTTP during Shuffle phase

2012-06-27 Thread Pavan Kulkarni
@Roman: These are exactly the papers I referred to before venturing into this project. Thanks a lot for the Sailfish paper; I wasn't aware of it. Also, by any chance, do you have any idea which classes I need to tweak to get the filenames of the map output files? Thanks. On Wed, Jun 27, 2012 at 9:52 AM, Ro

Re: Doubt regarding use of HTTP during Shuffle phase

2012-06-27 Thread Roman Shaposhnik
On Wed, Jun 27, 2012 at 9:44 AM, Pavan Kulkarni wrote: > Yes you are correct, but we can use Lustre FS and it does scale right? > I am new to this so please excuse if I am wrong in some assumptions. You can use hybrid approaches, but you'd be venturing into the unknown. Take a look at these proje

Re: Doubt regarding use of HTTP during Shuffle phase

2012-06-27 Thread Pavan Kulkarni
Yes, you are correct, but we can use Lustre FS, and it does scale, right? I am new to this, so please excuse me if I am wrong in some of my assumptions. Thanks. On Wed, Jun 27, 2012 at 9:40 AM, Owen O'Malley wrote: > On Wed, Jun 27, 2012 at 9:33 AM, Pavan Kulkarni wrote: > > Why is HTTP used during the Shuff

Re: Doubt regarding use of HTTP during Shuffle phase

2012-06-27 Thread Owen O'Malley
On Wed, Jun 27, 2012 at 9:33 AM, Pavan Kulkarni wrote: > Why is HTTP used during the Shuffle phase > instead of just creating a hard link? The reduces run on a different machine from the maps. Hadoop doesn't use a shared NFS file system between the machines, because it doesn't scale. -- Owen
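A hard link only works within a single filesystem, so once the reduce runs on another machine with no shared filesystem, it has to pull its partition from the node that ran the map. The following is a minimal Python sketch of that pull model, not Hadoop's implementation: it assumes a hypothetical `/mapOutput?map=...&reduce=...` endpoint and uses an in-memory dict (`MAP_OUTPUTS`) as a stand-in for the map's local output files. Hadoop's real shuffle server, URL parameters, and headers differ between versions.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical stand-in for map outputs on the map node's local disk:
# (map_id, reduce_partition) -> serialized partition bytes.
MAP_OUTPUTS = {
    ("m_000000", 0): b"apple\t1\n",
    ("m_000000", 1): b"banana\t2\n",
}


class MapOutputHandler(BaseHTTPRequestHandler):
    """Map side: serve one reduce partition of one map's output (hypothetical URL scheme)."""

    def do_GET(self):
        url = urlparse(self.path)
        params = parse_qs(url.query)
        if url.path != "/mapOutput" or "map" not in params or "reduce" not in params:
            self.send_error(400, "expected /mapOutput?map=...&reduce=...")
            return
        try:
            key = (params["map"][0], int(params["reduce"][0]))
        except ValueError:
            self.send_error(400, "reduce must be an integer")
            return
        data = MAP_OUTPUTS.get(key)
        if data is None:
            self.send_error(404, "no such map output")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def fetch_partition(host: str, port: int, map_id: str, reduce_id: int) -> bytes:
    """Reduce side: pull one partition over HTTP from the node that ran the map."""
    url = f"http://{host}:{port}/mapOutput?map={map_id}&reduce={reduce_id}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read()


if __name__ == "__main__":
    # Port 0 asks the OS for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), MapOutputHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    print(fetch_partition(host, port, "m_000000", 1))
    server.shutdown()
```

In this model the map node only has to serve files from its own local disk, and each reduce decides which partitions to fetch and when, so no machine ever needs to see another machine's filesystem.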