Re: concurrency

2012-10-12 Thread Harsh J
Joep, you're right - I missed in my quick scan that he was actually replacing those files there. Sorry for the confusion, Koert! On Fri, Oct 12, 2012 at 9:37 PM, J. Rottinghuis wrote: > Hi Harsh, morning Koert, > > If Koert's problem is similar to what I have been thinking about where we > want to co

Re: distcp question

2012-10-12 Thread J. Rottinghuis
Rita, Are you doing a push from the source cluster or a pull from the target cluster? Doing a pull with distcp using hftp (to accommodate version differences) has the advantage of slightly fewer block transfers across the top-of-rack (TOR) switches. Each block is read from exactly the datanode where it is located
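For reference, a pull is run on the target cluster, with the source addressed through hftp (read-only, served from the source NameNode's HTTP port) and the destination through hdfs. The Python sketch below only assembles that command line; the host names, ports (50070 for NameNode HTTP and 8020 for NameNode RPC, common defaults at the time) and paths are hypothetical placeholders, as is the helper name.

```python
import shlex


def distcp_pull_command(src_namenode: str, src_http_port: int, src_path: str,
                        dst_namenode: str, dst_rpc_port: int, dst_path: str) -> list[str]:
    """Build a distcp *pull*: run on the target cluster, read the source over
    hftp (version-independent, read-only), write locally over hdfs.
    All hosts/ports/paths passed in are placeholders for your own cluster."""
    source = f"hftp://{src_namenode}:{src_http_port}{src_path}"
    target = f"hdfs://{dst_namenode}:{dst_rpc_port}{dst_path}"
    return ["hadoop", "distcp", source, target]


if __name__ == "__main__":
    # Hypothetical clusters: old (source) and new (target).
    cmd = distcp_pull_command("old-nn.example.com", 50070, "/data/logs",
                              "new-nn.example.com", 8020, "/data/logs")
    print(shlex.join(cmd))
```

A push would be the same command run on the source cluster instead; running it on the target lets the map tasks that write the copies sit next to the destination datanodes.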

Re: concurrency

2012-10-12 Thread Koert Kuipers
Hey Harsh & Joep, my main worry was actually the simpler situation in which loaders only create new subdirs. If we focus for a second on this "append-only" situation, which I admit is only a subset of all cases, even then it is not entirely clear to me how to go about it. Right now I pas
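One common way to handle the append-only case (not something stated in this thread, which is cut off here) is to have each loader write into a staging directory whose name readers ignore, then rename it into place so a subdir only becomes visible once it is complete. The Python sketch below is a local-filesystem analogy under that assumption: the `_tmp_` prefix and both function names are hypothetical, and it relies on `os.rename` being atomic within a single POSIX filesystem. HDFS renames are likewise atomic at the NameNode, and Hadoop's `FileInputFormat` already skips input paths whose names start with `_` or `.`.

```python
import os
import tempfile


def publish_subdir(base: str, name: str, files: dict[str, bytes]) -> str:
    """Loader side: write everything into a hidden staging dir, then rename it.

    Readers never see a half-written subdir, because the final name only
    appears once the (atomic, same-filesystem) rename succeeds."""
    staging = os.path.join(base, "_tmp_" + name)  # hypothetical naming scheme
    os.makedirs(staging)
    for fname, data in files.items():
        with open(os.path.join(staging, fname), "wb") as f:
            f.write(data)
    final = os.path.join(base, name)
    os.rename(staging, final)
    return final


def visible_subdirs(base: str) -> list[str]:
    """Reader side: skip names starting with '_' or '.', mirroring the
    hidden-file convention of Hadoop's FileInputFormat."""
    return sorted(
        d for d in os.listdir(base)
        if not d.startswith(("_", "."))
        and os.path.isdir(os.path.join(base, d))
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        publish_subdir(base, "2012-10-12", {"part-00000": b"hello\n"})
        os.makedirs(os.path.join(base, "_tmp_2012-10-13"))  # a loader still running
        print(visible_subdirs(base))  # ['2012-10-12']
```

Because loaders only ever add new names, a reader that snapshots `visible_subdirs` once per job gets a consistent, complete set of inputs without any locking.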

Re: distcp question

2012-10-12 Thread Rita
Thanks for the advice. Before I push or pull, are there any tests I can run before I do the distcp? I am not 100% sure I have my webhdfs set up properly. On Fri, Oct 12, 2012 at 1:01 PM, J. Rottinghuis wrote: > Rita, > > Are you doing a push from the source cluster or a pull from the target
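A quick pre-flight check is to call the WebHDFS REST API directly, e.g. `op=LISTSTATUS` on a known directory, and confirm it returns JSON rather than an error; WebHDFS also has to be switched on with `dfs.webhdfs.enabled` in `hdfs-site.xml`. Below is a minimal standard-library Python sketch; the helper names are hypothetical, the host and port (50070 was the usual NameNode HTTP default) are placeholders, and a Kerberos-secured cluster would need more than this.

```python
import json
import urllib.parse
import urllib.request


def webhdfs_url(host: str, port: int, path: str, op: str, **params: str) -> str:
    """Build http://<host>:<port>/webhdfs/v1<path>?op=<OP>&... (path starts with '/')."""
    query = urllib.parse.urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"


def child_names(body: dict) -> list[str]:
    """Extract entry names from a LISTSTATUS JSON response."""
    return [s["pathSuffix"] for s in body["FileStatuses"]["FileStatus"]]


def list_status(host: str, port: int, path: str) -> list[str]:
    """Issue LISTSTATUS; an HTTPError or connection failure here suggests
    WebHDFS is not enabled or not reachable from this machine."""
    with urllib.request.urlopen(webhdfs_url(host, port, path, "LISTSTATUS"),
                                timeout=10) as resp:
        return child_names(json.load(resp))


# Usage (hypothetical host): list_status("namenode.example.com", 50070, "/user/rita")
```

Running the same check against both the source and the target NameNode before the distcp catches a disabled or firewalled HTTP port early.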

Re: Re: distcp question

2012-10-12 Thread kojie . fu
kojie.fu From: Rita Date: 2012-10-13 03:19 To: common-user Subject: Re: distcp question thanks for the advice. Before I push or pull. Are there any tests I can run before I do the distCP. I am not 100% sure if I have my webhdfs setup properly. On Fri, Oct 12, 2012 at 1:01 PM, J. Rottingh

Re: Re: distcp question

2012-10-12 Thread Rita
Never mind. Figured it out. On Fri, Oct 12, 2012 at 3:20 PM, kojie.fu wrote: > > > > > > kojie.fu > > From: Rita > Date: 2012-10-13 03:19 > To: common-user > Subject: Re: distcp question > thanks for the advice. > > Before I push or pull. Are there any tests I can run before I do the > distCP. I

Re: speculative execution before mappers finish

2012-10-12 Thread Harsh J
Think of it in partition terms. If you know that your map-splits X, Y and Z won't emit any key of partition P, then the Pth reducer can jump ahead and run without those X, Y and Z completing their processing. Otherwise, a reducer can't run until all maps have completed, in fear of losing a few key
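Harsh's point can be made concrete: the reducer for partition P depends only on the maps that might emit a key landing in P. The Python sketch below is a thought experiment, not how stock MapReduce schedules (it cannot know in advance which partitions a map will emit, so although reduce tasks may launch early to start copying map output, the reduce phase itself waits for every map). `hash_partition` follows the formula of Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, with Java's `String.hashCode` reimplemented for BMP strings; `runnable_reducers` and the split names are hypothetical.

```python
def java_string_hash(s: str) -> int:
    """Java's String.hashCode(): h = 31*h + c, wrapped to a signed 32-bit int
    (exact for strings whose characters are all in the BMP)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def hash_partition(key: str, num_reducers: int) -> int:
    """Same formula as HashPartitioner: (hashCode & Integer.MAX_VALUE) % n."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def runnable_reducers(map_partitions: dict[str, set[int]],
                      finished_maps: set[str],
                      num_reducers: int) -> set[int]:
    """Reducer P may run once every map that could emit a key for P is done.

    map_partitions: split name -> partitions it may emit (e.g. the set of
    hash_partition(k, n) over its keys); this up-front knowledge is assumed."""
    return {
        p for p in range(num_reducers)
        if all(m in finished_maps
               for m, parts in map_partitions.items() if p in parts)
    }


if __name__ == "__main__":
    # Splits X, Y, Z never emit partition 2, so reducer 2 only needs W.
    emits = {"X": {0, 1}, "Y": {0, 1}, "Z": {0, 1}, "W": {2}}
    print(runnable_reducers(emits, finished_maps={"W"}, num_reducers=3))  # {2}
```

Reducers 0 and 1 stay blocked until X, Y and Z finish; without per-split knowledge every reducer must assume every map can emit any key, which is exactly why the framework waits for all maps.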