Dmitriy, that's the same sort of thing I was asking about. Thank you for your reply!
Robert.

On 6 February 2011 21:02, Dmitriy Ryaboy <[email protected]> wrote:

> Robert,
> It is not clear from your code snippets what the relationships are
> between the various "var" relations. Could you provide more detail?
>
> It sort of sounds like you are asking about Pig's multiquery
> optimization. You can read about it in these pages:
> http://pig.apache.org/docs/r0.7.0/piglatin_ref1.html#Multi-Query+Execution
> http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification
>
> On Sun, Feb 6, 2011 at 12:11 PM, Robert Waddell
> <[email protected]> wrote:
> > Hey Guys,
> >
> > I am trying to optimize my Pig jobs as much as possible and wanted to
> > know a little about how Pig handles its loading of data.
> >
> > When I have:
> >
> > var1 = LOAD ....
> > local_var1 = FOREACH
> > local_var1 = JOIN ... [etc]
> > ~~
> > ~~
> > ~~
> > STORE local_var1 ...
> > local_var2 = FOREACH local_var2
> > local_var2 = JOIN ... [etc]
> > ~~
> > STORE local_var2
> >
> > am I gaining any performance improvements by not loading a lengthy file
> > everytime, instead, storing it in a different alias (local_var2 &
> > local_var1) and manipulating it there, preserving the original (var1),
> > or am I better having multiple LOADs and manipulating the original
> > alias directly?
> >
> > Robert.
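For anyone finding this thread later, here is a minimal sketch of the kind of script the multi-query optimization is meant for: one LOAD feeding two independent branches, each ending in its own STORE. The paths, field names and aliases (events, positive, by_id, totals, ids, unique_ids) are hypothetical and are not taken from the snippets quoted above. It assumes the script is run in batch mode with multi-query execution left enabled.

```pig
-- Hypothetical input path and schema, for illustration only.
events = LOAD 'input/events.tsv' USING PigStorage('\t')
         AS (id:chararray, amount:int);

-- Branch 1: per-id totals over the positive amounts.
positive = FILTER events BY amount > 0;
by_id    = GROUP positive BY id;
totals   = FOREACH by_id GENERATE group AS id, SUM(positive.amount) AS total;
STORE totals INTO 'output/totals';

-- Branch 2: the distinct ids, built from the same 'events' alias.
ids        = FOREACH events GENERATE id;
unique_ids = DISTINCT ids;
STORE unique_ids INTO 'output/unique_ids';
```

As far as I understand the pages linked above, Pig can plan both STOREs together and read the input once for both branches, so a second LOAD of the same file should not be needed. Note also that an alias is only a name for a step in the plan: defining new aliases from events does not change events itself, so there is nothing to "preserve" by copying it first.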
