Hey guys, I am trying to optimize my Pig jobs as much as possible and wanted to know a little about how Pig handles loading data.
When I have:

    var1 = LOAD ....

    local_var1 = FOREACH
    local_var1 = JOIN ...
    [etc]
    ~~ ~~ ~~
    STORE local_var1 ...

    local_var2 = FOREACH local_var2
    local_var2 = JOIN ...
    [etc]
    ~~
    STORE local_var2

am I gaining any performance by not loading a lengthy file every time, and instead assigning it to different aliases (local_var1 and local_var2) and manipulating it there, preserving the original (var1)? Or am I better off having multiple LOADs and manipulating the original alias directly?

Robert.
