Hey Guys,

I am trying to optimize my Pig jobs as much as possible and wanted to know a
little about how Pig handles its loading of data.

When I have:

var1 = LOAD ....
local_var1 = FOREACH var1 ...
local_var1 = JOIN ... [etc]
~~
~~
~~
STORE local_var1 ...
local_var2 = FOREACH var1 ...
local_var2 = JOIN ... [etc]
~~
STORE local_var2

Am I gaining any performance by loading the lengthy file only once, deriving
separate aliases from it (local_var1 & local_var2) and manipulating those,
so that the original alias (var1) is preserved? Or am I better off having
multiple LOADs and manipulating the original alias directly?
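
To make the two options concrete, here is a minimal sketch of what I mean.
The paths, the schema and the "lookup" relation are made up for illustration
only; my real scripts are longer.

-- Option A: one LOAD, two aliases derived from it (var1 stays untouched)
-- NOTE: 'input/big_file', 'input/lookup' and all field names are hypothetical
var1   = LOAD 'input/big_file' USING PigStorage('\t')
         AS (id:int, name:chararray, value:double);
lookup = LOAD 'input/lookup' USING PigStorage('\t')
         AS (id:int, label:chararray);

local_var1 = FOREACH var1 GENERATE id, value;
local_var1 = JOIN local_var1 BY id, lookup BY id;
STORE local_var1 INTO 'output/result1';

local_var2 = FOREACH var1 GENERATE id, name;
local_var2 = JOIN local_var2 BY id, lookup BY id;
STORE local_var2 INTO 'output/result2';

-- Option B: the same file is LOADed again for the second pipeline
var1b      = LOAD 'input/big_file' USING PigStorage('\t')
             AS (id:int, name:chararray, value:double);
local_var2 = FOREACH var1b GENERATE id, name;
local_var2 = JOIN local_var2 BY id, lookup BY id;
STORE local_var2 INTO 'output/result2';

What I would like to understand is whether Option A actually reads the input
only once, or whether Pig ends up reading it once per STORE anyway.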

Robert.
