Hi Rodrigo,

Thanks for your suggestion, though I don't see how the MultiStorage UDF helps here.

> Register UDFs etc
> A = LOAD....
> B = LOAD....
> C = LOAD....
>
> -- do lots of transformations with A and B and C get intermediate result INTER_RES
> result1 = FOREACH (GROUP INTER_RES BY (...
> STORE result1 INTO '....
> result2 = FOREACH (GROUP INTER_RES BY (...
> STORE result2 INTO '....
> result3 = FOREACH (GROUP INTER_RES BY (...
> STORE result3 INTO '....
> result4 = FOREACH (GROUP INTER_RES BY (...
> STORE result4 INTO '....
> ...
> ...

The different projections (groupings) are not done in the intermediate result INTER_RES; they are done later...

Cheers,
-Marco

On Thu, Jan 8, 2015 at 12:04 PM, Rodrigo Ferreira <web...@gmail.com> wrote:

> Marco,
>
> check out this UDF:
>
> http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/piggybank/storage/MultiStorage.html
>
> I think it can get the job done without having to group everything.
>
> Cheers,
> Rodrigo
>
> 2015-01-08 7:27 GMT-02:00 Marco Cadetg <ma...@zattoo.com>:
>
> > Hi there,
> >
> > I have a big Pig script which first generates an expensive intermediate
> > result on which I run multiple GROUP BY statements and multiple stores.
> > Something like this:
> >
> > Register UDFs etc
> > A = LOAD....
> > B = LOAD....
> > C = LOAD....
> >
> > -- do lots of transformations with A and B and C get intermediate result INTER_RES
> > result1 = FOREACH (GROUP INTER_RES BY (...
> > STORE result1 INTO '....
> > result2 = FOREACH (GROUP INTER_RES BY (...
> > STORE result2 INTO '....
> > result3 = FOREACH (GROUP INTER_RES BY (...
> > STORE result3 INTO '....
> > result4 = FOREACH (GROUP INTER_RES BY (...
> > STORE result4 INTO '....
> > ...
> > ...
> >
> > Note that the results which get stored are independent of each other,
> > meaning they are not used as input for anything else further down and
> > do not alter INTER_RES.
> >
> > Am I correct that Pig would only need to LOAD A, B and C once? From what I
> > can see in the command line output, it looks like the expensive intermediate
> > is recomputed for each store. I've done a quick test, and if I STORE the
> > intermediate and then LOAD it, the script seems to be faster. Is there a
> > way to avoid storing the expensive intermediate?
> >
> > Cheers,
> > -Marco