It's worth pointing out that Pig 0.9.2 also runs quickly; we only see the
degradation with Pig 0.10.0.
The degradation in performance seems to have a knee: with 4 or 5
conditionals the script works as expected, but as presented it takes about
6 minutes at the GRUNT> prompt after hitting enter, before any Hadoop
execution starts.
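For comparison, here is a rough sketch of the kind of reduced script that
stays on the fast side of the knee. It keeps only the first four
conditionals from Dan's script below. The fallback value 0 and the output
name 'B_reduced' are placeholders chosen for illustration and are not part
of the original test case.

```pig
-- Reduced variant of the original script: only four nested conditionals.
A = load 'A.txt' using PigStorage() AS (m: int);
B = FOREACH A {
    days_str = (chararray)
        (m == 1 ? 31 :
        (m == 2 ? 28 :
        (m == 3 ? 31 :
        (m == 4 ? 30 : 0))));  -- 0 is a placeholder for the remaining months
    GENERATE days_str AS days_str;
};
store B into 'B_reduced';  -- hypothetical output path
```

Adding the remaining conditionals back one at a time should show where the
compile time starts to climb.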
-Clay
On Tue, 26 Jun 2012, Danfeng Li wrote:
We found that the following simple logic causes a very long compile time in
Pig 0.10.0, while with Pig 0.8.1 everything is fine.
A = load 'A.txt' using PigStorage() AS (m: int);
B = FOREACH A {
days_str = (chararray)
(m == 1 ? 31:
(m == 2 ? 28:
(m == 3 ? 31:
(m == 4 ? 30:
(m == 5 ? 31:
(m == 6 ? 30:
(m == 7 ? 31:
(m == 8 ? 31:
(m == 9 ? 30:
(m == 10 ? 31:
(m == 11 ? 30:31)))))))))));
GENERATE
days_str as days_str;
}
store B into 'B';
Here's the Pig version we used in the test:
Apache Pig version 0.10.0-SNAPSHOT (rexported)
Attached is the pig code and an example input file.
Dan