Hi everyone, happy holidays! I have a Pig script that reads from 4 different folders in Amazon S3. This is the code:
load_1 = LOAD 's3n://mybucket/{folder_1,folder_2,folder_3,folder_4}'
USING...;
Instead of reading each folder just once and appending the files,
Pig/Hadoop reads each folder 4 times.
The input should have 62174 records, but in the end I get 248696, which is exactly 4 times as many (4 x 62174 = 248696).
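One way to double-check the total is a sketch like the one below. It assumes PigStorage() as a stand-in loader, since the real USING clause is elided above, and the aliases grouped and total are made up for illustration:

```pig
-- Assumption: PigStorage() stands in for the actual loader, which is elided above.
load_1 = LOAD 's3n://mybucket/{folder_1,folder_2,folder_3,folder_4}' USING PigStorage();

-- Put every record into a single group so it can be counted.
grouped = GROUP load_1 ALL;

-- COUNT_STAR counts all tuples, including those whose first field is null.
total = FOREACH grouped GENERATE COUNT_STAR(load_1);

DUMP total;
```

Running the same count with only 's3n://mybucket/folder_1' as the path would show whether a single folder on its own is already being counted more than once.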
Why is that? Any ideas?
Thanks,
Rodrigo.
