[
https://issues.apache.org/jira/browse/PIG-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Travis Crawford updated PIG-2573:
-
Status: Patch Available (was: Open)
> Automagically setting parallelism based on input file si
[
https://issues.apache.org/jira/browse/PIG-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Travis Crawford updated PIG-2573:
-
Attachment: PIG-2573_get_size_from_stats_if_possible.diff
Patch has been updated to get the size fr
YSmart does some clever correlation-based optimizations to achieve
significant speedups. They have a good paper about it:
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Note the Hive/Pig numbers: we are generating unnecessary jobs and
too much intermediate data, it see