[jira] [Updated] (PIG-2573) Automagically setting parallelism based on input file size does not work with HCatalog

2012-03-10 Thread Travis Crawford (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Travis Crawford updated PIG-2573: Status: Patch Available (was: Open) > Automagically setting parallelism based on input file si

[jira] [Updated] (PIG-2573) Automagically setting parallelism based on input file size does not work with HCatalog

2012-03-10 Thread Travis Crawford (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Travis Crawford updated PIG-2573: Attachment: PIG-2573_get_size_from_stats_if_possible.diff Patch has been updated to get the size fr

yslow optimizations

2012-03-10 Thread Dmitriy Ryaboy
Yslow does some clever correlation-based optimizations to achieve significant speedups. They have a good paper about it: http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf Note the Hive/Pig numbers: we are generating unnecessary jobs and too much intermediate data, it see