Hello! The problem was the ordering of the vzeroupper removal pass and the pad-return pass, both of which run in the mach pass. The attached patch changes the pass ordering so that vzeroupper removal runs before the pad-return pass. The pad-return pass then (correctly) finds an empty function and emits a long return.
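
To illustrate why the order matters, here is a toy model in C. It is only a sketch and not GCC code: the instruction kinds, the struct, and the helper names (delete_vzeroupper, pad_returns, final_return) are all hypothetical, and it assumes that every vzeroupper is redundant and that a return standing alone at the start of a function is the case that wants the long form.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction kinds; these do not correspond to GCC's RTL. */
enum insn { INSN_VZEROUPPER, INSN_RET_SHORT, INSN_RET_LONG };

struct func {
    enum insn insns[4];
    size_t n;
};

/* Stand-in for vzeroupper removal: drops every vzeroupper
   (assumption: all of them are redundant in this toy function). */
static void delete_vzeroupper(struct func *f)
{
    size_t out = 0;
    for (size_t i = 0; i < f->n; i++)
        if (f->insns[i] != INSN_VZEROUPPER)
            f->insns[out++] = f->insns[i];
    f->n = out;
}

/* Stand-in for pad-return: a short return that is the first
   instruction of the function is replaced by the long form. */
static void pad_returns(struct func *f)
{
    if (f->n > 0 && f->insns[0] == INSN_RET_SHORT)
        f->insns[0] = INSN_RET_LONG;
}

/* Runs both toy passes in the requested order on a function that
   consists of "vzeroupper; ret" and reports the resulting return. */
static enum insn final_return(bool vzeroupper_first)
{
    struct func f = { { INSN_VZEROUPPER, INSN_RET_SHORT }, 2 };

    if (vzeroupper_first) {
        delete_vzeroupper(&f);
        pad_returns(&f);
    } else {
        pad_returns(&f);
        delete_vzeroupper(&f);
    }
    return f.insns[f.n - 1];
}
```

In this model, final_return (true) yields the long return, because pad_returns sees the function only after it has become empty. final_return (false) leaves a short return behind, because pad_returns still sees the vzeroupper in front of the ret.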

2011-05-04  Uros Bizjak  <ubiz...@gmail.com>

	* config/i386/i386.c (ix86_reorg): Run move_or_delete_vzeroupper first.

Tested on x86_64-pc-linux-gnu {,-m32} AVX target, committed to mainline SVN.

Uros.

Index: i386.c
===================================================================
--- i386.c	(revision 173376)
+++ i386.c	(working copy)
@@ -30444,6 +30444,10 @@ ix86_reorg (void)
      with old MDEP_REORGS that are not CFG based.  Recompute it now.  */
   compute_bb_for_insn ();
 
+  /* Run the vzeroupper optimization if needed.  */
+  if (TARGET_VZEROUPPER)
+    move_or_delete_vzeroupper ();
+
   if (optimize && optimize_function_for_speed_p (cfun))
     {
       if (TARGET_PAD_SHORT_FUNCTION)
@@ -30455,10 +30459,6 @@ ix86_reorg (void)
 	ix86_avoid_jump_mispredicts ();
 #endif
     }
-
-  /* Run the vzeroupper optimization if needed.  */
-  if (TARGET_VZEROUPPER)
-    move_or_delete_vzeroupper ();
 }
 
 /* Return nonzero when QImode register that must be represented via REX prefix