Hi,

I saw some similar questions on this mailing list but could not find a
clear answer yet.
With a fairly large dataset (330 GB), FPGrowth spends most of its time in
the parallel-fpgrowth reduce tasks. Can the number of reduce tasks be set
automatically? In my default Hadoop installation the number of reduce
tasks is one, and the job takes very long. I could make it finish much
earlier by raising the Hadoop default number of reduce tasks to over 10.
Do you have a recommendation for choosing the number of reduce tasks
automatically, taking the group size and the number of frequent
attributes into account?
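For reference, here is how I am overriding the reducer count today. This is just a sketch of the two standard Hadoop mechanisms (the generic `-D` option and `mapred-site.xml`); the jar name and driver arguments are placeholders, not the exact Mahout invocation:

```
# Per-job override via the generic options parser (value 10 is an example):
hadoop jar mahout-job.jar org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver \
    -Dmapred.reduce.tasks=10 <driver arguments...>
```

```xml
<!-- Cluster-wide default in mapred-site.xml (example value): -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>10</value>
</property>
```

Ideally the driver would compute this value itself from the group count instead of relying on the cluster default of 1.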

Thanks.
