Hi,

I saw some similar questions on this mailing list but could not find a
clear answer yet.
With a fairly large dataset (330 GB), FPGrowth spends most of its time in
the parallel-fpgrowth reduce tasks. Can the number of reduce tasks be set
automatically? In my default Hadoop installation the number of reduce
tasks is one, and the job takes very long. I could make it finish much
earlier by raising the Hadoop default number of reduce tasks to over 10.
Do you have a recommendation for choosing the number of reduce tasks
automatically, taking the group size and the number of frequent
attributes into account?
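For reference, here is how I am overriding the reducer count today. This is just a sketch of the two standard Hadoop mechanisms (the generic `-D` option and `mapred-site.xml`); the jar name and driver arguments are placeholders, not the exact Mahout invocation:

```
# Per-job override via the generic options parser (value 10 is an example):
hadoop jar mahout-job.jar org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver \
    -Dmapred.reduce.tasks=10 <driver arguments...>
```

```xml
<!-- Cluster-wide default in mapred-site.xml (example value): -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>10</value>
</property>
```

Ideally the driver would compute this value itself from the group count instead of relying on the cluster default of 1.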

Thanks.
