Hi, I saw some similar questions on this mailing list but could not find a clear answer yet. With a fairly large dataset (330 GB), FPGrowth spends most of its time in the parallel-fpgrowth reduce tasks. Can the number of reduce tasks be set automatically? In my default Hadoop installation the number of reduce tasks is one, and the job takes very long; by raising the Hadoop default to more than 10 reducers I can make it finish much earlier. Do you have a recommendation for choosing the number of reduce tasks automatically, taking the group size and the number of frequent attributes into account?
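For reference, this is how I am overriding the reducer count manually today. The jar and driver names below are placeholders for my local setup; the `-D mapred.reduce.tasks` generic option itself is standard Hadoop:

```shell
# One-off override via Hadoop's generic options (driver/jar names are
# placeholders for my installation, not necessarily the exact Mahout names):
hadoop jar mahout-job.jar org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver \
  -D mapred.reduce.tasks=12 \
  --input /data/transactions --output /data/patterns

# Alternatively, a cluster-wide default in mapred-site.xml:
#   <property>
#     <name>mapred.reduce.tasks</name>
#     <value>12</value>
#   </property>
```

What I am hoping for is something that derives that `12` from the data (group size, number of frequent attributes) instead of hard-coding it.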
Thanks.