Could you try with the Performance patch on the JIRA issue page 619
On Tue, Apr 5, 2011 at 1:14 AM, Vipul Pandey <[email protected]> wrote: > So I have a new problem now. > I have about 3.5M baskets and about a 100K items/features. After > successfully completing parallel-counting, grouping and transaction sorting > in a reasonable amount of time - PFP Growth just gets stuck in the Reduce > phase of ParallelFPGrowth. > The reducer is running since about 6:30 PM PST Friday, 1st of April - (it's > beyond noon of 4th April here - i.e. running for more than 65 hours now) > and > it's still going on. > > Anyone faced this issue before? (Console logs pasted below) > Also, > - We don't have anything else running on our cluster and it's a reasonably > powerful box. > - the jobtracker UI shows this status for the reducer : "Processing FPTree: > FPGrowth Algorithm for a given feature: 49797 > reduce" and the feature > keeps changing every minute or two- which shows that it's still doing it's > job! > - no other errors/messages in the logs > - config : -s 100 -k 50 -g 2 - Any Recommendations on these? my usual > values are "-s 10 -g 10" but that doesn't seem to cut for me either. > > > > Any clue? > > Thanks > Vipul > > > Log : > > 11/04/01 18:20:07 INFO common.HadoopUtil: Deleting > /run/MR20/fim/parallelcounting > > 11/04/01 18:20:07 WARN mapred.JobClient: Use GenericOptionsParser for > parsing the arguments. Applications should implement Tool for the same. > > 11/04/01 18:20:08 INFO input.FileInputFormat: Total input paths to process > : > 20 > > 11/04/01 18:20:08 INFO mapred.JobClient: Running job: job_201103311244_0045 > > 11/04/01 18:20:09 INFO mapred.JobClient: map 0% reduce 0% > > 11/04/01 18:20:20 INFO mapred.JobClient: map 2% reduce 0% > > 11/04/01 18:20:21 INFO mapred.JobClient: map 8% reduce 0% > > 11/04/01 18:20:22 INFO mapred.JobClient: map 11% reduce 0% > > 11/04/01 18:20:23 INFO mapred.JobClient: map 13% reduce 0% > > 11/04/01 18:20:24 INFO mapred.JobClient: map 16% reduce 0% > > 11/04/01 18:20:25 INFO mapred.JobClient: map 17% reduce 0% > > 11/04/01 18:20:26 INFO mapred.JobClient: map 18% reduce 0% > > 11/04/01 18:20:27 INFO mapred.JobClient: map 21% reduce 0% > > 11/04/01 18:20:28 INFO mapred.JobClient: map 22% reduce 0% > > 11/04/01 18:20:29 INFO mapred.JobClient: map 24% reduce 0% > > 11/04/01 18:20:30 INFO mapred.JobClient: map 27% reduce 0% > > 11/04/01 18:20:31 INFO mapred.JobClient: map 28% reduce 0% > > 11/04/01 18:20:32 INFO mapred.JobClient: map 31% reduce 0% > > 11/04/01 18:20:34 INFO mapred.JobClient: map 33% reduce 0% > > 11/04/01 18:20:35 INFO mapred.JobClient: map 35% reduce 0% > > 11/04/01 18:20:36 INFO mapred.JobClient: map 38% reduce 0% > > 11/04/01 18:20:37 INFO mapred.JobClient: map 40% reduce 0% > > 11/04/01 18:20:38 INFO mapred.JobClient: map 42% reduce 0% > > 11/04/01 18:20:39 INFO mapred.JobClient: map 43% reduce 0% > > 11/04/01 18:20:40 INFO mapred.JobClient: map 45% reduce 0% > > 11/04/01 18:20:41 INFO mapred.JobClient: map 47% reduce 0% > > 11/04/01 18:20:42 INFO mapred.JobClient: map 50% reduce 0% > > 11/04/01 18:20:44 INFO mapred.JobClient: map 53% reduce 0% > > 11/04/01 18:20:45 INFO mapred.JobClient: map 55% reduce 0% > > 11/04/01 18:20:47 INFO mapred.JobClient: map 57% reduce 0% > > 11/04/01 18:20:48 INFO mapred.JobClient: map 60% reduce 0% > > 11/04/01 18:20:49 INFO mapred.JobClient: map 61% reduce 0% > > 11/04/01 18:20:50 INFO mapred.JobClient: map 63% reduce 0% > > 11/04/01 18:20:51 INFO mapred.JobClient: map 64% reduce 0% > > 11/04/01 18:20:52 INFO mapred.JobClient: map 66% reduce 0% > > 11/04/01 18:20:53 INFO mapred.JobClient: map 68% reduce 0% > > 11/04/01 18:20:54 INFO mapred.JobClient: map 69% reduce 0% > > 11/04/01 18:20:55 INFO mapred.JobClient: map 71% reduce 0% > > 11/04/01 18:20:56 INFO mapred.JobClient: map 72% reduce 0% > > 11/04/01 18:20:57 INFO mapred.JobClient: map 74% reduce 0% > > 11/04/01 18:20:58 INFO mapred.JobClient: map 75% reduce 0% > > 11/04/01 18:20:59 INFO mapred.JobClient: map 76% reduce 0% > > 11/04/01 18:21:00 INFO mapred.JobClient: map 78% reduce 0% > > 11/04/01 18:21:01 INFO mapred.JobClient: map 79% reduce 0% > > 11/04/01 18:21:03 INFO mapred.JobClient: map 81% reduce 0% > > 11/04/01 18:21:06 INFO mapred.JobClient: map 83% reduce 0% > > 11/04/01 18:21:07 INFO mapred.JobClient: map 85% reduce 0% > > 11/04/01 18:21:09 INFO mapred.JobClient: map 87% reduce 0% > > 11/04/01 18:21:10 INFO mapred.JobClient: map 88% reduce 0% > > 11/04/01 18:21:12 INFO mapred.JobClient: map 89% reduce 0% > > 11/04/01 18:21:13 INFO mapred.JobClient: map 90% reduce 0% > > 11/04/01 18:21:16 INFO mapred.JobClient: map 91% reduce 0% > > 11/04/01 18:21:17 INFO mapred.JobClient: map 92% reduce 0% > > 11/04/01 18:21:19 INFO mapred.JobClient: map 93% reduce 0% > > 11/04/01 18:21:25 INFO mapred.JobClient: map 94% reduce 0% > > 11/04/01 18:21:29 INFO mapred.JobClient: map 94% reduce 5% > > 11/04/01 18:21:35 INFO mapred.JobClient: map 95% reduce 6% > > 11/04/01 18:21:40 INFO mapred.JobClient: map 96% reduce 6% > > 11/04/01 18:21:43 INFO mapred.JobClient: map 97% reduce 6% > > 11/04/01 18:21:44 INFO mapred.JobClient: map 97% reduce 10% > > 11/04/01 18:21:48 INFO mapred.JobClient: map 98% reduce 10% > > 11/04/01 18:21:52 INFO mapred.JobClient: map 99% reduce 10% > > 11/04/01 18:21:57 INFO mapred.JobClient: map 100% reduce 10% > > 11/04/01 18:21:59 INFO mapred.JobClient: map 100% reduce 13% > > 11/04/01 18:22:05 INFO mapred.JobClient: map 100% reduce 18% > > 11/04/01 18:22:11 INFO mapred.JobClient: map 100% reduce 25% > > 11/04/01 18:23:23 INFO mapred.JobClient: map 100% reduce 26% > > 11/04/01 18:23:26 INFO mapred.JobClient: map 100% reduce 33% > > 11/04/01 18:25:27 INFO mapred.JobClient: map 100% reduce 74% > > 11/04/01 18:25:29 INFO mapred.JobClient: map 100% reduce 76% > > 11/04/01 18:25:32 INFO mapred.JobClient: map 100% reduce 78% > > 11/04/01 18:25:35 INFO mapred.JobClient: map 100% reduce 80% > > 11/04/01 18:25:38 INFO mapred.JobClient: map 100% reduce 82% > > 11/04/01 18:25:41 INFO mapred.JobClient: map 100% reduce 84% > > 11/04/01 18:25:44 INFO mapred.JobClient: map 100% reduce 86% > > 11/04/01 18:25:47 INFO mapred.JobClient: map 100% reduce 88% > > 11/04/01 18:25:50 INFO mapred.JobClient: map 100% reduce 90% > > 11/04/01 18:25:53 INFO mapred.JobClient: map 100% reduce 92% > > 11/04/01 18:25:56 INFO mapred.JobClient: map 100% reduce 94% > > 11/04/01 18:25:59 INFO mapred.JobClient: map 100% reduce 96% > > 11/04/01 18:26:02 INFO mapred.JobClient: map 100% reduce 98% > > 11/04/01 18:26:05 INFO mapred.JobClient: map 100% reduce 100% > > 11/04/01 18:26:06 INFO mapred.JobClient: Job complete: > job_201103311244_0045 > > 11/04/01 18:26:06 INFO mapred.JobClient: Counters: 23 > > 11/04/01 18:26:06 INFO mapred.JobClient: Job Counters > > 11/04/01 18:26:06 INFO mapred.JobClient: Launched reduce tasks=1 > > 11/04/01 18:26:06 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=2045245 > > 11/04/01 18:26:06 INFO mapred.JobClient: Total time spent by all > reduces > waiting after reserving slots (ms)=0 > > 11/04/01 18:26:06 INFO mapred.JobClient: Total time spent by all maps > waiting after reserving slots (ms)=0 > > 11/04/01 18:26:06 INFO mapred.JobClient: Rack-local map tasks=2 > > 11/04/01 18:26:06 INFO mapred.JobClient: Launched map tasks=23 > > 11/04/01 18:26:06 INFO mapred.JobClient: Data-local map tasks=21 > > 11/04/01 18:26:06 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=287005 > > 11/04/01 18:26:06 INFO mapred.JobClient: FileSystemCounters > > 11/04/01 18:26:06 INFO mapred.JobClient: FILE_BYTES_READ=1058213034 > > 11/04/01 18:26:06 INFO mapred.JobClient: HDFS_BYTES_READ=1565999068 > > 11/04/01 18:26:06 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1397240087 > > 11/04/01 18:26:06 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=638557669 > > 11/04/01 18:26:06 INFO mapred.JobClient: Map-Reduce Framework > > 11/04/01 18:26:07 INFO mapred.JobClient: Reduce input groups=24671345 > > 11/04/01 18:26:07 INFO mapred.JobClient: Combine output > records=275836241 > > 11/04/01 18:26:07 INFO mapred.JobClient: Map input records=3480795 > > 11/04/01 18:26:07 INFO mapred.JobClient: Reduce shuffle bytes=318629292 > > 11/04/01 18:26:07 INFO mapred.JobClient: Reduce output records=24671345 > > 11/04/01 18:26:07 INFO mapred.JobClient: Spilled Records=442523762 > > 11/04/01 18:26:07 INFO mapred.JobClient: Map output bytes=2873217818 > > 11/04/01 18:26:07 INFO mapred.JobClient: Combine input > records=373343059 > > 11/04/01 18:26:07 INFO mapred.JobClient: Map output records=164272900 > > 11/04/01 18:26:07 INFO mapred.JobClient: SPLIT_RAW_BYTES=2860 > > 11/04/01 18:26:07 INFO mapred.JobClient: Reduce input records=66766082 > > 11/04/01 18:26:20 INFO pfpgrowth.PFPGrowth: No of Features: 103412 > > 11/04/01 18:26:20 INFO common.HadoopUtil: Deleting > hdfs://nnn001:54310/run/MR20/fim/fList > > 11/04/01 18:26:21 INFO common.HadoopUtil: Deleting > hdfs://nnn001:54310/run/MR20/fim/gList > > 11/04/01 18:26:21 INFO common.HadoopUtil: Deleting > /run/MR20/fim/sortedoutput > > 11/04/01 18:26:21 WARN mapred.JobClient: Use GenericOptionsParser for > parsing the arguments. Applications should implement Tool for the same. > > 11/04/01 18:26:21 INFO input.FileInputFormat: Total input paths to process > : > 20 > > 11/04/01 18:26:21 INFO mapred.JobClient: Running job: job_201103311244_0046 > > 11/04/01 18:26:22 INFO mapred.JobClient: map 0% reduce 0% > > 11/04/01 18:26:33 INFO mapred.JobClient: map 27% reduce 0% > > 11/04/01 18:26:34 INFO mapred.JobClient: map 58% reduce 0% > > 11/04/01 18:26:35 INFO mapred.JobClient: map 63% reduce 0% > > 11/04/01 18:26:37 INFO mapred.JobClient: map 81% reduce 0% > > 11/04/01 18:26:39 INFO mapred.JobClient: map 86% reduce 0% > > 11/04/01 18:26:40 INFO mapred.JobClient: map 95% reduce 0% > > 11/04/01 18:26:44 INFO mapred.JobClient: map 95% reduce 21% > > 11/04/01 18:26:46 INFO mapred.JobClient: map 98% reduce 21% > > 11/04/01 18:26:47 INFO mapred.JobClient: map 98% reduce 31% > > 11/04/01 18:26:48 INFO mapred.JobClient: map 100% reduce 31% > > 11/04/01 18:26:53 INFO mapred.JobClient: map 100% reduce 33% > > 11/04/01 18:26:58 INFO mapred.JobClient: map 100% reduce 90% > > 11/04/01 18:27:01 INFO mapred.JobClient: map 100% reduce 100% > > 11/04/01 18:27:01 INFO mapred.JobClient: Job complete: > job_201103311244_0046 > > 11/04/01 18:27:01 INFO mapred.JobClient: Counters: 22 > > 11/04/01 18:27:01 INFO mapred.JobClient: Job Counters > > 11/04/01 18:27:01 INFO mapred.JobClient: Launched reduce tasks=1 > > 11/04/01 18:27:01 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=232517 > > 11/04/01 18:27:01 INFO mapred.JobClient: Total time spent by all > reduces > waiting after reserving slots (ms)=0 > > 11/04/01 18:27:01 INFO mapred.JobClient: Total time spent by all maps > waiting after reserving slots (ms)=0 > > 11/04/01 18:27:01 INFO mapred.JobClient: Launched map tasks=20 > > 11/04/01 18:27:01 INFO mapred.JobClient: Data-local map tasks=20 > > 11/04/01 18:27:01 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=24526 > > 11/04/01 18:27:01 INFO mapred.JobClient: FileSystemCounters > > 11/04/01 18:27:01 INFO mapred.JobClient: FILE_BYTES_READ=38487572 > > 11/04/01 18:27:01 INFO mapred.JobClient: HDFS_BYTES_READ=1618968708 > > 11/04/01 18:27:01 INFO mapred.JobClient: FILE_BYTES_WRITTEN=81300431 > > 11/04/01 18:27:01 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=84985526 > > 11/04/01 18:27:01 INFO mapred.JobClient: Map-Reduce Framework > > 11/04/01 18:27:01 INFO mapred.JobClient: Reduce input groups=48309 > > 11/04/01 18:27:01 INFO mapred.JobClient: Combine output records=0 > > 11/04/01 18:27:01 INFO mapred.JobClient: Map input records=3480795 > > 11/04/01 18:27:01 INFO mapred.JobClient: Reduce shuffle bytes=39454031 > > 11/04/01 18:27:01 INFO mapred.JobClient: Reduce output records=1539349 > > 11/04/01 18:27:01 INFO mapred.JobClient: Spilled Records=3078698 > > 11/04/01 18:27:01 INFO mapred.JobClient: Map output bytes=72092246 > > 11/04/01 18:27:01 INFO mapred.JobClient: Combine input records=0 > > 11/04/01 18:27:01 INFO mapred.JobClient: Map output records=1539349 > > 11/04/01 18:27:01 INFO mapred.JobClient: SPLIT_RAW_BYTES=2860 > > 11/04/01 18:27:01 INFO mapred.JobClient: Reduce input records=1539349 > > 11/04/01 18:27:01 INFO common.HadoopUtil: Deleting run/MR20/fim/fpgrowth > > 11/04/01 18:27:01 WARN mapred.JobClient: Use GenericOptionsParser for > parsing the arguments. Applications should implement Tool for the same. > > 11/04/01 18:27:02 INFO input.FileInputFormat: Total input paths to process > : > 1 > > 11/04/01 18:27:02 INFO mapred.JobClient: Running job: job_201103311244_0047 > > 11/04/01 18:27:03 INFO mapred.JobClient: map 0% reduce 0% > > 11/04/01 18:27:14 INFO mapred.JobClient: map 39% reduce 0% > > 11/04/01 18:27:17 INFO mapred.JobClient: map 44% reduce 0% > > 11/04/01 18:28:15 INFO mapred.JobClient: map 64% reduce 0% > > 11/04/01 18:28:38 INFO mapred.JobClient: map 74% reduce 0% > > 11/04/01 18:28:41 INFO mapred.JobClient: map 78% reduce 0% > > 11/04/01 18:28:44 INFO mapred.JobClient: map 88% reduce 0% > > 11/04/01 18:28:47 INFO mapred.JobClient: map 95% reduce 0% > > 11/04/01 18:28:50 INFO mapred.JobClient: map 100% reduce 0% > > 11/04/01 18:32:27 INFO mapred.JobClient: map 100% reduce 33% > > 11/04/01 18:34:45 INFO mapred.JobClient: map 100% reduce 100% > > > ## AND IT's been here for all this while >
