Hi

I am having difficulty linking my two machines into a Hadoop cluster, so for
now I am running Mahout jobs on a single machine. When the input files are
big, the jobs fail with java.lang.OutOfMemoryError (see the outputs below:
one is "Java heap space" and the other is "GC overhead limit exceeded").

I don't have much experience with Java, so I am wondering: is there
something I can tweak in Mahout, Hadoop or Java to increase the amount of
memory available to these jobs?
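A quick way to see how much heap a JVM actually receives is to ask the runtime itself. The class below (`HeapCheck`, a hypothetical helper, not part of Mahout or Hadoop) is a minimal sketch that uses only the standard `java.lang.Runtime` API:

```java
// HeapCheck: hypothetical helper, not part of Mahout or Hadoop.
// Reports the heap limits of the JVM it runs in, using java.lang.Runtime.
public class HeapCheck {

    private static final long MB = 1024L * 1024L;

    // Maximum heap the JVM will attempt to use, in MB. This follows -Xmx when
    // it is given; otherwise it is the JVM's own default. (If the JVM has no
    // inherent limit, maxMemory() returns Long.MAX_VALUE.)
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    // Heap currently reserved by the JVM, in MB; it can grow up to maxHeapMb().
    public static long totalHeapMb() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB):      " + maxHeapMb());
        System.out.println("reserved heap (MB): " + totalHeapMb());
    }
}
```

If you compile it with `javac HeapCheck.java` and run it with, for example, `java -Xmx4g HeapCheck`, the reported maximum should rise roughly in line with the `-Xmx` value. The exact figure depends on the JVM and garbage collector.

The logs show `bin/mahout` handing off to `/usr/bin/hadoop` ("Running on hadoop"), so the heap for these jobs is presumably set by the Hadoop launcher script. Stock Hadoop 1.x scripts read `HADOOP_HEAPSIZE` (in MB) for this. Whether that holds for this particular installation is an assumption, not something confirmed here.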

./mahout-distribution-0.7/bin/mahout fpg -i ../data/input1.csv \
    -o ../data/patterns -k 50 -method sequential -s 20
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop
MAHOUT-JOB: /root/src/mahout-distribution-0.7/mahout-examples-0.7-job.jar
13/02/15 20:46:25 INFO common.AbstractJob: Command line arguments:
{--encoding=[UTF-8], --endPhase=[2147483647], --input=[../data/input1.csv],
--maxHeapSize=[50], --method=[sequential], --minSupport=[20],
--numGroups=[1000], --numTreeCacheEntries=[5], --output=[../data/patterns],
--splitterPattern=[[ ,    ]*[,|  ][ ,     ]*], --startPhase=[0],
--tempDir=[temp]}
13/02/15 20:46:25 INFO pfpgrowth.FPGrowthDriver: Starting Sequential FPGrowth
13/02/15 20:46:26 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/02/15 20:46:38 INFO fpgrowth.FPGrowth: Number of unique items 3603
13/02/15 20:46:38 INFO fpgrowth.FPGrowth: Number of unique pruned items 3603
13/02/15 20:46:38 INFO fpgrowth.FPGrowth: FPTree Building: Read 10000 Transactions
13/02/15 20:46:38 INFO fpgrowth.FPGrowth: FPTree Building: Read 20000 Transactions
13/02/15 20:46:38 INFO fpgrowth.FPGrowth: FPTree Building: Read 30000 Transactions
13/02/15 20:46:39 INFO fpgrowth.FPGrowth: FPTree Building: Read 40000 Transactions
13/02/15 20:46:39 INFO fpgrowth.FPGrowth: FPTree Building: Read 50000 Transactions
13/02/15 20:46:39 INFO fpgrowth.FPGrowth: FPTree Building: Read 60000 Transactions
13/02/15 20:46:39 INFO fpgrowth.FPGrowth: FPTree Building: Read 70000 Transactions
13/02/15 20:46:40 INFO fpgrowth.FPGrowth: FPTree Building: Read 80000 Transactions
13/02/15 20:46:40 INFO fpgrowth.FPGrowth: FPTree Building: Read 90000 Transactions
13/02/15 20:46:40 INFO fpgrowth.FPGrowth: FPTree Building: Read 100000 Transactions
13/02/15 20:46:40 INFO fpgrowth.FPGrowth: FPTree Building: Read 110000 Transactions
13/02/15 20:46:41 INFO fpgrowth.FPGrowth: FPTree Building: Read 120000 Transactions
13/02/15 20:46:41 INFO fpgrowth.FPGrowth: FPTree Building: Read 130000 Transactions
13/02/15 20:46:41 INFO fpgrowth.FPGrowth: FPTree Building: Read 140000 Transactions
13/02/15 20:46:42 INFO fpgrowth.FPGrowth: FPTree Building: Read 150000 Transactions
13/02/15 20:46:42 INFO fpgrowth.FPGrowth: FPTree Building: Read 160000 Transactions
13/02/15 20:46:43 INFO fpgrowth.FPGrowth: FPTree Building: Read 170000 Transactions
13/02/15 20:46:44 INFO fpgrowth.FPGrowth: FPTree Building: Read 180000 Transactions
13/02/15 20:46:45 INFO fpgrowth.FPGrowth: FPTree Building: Read 190000 Transactions
13/02/15 20:46:45 INFO fpgrowth.FPGrowth: FPTree Building: Read 200000 Transactions
13/02/15 20:46:46 INFO fpgrowth.FPGrowth: FPTree Building: Read 210000 Transactions
13/02/15 20:46:48 INFO fpgrowth.FPGrowth: FPTree Building: Read 220000 Transactions
13/02/15 20:46:49 INFO fpgrowth.FPGrowth: FPTree Building: Read 230000 Transactions
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPTree.resize(FPTree.java:357)
        at org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPTree.createNode(FPTree.java:189)
        at org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth.treeAddCount(FPGrowth.java:699)
        at org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth.generateTopKFrequentPatterns(FPGrowth.java:293)
        at org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth.generateTopKFrequentPatterns(FPGrowth.java:174)
        at org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver.runFPGrowth(FPGrowthDriver.java:183)
        at org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver.run(FPGrowthDriver.java:132)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver.main(FPGrowthDriver.java:56)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


./mahout-distribution-0.7/bin/mahout fpg -i ../data/input2.csv \
    -o ../data/patterns -k 50 -method sequential -s 20 > tmp_ouput
13/02/15 20:55:19 INFO common.AbstractJob: Command line arguments:
{--encoding=[UTF-8], --endPhase=[2147483647], --input=[../data/input2.csv],
--maxHeapSize=[50], --method=[sequential], --minSupport=[20],
--numGroups=[1000], --numTreeCacheEntries=[5], --output=[../data/patterns],
--splitterPattern=[[ ,        ]*[,|  ][ ,     ]*], --startPhase=[0],
--tempDir=[temp]}
13/02/15 20:55:19 INFO pfpgrowth.FPGrowthDriver: Starting Sequential FPGrowth
13/02/15 20:55:19 INFO util.NativeCodeLoader: Loaded the native-hadoop library
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth.generateFList(FPGrowth.java:87)
        at org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver.runFPGrowth(FPGrowthDriver.java:183)
        at org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver.run(FPGrowthDriver.java:132)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver.main(FPGrowthDriver.java:56)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
