I ran the same command as you and got this error. I don't know why.
10/08/06 11:46:05 ERROR driver.MahoutDriver: MahoutDriver failed with args:
[fpg, -i, accidents, -o, pattern, -k, 50, -method, mapreduce, -g, 20,
-regex, [ ], -s, 2]
null
Exception in thread "main" java.lang.NullPointerException
at java.util.Properties$LineReader.readLine(Properties.java:418)
at java.util.Properties.load0(Properties.java:337)
at java.util.Properties.load(Properties.java:325)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:98)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
-----Original Message-----
From: Ankur C. Goel [mailto:[email protected]]
Sent: August 4, 2010 18:02
To: [email protected]
Subject: Re: Error: Java heap space when running FPGrowth
Hi tanweiguo,
Which version of hadoop are you using? I ran the example on the
hadoop 0.20.2 release on a single-node cluster using the mahout binary:
$MAHOUT_INSTALL_DIR/bin/mahout fpg -i accidents -o pattern -k 50 -method
mapreduce -g 20 -regex [\ ] -s 2
and it worked for me.
In my single node setup, mapred.child.java.opts="-server -Xmx768m
-Djava.net.preferIPv4Stack=true"
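For readers of the archive: on Hadoop 0.20.x that same setting can be made cluster-wide in conf/mapred-site.xml (property name as quoted above; the value shown is simply what worked on the single-node box, not a recommendation):

```xml
<!-- conf/mapred-site.xml: JVM options passed to every child task JVM.
     Raise -Xmx if map/reduce tasks hit OutOfMemoryError. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-server -Xmx768m -Djava.net.preferIPv4Stack=true</value>
</property>
```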
Not sure if there is a way exposed to control the parallelism. Robin ?
-...@nkur
On 8/4/10 1:18 PM, "tanweiguo" <[email protected]> wrote:
I just followed the wiki to test FPGrowth:
https://cwiki.apache.org/MAHOUT/parallel-frequent-pattern-mining.html
1. Unzip accidents.dat.gz and put it into the HDFS accidents folder.
2. Run on a hadoop cluster (1 master and 3 slaves):
hadoop jar mahout-examples-0.3.job
org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver \
-i accidents \
-o patterns \
-k 50 \
-method mapreduce \
-g 10 \
-regex [\ ] \
-s 2
The first two MapReduce jobs (Parallel Counting Driver running over input:
accidents; PFP Transaction Sorting running over input: accidents) succeed.
However, the third MapReduce job (PFP Growth Driver running over input:
patterns/sortedoutput) always fails with this error message:
10/08/04 15:23:45 INFO input.FileInputFormat: Total input paths to
process : 1
10/08/04 15:23:46 INFO mapred.JobClient: Running job:
job_201007271506_0025
10/08/04 15:23:47 INFO mapred.JobClient: map 0% reduce 0%
10/08/04 15:24:05 INFO mapred.JobClient: map 13% reduce 0%
10/08/04 15:24:08 INFO mapred.JobClient: map 22% reduce 0%
10/08/04 15:24:11 INFO mapred.JobClient: map 24% reduce 0%
10/08/04 15:24:29 INFO mapred.JobClient: map 0% reduce 0%
10/08/04 15:24:31 INFO mapred.JobClient: Task Id :
attempt_201007271506_0025_m_000000_0, Status : FAILED
Error: java.lang.OutOfMemoryError: Java heap space
at org.apache.mahout.fpm.pfpgrowth.TransactionTree.resize(TransactionTree.java:446)
at org.apache.mahout.fpm.pfpgrowth.TransactionTree.createNode(TransactionTree.java:409)
at org.apache.mahout.fpm.pfpgrowth.TransactionTree.addPattern(TransactionTree.java:202)
at org.apache.mahout.fpm.pfpgrowth.TransactionTree.getCompressedTree(TransactionTree.java:285)
at org.apache.mahout.fpm.pfpgrowth.ParallelFPGrowthCombiner.reduce(ParallelFPGrowthCombiner.java:51)
at org.apache.mahout.fpm.pfpgrowth.ParallelFPGrowthCombiner.reduce(ParallelFPGrowthCombiner.java:33)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1214)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1227)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1135)
The parameter mapred.child.java.opts is set to -Xmx512m in my cluster.
I also tried -g 5 and -g 20; both failed with the same error message.
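Editor's note, in case it helps: if FPGrowthDriver parses its arguments through Hadoop's GenericOptionsParser (an assumption here; I have not checked the Mahout 0.3 source), the child heap could be raised for this one job without touching the cluster config:

```shell
# Assumes the driver honours generic -D options (GenericOptionsParser);
# if not, mapred.child.java.opts must be changed in mapred-site.xml instead.
hadoop jar mahout-examples-0.3.job \
  org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver \
  -Dmapred.child.java.opts=-Xmx1024m \
  -i accidents -o patterns -k 50 -method mapreduce -g 10 -regex [\ ] -s 2
```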
Another question: I see there is only one mapper. Which parameter should I
adjust to get more mappers and improve speed?
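Editor's note on the mapper count (an observation, not something the thread confirms): with FileInputFormat the number of map tasks equals the number of input splits, so a single input file no larger than one HDFS block yields exactly one mapper. On Hadoop 0.20.x with the new API, capping the maximum split size forces more splits, for example:

```shell
# mapred.max.split.size caps the split size in bytes for the new-API
# FileInputFormat; 8 MB here is an illustrative value. As above, this
# only takes effect if the driver passes -D options through
# GenericOptionsParser.
hadoop jar mahout-examples-0.3.job \
  org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver \
  -Dmapred.max.split.size=8388608 \
  -i accidents -o patterns -k 50 -method mapreduce -g 10 -regex [\ ] -s 2
```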