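Did you try setting the emitMostLikely option of DirichletDriver to true? In your call it is the ninth argument (the false right after runClustering = true). With emitMostLikely set to false, a point is emitted only to the clusters whose pdf is greater than the threshold argument (0.1 in your call), so a point can end up emitted to no cluster at all.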
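For intuition, here is a simplified, hypothetical model of the two emission modes. It is not Mahout's actual mapper code: the class EmitModeSketch, its methods, and the per-cluster scores are made up for illustration. In threshold mode a point is written once for every cluster whose score exceeds the threshold, so a point whose scores are all small is written nowhere. In this sketch, the most-likely mode always returns exactly one cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: NOT Mahout's ClusterClassificationMapper.
// Models how per-cluster scores for one point turn into emitted assignments.
public class EmitModeSketch {

    // "Most likely" mode: return the index of the highest-scoring cluster.
    public static int mostLikely(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    // Threshold mode: return every cluster whose score is strictly above the threshold.
    // If no score clears the threshold, the list is empty and the point is not emitted.
    public static List<Integer> aboveThreshold(double[] scores, double threshold) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] > threshold) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Invented scores for one point over three clusters; none exceeds 0.1.
        double[] scores = {0.02, 0.07, 0.03};
        double threshold = 0.1;

        System.out.println("most likely: " + mostLikely(scores));                   // most likely: 1
        System.out.println("above threshold: " + aboveThreshold(scores, threshold)); // above threshold: []
    }
}
```

The point of the sketch is that threshold mode can yield zero assignments for a point, which would be consistent with the "Map output records=0" counter of the final clustering job in your log.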
On Tue, Dec 18, 2012 at 2:38 AM, yoshihiro fujimoto <yoshihiro.0...@gmail.com> wrote:
> Hi,
>
> I've been using Mahout for Dirichlet Process Clustering.
> With 37 input records, there are 0 output records.
> With 1800 records, the output is normal (1800 output records).
>
> What do you suggest to solve this problem?
>
> == Java code (the 37-record case, using mahout-core-0.7.jar and mahout-math-0.7.jar)
>
> DirichletDriver.run(conf,
>     new Path("data/vector/vector.seq"),
>     new Path("data/dirichlet"),
>     new DistributionDescription(GaussianClusterDistribution.class.getName(),
>         RandomAccessSparseVector.class.getName(),
>         EuclideanDistanceMeasure.class.getName(),
>         37),
>     10, 2, 0.1, true, false, 0.1, false);
>
> == Log (the 37-record case)
>
> 2012/12/17 10:17:09 org.apache.hadoop.util.NativeCodeLoader#<clinit>:52
> WARN: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Cluster Iterator running iteration 1 over priorPath: data/dirichlet/clusters-0
> 2012/12/17 10:17:10 org.apache.hadoop.mapred.JobClient#copyAndConfigureFiles:667
> WARN: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 2012/12/17 10:17:10 org.apache.hadoop.mapreduce.lib.input.FileInputFormat#listStatus:237
> INFO: Total input paths to process : 1
> 2012/12/17 10:17:10 org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1288
> INFO: Running job: job_local_0001
> 2012/12/17 10:17:10 org.apache.hadoop.util.ProcessTree#isSetsidSupported:63
> INFO: setsid exited with exit code 0
> 2012/12/17 10:17:10 org.apache.hadoop.mapred.Task#initialize:534
> INFO: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1f5b4d1
> 2012/12/17 10:17:10 org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:944
> INFO: io.sort.mb = 100
> 2012/12/17 10:17:10 org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:956
> INFO: data buffer = 79691776/99614720
> 2012/12/17 10:17:10 org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:957
> INFO: record buffer = 262144/327680
> 2012/12/17 10:17:11 org.apache.hadoop.mapred.MapTask$MapOutputBuffer#flush:1284
> INFO: Starting flush of map output
> 2012/12/17 10:17:11 org.apache.hadoop.mapred.MapTask$MapOutputBuffer#sortAndSpill:1466
> INFO: Finished spill 0
> 2012/12/17 10:17:11 org.apache.hadoop.mapred.Task#done:847
> INFO: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
> 2012/12/17 10:17:11 org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> INFO: map 0% reduce 0%
> 2012/12/17 10:17:13 org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:13 org.apache.hadoop.mapred.Task#sendDone:959
> INFO: Task 'attempt_local_0001_m_000000_0' done.
> 2012/12/17 10:17:13 org.apache.hadoop.mapred.Task#initialize:534
> INFO: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3e926
> 2012/12/17 10:17:13 org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:13 org.apache.hadoop.mapred.Merger$MergeQueue#merge:390
> INFO: Merging 1 sorted segments
> 2012/12/17 10:17:13 org.apache.hadoop.mapred.Merger$MergeQueue#merge:473
> INFO: Down to the last merge-pass, with 1 segments left of total size: 414560 bytes
> 2012/12/17 10:17:13 org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:13 org.apache.hadoop.mapred.Task#done:847
> INFO: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
> 2012/12/17 10:17:13 org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:13 org.apache.hadoop.mapred.Task#commit:1000
> INFO: Task attempt_local_0001_r_000000_0 is allowed to commit now
> 2012/12/17 10:17:13 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter#commitTask:173
> INFO: Saved output of task 'attempt_local_0001_r_000000_0' to data/dirichlet/clusters-1
> 2012/12/17 10:17:14 org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> INFO: map 100% reduce 0%
> 2012/12/17 10:17:16 org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO: reduce > reduce
> 2012/12/17 10:17:16 org.apache.hadoop.mapred.Task#sendDone:959
> INFO: Task 'attempt_local_0001_r_000000_0' done.
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> INFO: map 100% reduce 100%
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1356
> INFO: Job complete: job_local_0001
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:585
> INFO: Counters: 20
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:587
> INFO: File Output Format Counters
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: Bytes Written=379153
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:587
> INFO: FileSystemCounters
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: FILE_BYTES_READ=4679083
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: FILE_BYTES_WRITTEN=5169961
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:587
> INFO: File Input Format Counters
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: Bytes Read=29486
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:587
> INFO: Map-Reduce Framework
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: Map output materialized bytes=414564
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: Map input records=37
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: Reduce shuffle bytes=0
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: Spilled Records=20
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: Map output bytes=414518
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: Total committed heap usage (bytes)=358350848
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: CPU time spent (ms)=0
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: SPLIT_RAW_BYTES=121
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: Combine input records=0
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: Reduce input records=10
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: Reduce input groups=10
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: Combine output records=0
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: Physical memory (bytes) snapshot=0
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: Reduce output records=10
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: Virtual memory (bytes) snapshot=0
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO: Map output records=10
> Cluster Iterator running iteration 2 over priorPath: data/dirichlet/clusters-1
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.JobClient#copyAndConfigureFiles:667
> WARN: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 2012/12/17 10:17:17 org.apache.hadoop.mapreduce.lib.input.FileInputFormat#listStatus:237
> INFO: Total input paths to process : 1
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1288
> INFO: Running job: job_local_0002
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Task#initialize:534
> INFO: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@423d4f
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:944
> INFO: io.sort.mb = 100
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:956
> INFO: data buffer = 79691776/99614720
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:957
> INFO: record buffer = 262144/327680
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.MapTask$MapOutputBuffer#flush:1284
> INFO: Starting flush of map output
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.MapTask$MapOutputBuffer#sortAndSpill:1466
> INFO: Finished spill 0
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Task#done:847
> INFO: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
> 2012/12/17 10:17:18 org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> INFO: map 0% reduce 0%
> 2012/12/17 10:17:20 org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:20 org.apache.hadoop.mapred.Task#sendDone:959
> INFO: Task 'attempt_local_0002_m_000000_0' done.
> 2012/12/17 10:17:20 org.apache.hadoop.mapred.Task#initialize:534
> INFO: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1a32ea4
> 2012/12/17 10:17:20 org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:20 org.apache.hadoop.mapred.Merger$MergeQueue#merge:390
> INFO: Merging 1 sorted segments
> 2012/12/17 10:17:20 org.apache.hadoop.mapred.Merger$MergeQueue#merge:473
> INFO: Down to the last merge-pass, with 1 segments left of total size: 402422 bytes
> 2012/12/17 10:17:20 org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:20 org.apache.hadoop.mapred.Task#done:847
> INFO: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
> 2012/12/17 10:17:20 org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:20 org.apache.hadoop.mapred.Task#commit:1000
> INFO: Task attempt_local_0002_r_000000_0 is allowed to commit now
> 2012/12/17 10:17:20 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter#commitTask:173
> INFO: Saved output of task 'attempt_local_0002_r_000000_0' to data/dirichlet/clusters-2
> 2012/12/17 10:17:21 org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> INFO: map 100% reduce 0%
> 2012/12/17 10:17:23 org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO: reduce > reduce
> 2012/12/17 10:17:23 org.apache.hadoop.mapred.Task#sendDone:959
> INFO: Task 'attempt_local_0002_r_000000_0' done.
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> INFO: map 100% reduce 100%
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1356
> INFO: Job complete: job_local_0002
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:585
> INFO: Counters: 20
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:587
> INFO: File Output Format Counters
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: Bytes Written=379153
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:587
> INFO: FileSystemCounters
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: FILE_BYTES_READ=10176320
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: FILE_BYTES_WRITTEN=9851171
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:587
> INFO: File Input Format Counters
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: Bytes Read=29486
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:587
> INFO: Map-Reduce Framework
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: Map output materialized bytes=402426
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: Map input records=37
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: Reduce shuffle bytes=0
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: Spilled Records=20
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: Map output bytes=402380
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: Total committed heap usage (bytes)=595066880
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: CPU time spent (ms)=0
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: SPLIT_RAW_BYTES=121
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: Combine input records=0
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: Reduce input records=10
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: Reduce input groups=10
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: Combine output records=0
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: Physical memory (bytes) snapshot=0
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: Reduce output records=10
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: Virtual memory (bytes) snapshot=0
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO: Map output records=10
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.JobClient#copyAndConfigureFiles:667
> WARN: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 2012/12/17 10:17:24 org.apache.hadoop.mapreduce.lib.input.FileInputFormat#listStatus:237
> INFO: Total input paths to process : 1
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1288
> INFO: Running job: job_local_0003
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Task#initialize:534
> INFO: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@14d581b
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Task#done:847
> INFO: Task:attempt_local_0003_m_000000_0 is done. And is in the process of commiting
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Task#commit:1000
> INFO: Task attempt_local_0003_m_000000_0 is allowed to commit now
> 2012/12/17 10:17:24 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter#commitTask:173
> INFO: Saved output of task 'attempt_local_0003_m_000000_0' to data/dirichlet/clusteredPoints
> 2012/12/17 10:17:25 org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> INFO: map 0% reduce 0%
> 2012/12/17 10:17:27 org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:27 org.apache.hadoop.mapred.Task#sendDone:959
> INFO: Task 'attempt_local_0003_m_000000_0' done.
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> INFO: map 100% reduce 0%
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1356
> INFO: Job complete: job_local_0003
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:585
> INFO: Counters: 12
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:587
> INFO: File Output Format Counters
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO: Bytes Written=132
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:587
> INFO: File Input Format Counters
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO: Bytes Read=29486
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:587
> INFO: FileSystemCounters
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO: FILE_BYTES_READ=7433181
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO: FILE_BYTES_WRITTEN=6674179
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:587
> INFO: Map-Reduce Framework
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO: Map input records=37
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO: Physical memory (bytes) snapshot=0
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO: Spilled Records=0
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO: Total committed heap usage (bytes)=297533440
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO: CPU time spent (ms)=0
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO: Virtual memory (bytes) snapshot=0
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO: SPLIT_RAW_BYTES=121
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO: Map output records=0
>
> Thanks
>
> Yoshihiro

--
Praneet Mhatre
Graduate Student
Donald Bren School of ICS
University of California, Irvine