Did you try setting the emitMostLikely option of the DirichletDriver to
true?
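
Assuming the Mahout 0.7 parameter order for DirichletDriver.run is (conf, input, output, description, numClusters, maxIterations, alpha0, runClustering, emitMostLikely, threshold, runSequential), the quoted call passes emitMostLikely = false and threshold = 0.1. Under that assumption, a point is written only for clusters whose probability exceeds the threshold, so a point whose probabilities all fall at or below 0.1 would produce no output record. The sketch below is a simplified model of that selection rule, not Mahout code; the class name, method name, and probability values are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the emitMostLikely / threshold choice.
// NOT a Mahout API: names and behaviour here are assumptions for illustration.
class EmitSketch {

    // Returns the indices of the clusters a point would be written to.
    static List<Integer> clustersToEmit(double[] pdf, boolean emitMostLikely, double threshold) {
        List<Integer> out = new ArrayList<>();
        if (pdf.length == 0) {
            return out;
        }
        if (emitMostLikely) {
            // Always pick exactly one cluster: the one with the highest probability.
            int best = 0;
            for (int i = 1; i < pdf.length; i++) {
                if (pdf[i] > pdf[best]) {
                    best = i;
                }
            }
            out.add(best);
        } else {
            // Pick every cluster whose probability is strictly above the threshold;
            // this can select nothing at all.
            for (int i = 0; i < pdf.length; i++) {
                if (pdf[i] > threshold) {
                    out.add(i);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up probabilities for one point, all below the 0.1 threshold.
        double[] pdf = {0.09, 0.08, 0.07};
        System.out.println(clustersToEmit(pdf, false, 0.1)); // threshold mode
        System.out.println(clustersToEmit(pdf, true, 0.1));  // most-likely mode
    }
}
```

If this model matches the real behaviour, changing the second boolean in the quoted call from false to true would be the change to try first.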


On Tue, Dec 18, 2012 at 2:38 AM, yoshihiro fujimoto <
yoshihiro.0...@gmail.com> wrote:

> Hi,
>
> I'm using Mahout for Dirichlet Process Clustering.
> With 37 input records, I get 0 output records.
> With 1800 input records, the output is normal (1800 output records).
>
> What are your suggestions to solve this problem?
>
> == Java code (the 37-record case, using mahout-core-0.7.jar and
> mahout-math-0.7.jar)
>
> DirichletDriver.run(conf,
>     new Path("data/vector/vector.seq"),
>     new Path("data/dirichlet"),
>     new DistributionDescription(GaussianClusterDistribution.class.getName(),
>         RandomAccessSparseVector.class.getName(),
>         EuclideanDistanceMeasure.class.getName(),
>         37),
>     10, 2, 0.1, true, false, 0.1, false);
>
> == Log (the 37-record case)
>
>
> 2012/12/17 10:17:09 org.apache.hadoop.util.NativeCodeLoader#<clinit>:52
> WARN: Unable to load native-hadoop library for your platform... using
> builtin-java classes where applicable
> Cluster Iterator running iteration 1 over priorPath:
> data/dirichlet/clusters-0
> 2012/12/17 10:17:10
> org.apache.hadoop.mapred.JobClient#copyAndConfigureFiles:667
> WARN: Use GenericOptionsParser for parsing the arguments. Applications
> should implement Tool for the same.
> 2012/12/17 10:17:10
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat#listStatus:237
> INFO: Total input paths to process : 1
> 2012/12/17 10:17:10
> org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1288
> INFO: Running job: job_local_0001
> 2012/12/17 10:17:10 org.apache.hadoop.util.ProcessTree#isSetsidSupported:63
> INFO: setsid exited with exit code 0
> 2012/12/17 10:17:10 org.apache.hadoop.mapred.Task#initialize:534
> INFO:  Using ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1f5b4d1
> 2012/12/17 10:17:10
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:944
> INFO: io.sort.mb = 100
> 2012/12/17 10:17:10
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:956
> INFO: data buffer = 79691776/99614720
> 2012/12/17 10:17:10
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:957
> INFO: record buffer = 262144/327680
> 2012/12/17 10:17:11
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer#flush:1284
> INFO: Starting flush of map output
> 2012/12/17 10:17:11
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer#sortAndSpill:1466
> INFO: Finished spill 0
> 2012/12/17 10:17:11 org.apache.hadoop.mapred.Task#done:847
> INFO: Task:attempt_local_0001_m_000000_0 is done. And is in the process of
> commiting
> 2012/12/17 10:17:11
> org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> INFO:  map 0% reduce 0%
> 2012/12/17 10:17:13
> org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:13 org.apache.hadoop.mapred.Task#sendDone:959
> INFO: Task 'attempt_local_0001_m_000000_0' done.
> 2012/12/17 10:17:13 org.apache.hadoop.mapred.Task#initialize:534
> INFO:  Using ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3e926
> 2012/12/17 10:17:13
> org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:13 org.apache.hadoop.mapred.Merger$MergeQueue#merge:390
> INFO: Merging 1 sorted segments
> 2012/12/17 10:17:13 org.apache.hadoop.mapred.Merger$MergeQueue#merge:473
> INFO: Down to the last merge-pass, with 1 segments left of total size:
> 414560 bytes
> 2012/12/17 10:17:13
> org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:13 org.apache.hadoop.mapred.Task#done:847
> INFO: Task:attempt_local_0001_r_000000_0 is done. And is in the process of
> commiting
> 2012/12/17 10:17:13
> org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:13 org.apache.hadoop.mapred.Task#commit:1000
> INFO: Task attempt_local_0001_r_000000_0 is allowed to commit now
> 2012/12/17 10:17:13
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter#commitTask:173
> INFO: Saved output of task 'attempt_local_0001_r_000000_0' to
> data/dirichlet/clusters-1
> 2012/12/17 10:17:14
> org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> INFO:  map 100% reduce 0%
> 2012/12/17 10:17:16
> org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO: reduce > reduce
> 2012/12/17 10:17:16 org.apache.hadoop.mapred.Task#sendDone:959
> INFO: Task 'attempt_local_0001_r_000000_0' done.
> 2012/12/17 10:17:17
> org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> INFO:  map 100% reduce 100%
> 2012/12/17 10:17:17
> org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1356
> INFO: Job complete: job_local_0001
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:585
> INFO: Counters: 20
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:587
> INFO:   File Output Format Counters
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Bytes Written=379153
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:587
> INFO:   FileSystemCounters
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     FILE_BYTES_READ=4679083
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     FILE_BYTES_WRITTEN=5169961
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:587
> INFO:   File Input Format Counters
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Bytes Read=29486
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:587
> INFO:   Map-Reduce Framework
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Map output materialized bytes=414564
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Map input records=37
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Reduce shuffle bytes=0
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Spilled Records=20
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Map output bytes=414518
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Total committed heap usage (bytes)=358350848
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     CPU time spent (ms)=0
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     SPLIT_RAW_BYTES=121
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Combine input records=0
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Reduce input records=10
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Reduce input groups=10
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Combine output records=0
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Physical memory (bytes) snapshot=0
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Reduce output records=10
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Virtual memory (bytes) snapshot=0
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Map output records=10
> Cluster Iterator running iteration 2 over priorPath:
> data/dirichlet/clusters-1
> 2012/12/17 10:17:17
> org.apache.hadoop.mapred.JobClient#copyAndConfigureFiles:667
> WARN: Use GenericOptionsParser for parsing the arguments. Applications
> should implement Tool for the same.
> 2012/12/17 10:17:17
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat#listStatus:237
> INFO: Total input paths to process : 1
> 2012/12/17 10:17:17
> org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1288
> INFO: Running job: job_local_0002
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Task#initialize:534
> INFO:  Using ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@423d4f
> 2012/12/17 10:17:17
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:944
> INFO: io.sort.mb = 100
> 2012/12/17 10:17:17
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:956
> INFO: data buffer = 79691776/99614720
> 2012/12/17 10:17:17
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:957
> INFO: record buffer = 262144/327680
> 2012/12/17 10:17:17
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer#flush:1284
> INFO: Starting flush of map output
> 2012/12/17 10:17:17
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer#sortAndSpill:1466
> INFO: Finished spill 0
> 2012/12/17 10:17:17 org.apache.hadoop.mapred.Task#done:847
> INFO: Task:attempt_local_0002_m_000000_0 is done. And is in the process of
> commiting
> 2012/12/17 10:17:18
> org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> INFO:  map 0% reduce 0%
> 2012/12/17 10:17:20
> org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:20 org.apache.hadoop.mapred.Task#sendDone:959
> INFO: Task 'attempt_local_0002_m_000000_0' done.
> 2012/12/17 10:17:20 org.apache.hadoop.mapred.Task#initialize:534
> INFO:  Using ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1a32ea4
> 2012/12/17 10:17:20
> org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:20 org.apache.hadoop.mapred.Merger$MergeQueue#merge:390
> INFO: Merging 1 sorted segments
> 2012/12/17 10:17:20 org.apache.hadoop.mapred.Merger$MergeQueue#merge:473
> INFO: Down to the last merge-pass, with 1 segments left of total size:
> 402422 bytes
> 2012/12/17 10:17:20
> org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:20 org.apache.hadoop.mapred.Task#done:847
> INFO: Task:attempt_local_0002_r_000000_0 is done. And is in the process of
> commiting
> 2012/12/17 10:17:20
> org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:20 org.apache.hadoop.mapred.Task#commit:1000
> INFO: Task attempt_local_0002_r_000000_0 is allowed to commit now
> 2012/12/17 10:17:20
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter#commitTask:173
> INFO: Saved output of task 'attempt_local_0002_r_000000_0' to
> data/dirichlet/clusters-2
> 2012/12/17 10:17:21
> org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> INFO:  map 100% reduce 0%
> 2012/12/17 10:17:23
> org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO: reduce > reduce
> 2012/12/17 10:17:23 org.apache.hadoop.mapred.Task#sendDone:959
> INFO: Task 'attempt_local_0002_r_000000_0' done.
> 2012/12/17 10:17:24
> org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> INFO:  map 100% reduce 100%
> 2012/12/17 10:17:24
> org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1356
> INFO: Job complete: job_local_0002
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:585
> INFO: Counters: 20
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:587
> INFO:   File Output Format Counters
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Bytes Written=379153
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:587
> INFO:   FileSystemCounters
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     FILE_BYTES_READ=10176320
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     FILE_BYTES_WRITTEN=9851171
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:587
> INFO:   File Input Format Counters
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Bytes Read=29486
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:587
> INFO:   Map-Reduce Framework
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Map output materialized bytes=402426
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Map input records=37
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Reduce shuffle bytes=0
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Spilled Records=20
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Map output bytes=402380
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Total committed heap usage (bytes)=595066880
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     CPU time spent (ms)=0
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     SPLIT_RAW_BYTES=121
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Combine input records=0
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Reduce input records=10
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Reduce input groups=10
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Combine output records=0
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Physical memory (bytes) snapshot=0
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Reduce output records=10
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Virtual memory (bytes) snapshot=0
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Map output records=10
> 2012/12/17 10:17:24
> org.apache.hadoop.mapred.JobClient#copyAndConfigureFiles:667
> WARN: Use GenericOptionsParser for parsing the arguments. Applications
> should implement Tool for the same.
> 2012/12/17 10:17:24
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat#listStatus:237
> INFO: Total input paths to process : 1
> 2012/12/17 10:17:24
> org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1288
> INFO: Running job: job_local_0003
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Task#initialize:534
> INFO:  Using ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@14d581b
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Task#done:847
> INFO: Task:attempt_local_0003_m_000000_0 is done. And is in the process of
> commiting
> 2012/12/17 10:17:24
> org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:24 org.apache.hadoop.mapred.Task#commit:1000
> INFO: Task attempt_local_0003_m_000000_0 is allowed to commit now
> 2012/12/17 10:17:24
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter#commitTask:173
> INFO: Saved output of task 'attempt_local_0003_m_000000_0' to
> data/dirichlet/clusteredPoints
> 2012/12/17 10:17:25
> org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> INFO:  map 0% reduce 0%
> 2012/12/17 10:17:27
> org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> INFO:
> 2012/12/17 10:17:27 org.apache.hadoop.mapred.Task#sendDone:959
> INFO: Task 'attempt_local_0003_m_000000_0' done.
> 2012/12/17 10:17:28
> org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> INFO:  map 100% reduce 0%
> 2012/12/17 10:17:28
> org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1356
> INFO: Job complete: job_local_0003
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:585
> INFO: Counters: 12
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:587
> INFO:   File Output Format Counters
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Bytes Written=132
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:587
> INFO:   File Input Format Counters
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Bytes Read=29486
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:587
> INFO:   FileSystemCounters
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO:     FILE_BYTES_READ=7433181
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO:     FILE_BYTES_WRITTEN=6674179
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:587
> INFO:   Map-Reduce Framework
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Map input records=37
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Physical memory (bytes) snapshot=0
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Spilled Records=0
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Total committed heap usage (bytes)=297533440
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO:     CPU time spent (ms)=0
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Virtual memory (bytes) snapshot=0
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO:     SPLIT_RAW_BYTES=121
> 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> INFO:     Map output records=0
>
>
> Thanks
>
> Yoshihiro
>



-- 
Praneet Mhatre
Graduate Student
Donald Bren School of ICS
University of California, Irvine
