Well, my Mahout 0.8-SNAPSHOT is now fine with the analyzer option "org.apache.lucene.analysis.core.WhitespaceAnalyzer", but there are still some hurdles to get over. Could this be a Hadoop version incompatibility issue, and if so, what would be the correct/minimum Hadoop version? (At least "clusterdump" with Mahout 0.8-SNAPSHOT previously worked fine against an existing k-means result produced with 0.7.) I have been running Hadoop 0.20.203 (pseudo-distributed) with Mahout 0.7 for some time, and only recently upgraded the Mahout side to 0.8-SNAPSHOT.
$MAHOUT_HOME/bin/mahout seq2sparse --namedVector -i NHTSA-seqfile01/ -o NHTSA-namedVector -ow -a org.apache.lucene.analysis.core.WhitespaceAnalyzer -chunk 200 -wt tfidf -s 5 -md 3 -x 90 -ng 2 -ml 50 -seq -n 2

Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /usr/local/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
13/05/12 01:45:48 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 2
13/05/12 01:45:48 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 50.0
13/05/12 01:45:48 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
13/05/12 01:45:48 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/mapred/staging/hadoop/.staging/job_201305120144_0001/job.jar could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
    at org.apache.hadoop.ipc.Client.call(Client.java:1030)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
    at $Proxy1.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
13/05/12 01:45:48 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
13/05/12 01:45:48 WARN hdfs.DFSClient: Could not get block locations. Source file "/home/hadoop/mapred/staging/hadoop/.staging/job_201305120144_0001/job.jar" - Aborting...
13/05/12 01:45:48 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/home/hadoop/mapred/staging/hadoop/.staging/job_201305120144_0001
Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/mapred/staging/hadoop/.staging/job_201305120144_0001/job.jar could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
    at org.apache.hadoop.ipc.Client.call(Client.java:1030)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
    at $Proxy1.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
13/05/12 01:45:48 ERROR hdfs.DFSClient: Exception closing file /home/hadoop/mapred/staging/hadoop/.staging/job_201305120144_0001/job.jar : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/mapred/staging/hadoop/.staging/job_201305120144_0001/job.jar could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/mapred/staging/hadoop/.staging/job_201305120144_0001/job.jar could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
    at org.apache.hadoop.ipc.Client.call(Client.java:1030)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
    at $Proxy1.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)

Sorry for the long error log. I believe my Hadoop 0.20.203 is up and running correctly...

$JAVA_HOME/bin/jps
13322 TaskTracker
12985 DataNode
12890 NameNode
13937 Jps
13080 SecondaryNameNode
13219 JobTracker

Hope someone can help me sort this out.
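One thing worth noting: jps only shows that the DataNode JVM is alive. The "could only be replicated to 0 nodes" message usually means the namenode sees no live, registered datanodes (often a namespace ID mismatch after reformatting the namenode, or a full/unwritable datanode disk). As a sketch, assuming a standard single-node setup (the log path below is a guess based on my install layout), the HDFS side could be double-checked like this:

```shell
# Ask the namenode how many datanodes are actually live and how much
# space they report; "Datanodes available: 0" would match the error above.
hadoop dfsadmin -report

# Make sure HDFS is not stuck in safe mode.
hadoop dfsadmin -safemode get

# If the datanode is failing to register, its own log usually says why
# (e.g. "Incompatible namespaceIDs"); the path here is an assumption.
tail -50 /usr/local/hadoop/logs/hadoop-*-datanode-*.log
```

These commands are only illustrative; they need a running cluster and paths may differ between installs.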
Regards,
Y.Mandai

2013/5/9 Yutaka Mandai <20525entrad...@gmail.com>

> Suneel
> Great to know.
> Thanks!
> Y.Mandai
>
> Sent from my iPhone
>
> On 2013/05/07, at 22:24, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
>
> > It should be
> > org.apache.lucene.analysis.core.WhitespaceAnalyzer (you were missing the 'core').
> >
> > Mahout trunk is presently at Lucene 4.2.1. Lucene has gone through a major refactor in 4.x.
> > Check the Lucene 4.2.1 docs for the correct package name.
> >
> > ________________________________
> > From: 万代豊 <20525entrad...@gmail.com>
> > To: "user@mahout.apache.org" <user@mahout.apache.org>
> > Sent: Tuesday, May 7, 2013 3:20 AM
> > Subject: Class Not Found from 0.8-SNAPSHOT for org.apache.lucene.analysis.WhitespaceAnalyzer
> >
> > Hi all,
> > I guess I've seen very similar topics somewhere about classname changes in Mahout 0.8-SNAPSHOT for some of the Lucene analyzers, and here is another one that needs solving.
> > Mahout gave me an error for seq2sparse with the Lucene analyzer option as follows, which of course had been working in at least Mahout 0.7.
> >
> > $MAHOUT_HOME/bin/mahout seq2sparse --namedVector -i NHTSA-seqfile01/ -o NHTSA-namedVector -ow -a org.apache.lucene.analysis.WhitespaceAnalyzer -chunk 200 -wt tfidf -s 5 -md 3 -x 90 -ng 2 -ml 50 -seq -n 2
> > Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
> > MAHOUT-JOB: /usr/local/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
> > 13/05/07 15:41:12 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 2
> > 13/05/07 15:41:18 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 50.0
> > 13/05/07 15:41:18 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
> > Exception in thread "main" java.lang.ClassNotFoundException: org.apache.lucene.analysis.WhitespaceAnalyzer
> >
> > I confirmed which classpath Mahout is referring to with
> > $ $MAHOUT_HOME/bin/mahout classpath
> > and obtained the Lucene-related classpath entries below.
> >
> > /usr/local/trunk/examples/target/dependency/lucene-analyzers-common-4.2.1.jar
> > /usr/local/trunk/examples/target/dependency/lucene-benchmark-4.2.1.jar
> > /usr/local/trunk/examples/target/dependency/lucene-core-4.2.1.jar
> > /usr/local/trunk/examples/target/dependency/lucene-facet-4.2.1.jar
> > /usr/local/trunk/examples/target/dependency/lucene-highlighter-4.2.1.jar
> > /usr/local/trunk/examples/target/dependency/lucene-memory-4.2.1.jar
> > /usr/local/trunk/examples/target/dependency/lucene-queries-4.2.1.jar
> > /usr/local/trunk/examples/target/dependency/lucene-queryparser-4.2.1.jar
> > /usr/local/trunk/examples/target/dependency/lucene-sandbox-4.2.1.jar
> >
> > I want to believe this is a simple classname-change issue.
> > Please advise.
> > Regards,
> > Y.Mandai
>
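P.S. For anyone else hitting this ClassNotFoundException after the Lucene 4.x refactor: rather than guessing the new package name, the bundled jars can simply be searched for the class. A rough sketch, using the dependency directory from the classpath output above (the path is specific to my install; adjust to yours):

```shell
# Search the bundled Lucene jars for WhitespaceAnalyzer to recover its
# fully-qualified name; in Lucene 4.2.1 it should turn up inside
# lucene-analyzers-common under org/apache/lucene/analysis/core/.
for jar in /usr/local/trunk/examples/target/dependency/lucene-*.jar; do
  unzip -l "$jar" | awk -v j="$jar" '/WhitespaceAnalyzer\.class/ { print j ": " $NF }'
done
```

The class path printed inside the jar (slashes replaced by dots, ".class" dropped) is exactly the name to pass to the seq2sparse -a option.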