Hi, I am now testing the trainclassifier and testclassifier commands in Mahout. I prepared nbdf-train.csv and nbdf-test.csv files with the following R commands:
df <- read.arff(file = "d:/temp/kdd/KDDTrain+.arff")
nbdf <- data.frame(class=df["class"], protocol_type=df["protocol_type"],service=df["service"],flag=df["flag"],land=df["land"],logged_in=df["logged_in"],is_host_login=df["is_host_login"],is_guest_login=df["is_guest_login"])
nbdf[["logged_in"]] <- as.factor(nbdf[["logged_in"]])
nbdf[["is_guest_login"]] <- as.factor(nbdf[["is_guest_login"]])
nbdf[["is_host_login"]] <- as.factor(nbdf[["is_host_login"]])
nbdf[["land"]] <- as.factor(nbdf[["land"]])
write.table(nbdf, file="D:/nbdf-train.csv", row.names=FALSE, col.names=FALSE, quote=FALSE, sep="\t")

df <- read.arff(file = "d:/temp/kdd/KDDTest+.arff")
nbdf <- data.frame(class=df["class"], protocol_type=df["protocol_type"],service=df["service"],flag=df["flag"],land=df["land"],logged_in=df["logged_in"],is_host_login=df["is_host_login"],is_guest_login=df["is_guest_login"])
nbdf[["logged_in"]] <- as.factor(nbdf[["logged_in"]])
nbdf[["is_guest_login"]] <- as.factor(nbdf[["is_guest_login"]])
nbdf[["is_host_login"]] <- as.factor(nbdf[["is_host_login"]])
nbdf[["land"]] <- as.factor(nbdf[["land"]])
write.table(nbdf, file="D:/nbdf-test.csv", row.names=FALSE, col.names=FALSE, quote=FALSE, sep="\t")

I then put them under nbtest/train and nbtest/test in HDFS and ran:

mahout trainclassifier --input nbtest/train --output nbtest/output
mahout testclassifier --testDir nbtest/test --model nbtest/output

trainclassifier seemed to succeed, but testclassifier failed with this:

[gpadmin@linuxsvr2 mahtest]$ mahout testclassifier --testDir nbtest/test --model nbtest/output
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop
HADOOP_CONF_DIR=/usr/local/hadoop/conf
MAHOUT-JOB: /usr/local/mahout/mahout-examples-0.6-SNAPSHOT-job.jar
11/08/15 18:06:20 WARN driver.MahoutDriver: No testclassifier.props found on classpath, will use command-line arguments only
11/08/15 18:06:20 INFO bayes.TestClassifier: Loading model from: {basePath=nbtest/output, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=nbtest/test}
11/08/15 18:06:20 INFO bayes.TestClassifier: Testing Bayes Classifier
11/08/15 18:06:20 INFO bayes.SequenceFileModelReader: 77319.90481464032
11/08/15 18:06:20 INFO bayes.InMemoryBayesDatastore: normal -213.05542661827678 442.8886516970405 -0.48105867197522617
11/08/15 18:06:20 INFO bayes.InMemoryBayesDatastore: anomaly -442.8886516970405 442.8886516970405 -1.0
11/08/15 18:06:20 INFO bayes.TestClassifier:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances   : 0
Incorrectly Classified Instances : 0
Total Classified Instances       : 0
=======================================================
Confusion Matrix
-------------------------------------------------------
a    b    c    <--Classified as
0    0    0     |  0    a = normal
0    0    0     |  0    b = anomaly
0    0    0     |  0    c = unknown
Default Category: unknown: 2

11/08/15 18:06:20 INFO driver.MahoutDriver: Program took 746 ms
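
Since zero instances were classified, one thing worth ruling out is a malformed export. Below is a minimal sketch of a sanity check on the local copies of the files before they go into HDFS. The helper names `count_bad_lines` and `list_labels` are made up for illustration. The check assumes each row should have exactly 8 tab-separated fields (class plus the 7 features selected in the R code above) with the class label in the first column. It says nothing about whether that layout is what the Mahout classifier expects.

```shell
#!/bin/sh
# Hypothetical helper: print the number of lines in a tab-separated file
# whose field count differs from the expected one (default 8:
# class + 7 features, as exported by the R code above).
count_bad_lines() {
    file="$1"
    expected="${2:-8}"
    awk -F '\t' -v n="$expected" 'NF != n { bad++ } END { print bad + 0 }' "$file"
}

# Hypothetical helper: print the distinct values found in the first column,
# which is assumed to hold the class label.
list_labels() {
    cut -f1 "$1" | sort -u
}
```

For example, `count_bad_lines nbdf-test.csv` should print 0 if every row has 8 fields, and `list_labels nbdf-test.csv` should print only the class labels present in the file.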