> Which version of Mahout, what platform and which components are a good
> start.
Mahout version: 0.4
Platform: Linux (Ubuntu 10.10), Maven, Tomcat
Components:
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-core</artifactId>
  <version>${apache.mahout.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-math</artifactId>
  <version>${apache.mahout.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-utils</artifactId>
  <version>${apache.mahout.version}</version>
</dependency>
My scenario is:
Assume that I have some sport documents. I wrote a class that trains Mahout
with the NaiveBayes algorithm to build a category for these documents:
Locale en = new Locale("en");
final BayesParameters params = new BayesParameters();
params.setGramSize(1);
params.set("verbose", "true");
params.set("classifierType", "bayes");
params.set("defaultCat", "OTHER");
params.set("encoding", "UTF-8");
params.set("alpha_i", "1.0");
params.set("dataSource", "hdfs");
params.set("basePath",
    messageSource.getMessage(WebConstants.MAHOUT_BASE_PATH, null, en));
try {
    Path input = new Path(
        messageSource.getMessage(WebConstants.MAHOUT_INPUT_PATH, null, en));
    Path output = new Path(
        messageSource.getMessage(WebConstants.MAHOUT_OUTPUT_PATH, null, en));
    TrainClassifier.trainNaiveBayes(input, output, params);
    System.out.println("Training Finished");
} catch (final IOException ex) {
    ex.printStackTrace();
}
After training finished, I wrote a class to classify a document (which I
know, of course, is a sport document), and passed the document to this
class like this:
params = new BayesParameters(2);
params.set("verbose", "false");
params.set("basePath",
    messageSource.getMessage(WebConstants.MAHOUT_OUTPUT_PATH, null, null));
// Interchange the values to swap between the bayes and cbayes classifier
params.set("classifierType", "bayes");
params.set("dataSource", "hdfs");
params.set("defaultCat", "OTHER");
params.set("encoding", "UTF-8");
params.set("alpha_i", "1.0");
params.set("basePath", output);
Algorithm algorithm = new BayesAlgorithm(); // Creating the instance of
Datastore datastore = new InMemoryBayesDatastore(params); // Creating
ClassifierContext classifier = new ClassifierContext(algorithm, datastore);
classifier.initialize();
// fileContent is the input string to classify
List document = new NGrams(fileContent,
    Integer.parseInt(params.get("gramSize"))).generateNGramsWithoutLabel();
ClassifierResult result = classifier.classifyDocument(
    (String[]) document.toArray(new String[document.size()]),
    params.get("defaultCat"));
String label = result.getLabel();
log.info("Got classification, label: '{}', score:'{}'", label,
    result.getScore());
System.out.println("label " + label + " score " + result.getScore());
When I ran it, the output was label = sport and score = 0.75521.
When I passed a longer string, the score grew with it, for example 0.9527
and sometimes 1.5234, and so on.
My question is: *what does the score represent?* If it is a kind of
probability, it should lie between 0 and 1 (or between 0 and 100 as a percentage).
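One possible explanation, which I have not confirmed against the Mahout source, is that the score is not a normalized probability but a sum of per-term log weights. A sum like that has no upper bound of 1 and grows as more terms are added, which would match what I see. The toy class below is only a sketch of that assumption: LogScoreSketch, its term probabilities and the smoothing value are all made up for illustration and are not Mahout code.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model, NOT Mahout code: scores a document as a sum of negative log term weights. */
public class LogScoreSketch {

    // Hypothetical per-term probabilities for a "sport" label (made-up numbers).
    private static final Map<String, Double> TERM_PROB = new HashMap<>();
    static {
        TERM_PROB.put("goal", 0.5);
        TERM_PROB.put("match", 0.4);
        TERM_PROB.put("team", 0.6);
    }

    // Assumed probability given to terms that were never seen in training.
    private static final double UNSEEN_PROB = 0.1;

    /** Sums -log(p) over all tokens; every extra token adds a positive amount. */
    public static double score(String[] tokens) {
        double sum = 0.0;
        for (String token : tokens) {
            double p = TERM_PROB.getOrDefault(token, UNSEEN_PROB);
            sum += -Math.log(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        // A short document and a longer one containing the same terms plus more.
        System.out.println(score(new String[] {"goal"}));
        System.out.println(score(new String[] {"goal", "match", "team"}));
    }
}
```

In this sketch a one-term document scores below 1 and a three-term document scores above 1, so the value can only be used to compare labels for the same document, not read as a percentage. Whether Mahout's Bayes classifier computes its score this way is exactly what I would like to have confirmed.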
Thanks in advance