> Which version of Mahout, what platform and which components are a good
> start.

Mahout version: 0.4
Platform: Linux (Ubuntu 10.10), Maven, Tomcat
Components:
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-core</artifactId>
      <version>${apache.mahout.version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-math</artifactId>
      <version>${apache.mahout.version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-utils</artifactId>
      <version>${apache.mahout.version}</version>
    </dependency>

My scenario is:
Assume that I have some sports documents. I wrote a class that trains Mahout
with the Naive Bayes algorithm to assign a category to these documents:

    Locale en = new Locale("en");
    final BayesParameters params = new BayesParameters();
    params.setGramSize(1);
    params.set("verbose", "true");
    params.set("classifierType", "bayes");
    params.set("defaultCat", "OTHER");
    params.set("encoding", "UTF-8");
    params.set("alpha_i", "1.0");
    params.set("dataSource", "hdfs");
    params.set("basePath",
            messageSource.getMessage(WebConstants.MAHOUT_BASE_PATH, null, en));

    try {
        Path input = new Path(
                messageSource.getMessage(WebConstants.MAHOUT_INPUT_PATH, null, en));
        Path output = new Path(
                messageSource.getMessage(WebConstants.MAHOUT_OUTPUT_PATH, null, en));

        TrainClassifier.trainNaiveBayes(input, output, params);
        System.out.println("Training Finished");
    } catch (final IOException ex) {
        ex.printStackTrace();
    }


After training finished, I wrote a class to classify a document (which I
know, of course, is a sports document) and passed the document to this class
like this:

    params = new BayesParameters(2);
    params.set("verbose", "false");
    params.set("basePath",
            messageSource.getMessage(WebConstants.MAHOUT_OUTPUT_PATH, null, null));
    // Interchange the values to swap between the bayes and cbayes classifier
    params.set("classifierType", "bayes");
    params.set("dataSource", "hdfs");
    params.set("defaultCat", "OTHER");
    params.set("encoding", "UTF-8");
    params.set("alpha_i", "1.0");
    params.set("basePath", output);
    Algorithm algorithm = new BayesAlgorithm(); // Creating the instance of
    Datastore datastore = new InMemoryBayesDatastore(params); // Creating
    ClassifierContext classifier = new ClassifierContext(algorithm, datastore);
    classifier.initialize();

    // fileContent is the string to be classified
    List document = new NGrams(fileContent,
            Integer.parseInt(params.get("gramSize"))).generateNGramsWithoutLabel();

    ClassifierResult result = classifier.classifyDocument(
            (String[]) document.toArray(new String[document.size()]),
            params.get("defaultCat"));

    String label = result.getLabel();

    log.info("Got classification, label: '{}', score:'{}'", label,
            result.getScore());

    System.out.println("label " + label + " score " + result.getScore());

When I ran it, the output was label = sport and score = 0.75521.

When I gave it more text, the score grew in proportion, e.g. 0.9527 and
sometimes 1.5234, and so on.
My question is: *what does the score represent*? If it is a kind of
probability, it should be between 0 and 1 (or 0 to 100 as a percentage).
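
To make the question concrete, here is a minimal sketch of one possible
explanation. It is not Mahout code, and it rests on an assumption that has
not been checked against the Mahout 0.4 source: that the score is an additive
sum of per-term weights rather than a normalized probability. The class
`ScoreSketch`, the map `SPORT_WEIGHTS`, and all weight values are
hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScoreSketch {

    // Hypothetical per-term weights for a "sport" category (made up, not from a Mahout model).
    static final Map<String, Double> SPORT_WEIGHTS = new HashMap<String, Double>();
    static {
        SPORT_WEIGHTS.put("goal", 0.5);
        SPORT_WEIGHTS.put("match", 0.25);
        SPORT_WEIGHTS.put("team", 0.25);
    }

    // Additive score: sum of the weights of all tokens; unknown tokens add 0.
    public static double score(List<String> tokens) {
        double sum = 0.0;
        for (String token : tokens) {
            Double w = SPORT_WEIGHTS.get(token);
            if (w != null) {
                sum += w;
            }
        }
        return sum;
    }

    // Softmax over per-category scores: values in [0, 1] that sum to 1.
    // Only meaningful if the raw scores are log-scale, which is an assumption here.
    public static double[] normalize(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double total = 0.0;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.exp(scores[i] - max); // subtract max for numerical stability
            total += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= total;
        }
        return out;
    }

    public static void main(String[] args) {
        double shortDoc = score(Arrays.asList("goal", "match"));
        double longDoc = score(Arrays.asList("goal", "match", "team", "goal"));
        System.out.println("short " + shortDoc + " long " + longDoc);
        // Second entry is a made-up score for a competing category.
        System.out.println(Arrays.toString(normalize(new double[] {longDoc, 0.5})));
    }
}
```

Under that assumption, a longer document accumulates more term weights, so
the raw score grows with the amount of text and can exceed 1. A value that
behaves like a probability would only appear after normalizing the scores
across all categories, as `normalize` does in the sketch.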

Thanks in advance
