Hi

I am trying to use SimpleNaiveBayesClassifier in my Solr project and am
currently looking at its test base class, ClassificationTestBase.java.

The sample test code there makes it look as if the classifier reads the whole
index to train the model every time an inputDocument is classified. Or am I
misunderstanding something? If I have a large index, will this hurt
performance?

protected void checkCorrectClassification(Classifier<T> classifier, String inputDoc,
    T expectedResult, Analyzer analyzer, String textFieldName, String classFieldName,
    Query query) throws Exception {
  AtomicReader atomicReader = null;
  try {
    populateSampleIndex(analyzer);
    atomicReader = SlowCompositeReaderWrapper.wrap(indexWriter.getReader());
    classifier.train(atomicReader, textFieldName, classFieldName, analyzer, query);
    ClassificationResult<T> classificationResult = classifier.assignClass(inputDoc);
    assertNotNull(classificationResult.getAssignedClass());
    assertEquals("got an assigned class of " + classificationResult.getAssignedClass(),
        expectedResult, classificationResult.getAssignedClass());
    assertTrue("got a not positive score " + classificationResult.getScore(),
        classificationResult.getScore() > 0);
  } finally {
    if (atomicReader != null) {
      atomicReader.close();
    }
  }
}
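
To make the question concrete: what I would hope to do is call train once and then call assignClass many times against the already trained model. Below is a minimal sketch of that pattern. It assumes a hypothetical ToyClassifier that I made up for illustration; it is not a Lucene class and says nothing about how SimpleNaiveBayesClassifier works internally. It only counts how often train runs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TrainOnceSketch {

    // Hypothetical stand-in for a classifier; NOT part of Lucene.
    static class ToyClassifier {
        int trainCalls = 0;
        private final Map<String, String> wordToClass = new HashMap<>();

        // "Training" just memorizes which class each word was last seen with.
        // Each labeled doc is a two-element array: {text, label}.
        void train(List<String[]> labeledDocs) {
            trainCalls++;
            wordToClass.clear();
            for (String[] doc : labeledDocs) {
                String label = doc[1];
                for (String word : doc[0].toLowerCase().split("\\s+")) {
                    wordToClass.put(word, label);
                }
            }
        }

        // Returns the class with the most matching words, or null if no word matches.
        String assignClass(String text) {
            Map<String, Integer> votes = new HashMap<>();
            String best = null;
            int bestVotes = 0;
            for (String word : text.toLowerCase().split("\\s+")) {
                String label = wordToClass.get(word);
                if (label == null) {
                    continue;
                }
                int v = votes.merge(label, 1, Integer::sum);
                if (v > bestVotes) {
                    bestVotes = v;
                    best = label;
                }
            }
            return best;
        }
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the indexed documents.
        List<String[]> index = List.of(
            new String[] {"lucene search index", "technology"},
            new String[] {"election vote parliament", "politics"});

        ToyClassifier classifier = new ToyClassifier();
        classifier.train(index); // the full scan happens once, up front

        // Every classification afterwards reuses the trained state.
        for (String input : List.of("search index", "vote")) {
            System.out.println(input + " -> " + classifier.assignClass(input));
        }
        System.out.println("train calls: " + classifier.trainCalls);
    }
}
```

If the real classifier can be used this way, the cost of scanning the index would be paid once per training run rather than once per input document. Whether that holds for SimpleNaiveBayesClassifier is exactly what I am asking.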
