Hi, I am trying to use SimpleNaiveBayesClassifier in my Solr project. I am currently looking at its test base class, ClassificationTestBase.java.
From the sample test code there, it looks as if the classifier reads the whole index to train the model every time an input document is classified. Or am I misunderstanding something? If I had a large index, would this hurt performance?

    protected void checkCorrectClassification(Classifier<T> classifier, String inputDoc,
            T expectedResult, Analyzer analyzer, String textFieldName,
            String classFieldName, Query query) throws Exception {
        AtomicReader atomicReader = null;
        try {
            populateSampleIndex(analyzer);
            atomicReader = SlowCompositeReaderWrapper.wrap(indexWriter.getReader());
            classifier.train(atomicReader, textFieldName, classFieldName, analyzer, query);
            ClassificationResult<T> classificationResult = classifier.assignClass(inputDoc);
            assertNotNull(classificationResult.getAssignedClass());
            assertEquals("got an assigned class of " + classificationResult.getAssignedClass(),
                    expectedResult, classificationResult.getAssignedClass());
            assertTrue("got a not positive score " + classificationResult.getScore(),
                    classificationResult.getScore() > 0);
        } finally {
            if (atomicReader != null)
                atomicReader.close();
        }
    }
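
To make the question concrete: in the helper above, train(...) and assignClass(...) are two separate calls, and the helper happens to call both once per test. The sketch below shows the pattern I would hope to use instead, training once and then classifying many documents. It is a self-contained toy, not Lucene code: TrainOnceSketch, its train and assignClass methods, and the trainCalls counter are all hypothetical stand-ins, and the scoring is a plain word-count vote rather than naive Bayes. It says nothing about what Lucene's assignClass does internally with the index.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy classifier; NOT Lucene's SimpleNaiveBayesClassifier. */
public class TrainOnceSketch {

    /** Counts how often the (potentially expensive) training pass runs. */
    static int trainCalls = 0;

    /** word -> (class label -> count), rebuilt by each call to train(). */
    static final Map<String, Map<String, Integer>> model = new HashMap<>();

    /** Stand-in for classifier.train(...): one pass over all labeled docs. */
    static void train(List<String[]> labeledDocs) {
        trainCalls++;
        model.clear();
        for (String[] doc : labeledDocs) {
            String label = doc[0];
            for (String word : doc[1].toLowerCase().split("\\s+")) {
                model.computeIfAbsent(word, w -> new HashMap<>())
                     .merge(label, 1, Integer::sum);
            }
        }
    }

    /** Stand-in for classifier.assignClass(...): reads the in-memory model only. */
    static String assignClass(String text) {
        Map<String, Integer> votes = new HashMap<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            Map<String, Integer> perClass = model.get(word);
            if (perClass != null) {
                perClass.forEach((label, n) -> votes.merge(label, n, Integer::sum));
            }
        }
        String best = null;
        int bestVotes = 0;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestVotes) {
                best = e.getKey();
                bestVotes = e.getValue();
            }
        }
        return best; // null when no known word was seen
    }

    public static void main(String[] args) {
        // Single training pass over the (toy) labeled corpus.
        train(List.of(
            new String[] {"technology", "the cpu and memory of the server"},
            new String[] {"politics", "the president signed the law"}));

        // Several classifications reuse the same trained state.
        System.out.println(assignClass("server memory"));
        System.out.println(assignClass("president law"));
        System.out.println("train calls: " + trainCalls);
    }
}
```

If the real classifier allows the same usage, the training cost would be paid once per index snapshot rather than once per input document, which is the part I would like confirmed.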