I am trying to classify files that contain CSV data, LOG file data, or English 
TXT. A different set of operations will be executed on each type of file.

The model is trained with model = DocumentCategorizerME.train("en", 
sampleStream, 5, 100);

The training file has about 20 lines for each type (do I need more?) and is of 
the form:

CSV 546894493,John Smith,5354188684365040,(432) 
209-8058,[email protected],(341) 611-2944,18970 Avonaco 
Ln,Brandsville,MA,92145
TXT In the following simple sentences, subjects are in yellow, and verbs are in 
green.  
LOG 127.0.0.1 - - [10/Apr/2007:10:54:21 +0300] "GET /unix_sysadmin.html 
HTTP/1.1" 200 3880 "http://pti.local/" "Mozilla/5.0 (X11; U; Linux i686; en-US; 
rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
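
To rule out a formatting problem in the training file, a quick check along 
these lines could count the samples per category. This is only a sketch: it 
assumes every sample sits on a single line that starts with its category label 
followed by whitespace, and the names TrainingFileCheck and countPerCategory 
are hypothetical helpers, not part of OpenNLP. The sample lines are shortened 
versions of the ones above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TrainingFileCheck {

    /**
     * Counts samples per category, assuming each line has the form
     * "<CATEGORY> <text...>" with the whole sample on one line.
     * Blank lines and lines with a label but no text are skipped.
     */
    static Map<String, Integer> countPerCategory(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Split into the leading label and the remaining text.
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length < 2) {
                continue; // label only, no text to learn from
            }
            counts.merge(parts[0], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "CSV 546894493,John Smith,Brandsville,MA,92145",
            "TXT In the following simple sentences, subjects are in yellow.",
            "LOG 127.0.0.1 - - [10/Apr/2007:10:54:21 +0300] \"GET /unix_sysadmin.html",
            // A wrapped continuation line: its first token is counted as a category.
            "HTTP/1.1\" 200 3880"
        );
        System.out.println(countPerCategory(lines));
    }
}
```

If the counts show labels other than CSV, TXT and LOG, as the wrapped last line 
in this example does, then some samples are probably broken across lines in the 
file.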

After creating the model, when I test it, evaluator.evaluteSample(sample) for a 
CSV input reports an accuracy of 1.0 for CSV, which is correct, but for the 
same sample myCategorizer.categorize(sample) returns LOG.

Please suggest how this can be fixed.

thanks,
Ravi.
                                          
