Hi,

I have a collection of crawled text documents on different topics that I
want to categorize into predefined categories such as travel, sports,
education, etc.
For this, I first clustered the documents using k-means clustering and then
built a Complementary Naive Bayes model from the clustered documents.
The accuracy and reliability of the model were 83% and 63%, respectively.
The problem is that, once the model is deployed, the results are absurd
(e.g., a sports document is categorized under the business category).
On analyzing the problem, I found that the clusters were not clean (they
contained unrelated documents), which may have led to the creation of an
incorrect dictionary file.
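
To make "not clean" concrete, here is a minimal sketch of how cluster
cleanliness could be measured, assuming a small hand-labeled sample of the
clustered documents is available. The function name, variable names, and
sample data are illustrative only and are not taken from the actual
pipeline.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, true_labels):
    """Return {cluster_id: (majority_label, purity)} for a labeled sample.

    Purity is the fraction of documents in a cluster that carry the
    cluster's most common hand-assigned label.
    """
    by_cluster = defaultdict(list)
    for cid, label in zip(cluster_ids, true_labels):
        by_cluster[cid].append(label)

    report = {}
    for cid, labels in by_cluster.items():
        majority_label, count = Counter(labels).most_common(1)[0]
        report[cid] = (majority_label, count / len(labels))
    return report


# Hypothetical sample: cluster assignments from k-means, plus labels
# assigned by hand to the same documents.
cluster_ids = [0, 0, 0, 1, 1, 1, 1]
true_labels = ["sports", "sports", "business",
               "travel", "travel", "travel", "travel"]

report = cluster_purity(cluster_ids, true_labels)
for cid, (label, purity) in sorted(report.items()):
    print(cid, label, round(purity, 2))
```

A cluster with low purity mixes several topics, so any category label or
dictionary derived from it is likely to be unreliable.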

To avoid this, is there another way to preprocess and cluster the input
data?
Or is there an alternative approach that could be used for the
categorization?

Thanks,
-Hersheeta
