I have a large Lucene index (with TermFreq vectors). I do not have easy access to the original source docs that the index was made from. I have identified a set of docs in the index as Category X. Is there a way to run Mahout's Bayesian classification algorithm, trained on the docs in Category X, on the remaining docs in the index to better identify category matches?
I have also exported the Lucene data into a Vector file in preparation for some clustering experiments (as per the wiki examples), and wondered whether that data could also be used to feed the CBayes code. From what I can tell, the classification code in Mahout takes a completely different form of input than the clustering algorithms do.

Thanks for any pointers.

David Croley
Lead Engineer
RenewData
