I have a large Lucene index (with TermFreq vectors). I do not have easy
access to the original source docs that the index was made from. I have
identified a set of docs in the index as Category X. Is there a way to
run Mahout's Bayesian classification algorithm, trained on the docs in
Category X, on the remaining docs in the index to better identify
category matches?

 

I have also exported the Lucene data into a Vector file in preparation
for some clustering experiments (as per the wiki examples), and I wondered
whether that data could also be used to feed the CBayes code. From what I
can tell, the classification code in Mahout takes a completely different
form of input from the clustering algorithms.

 

Thanks for any pointers.

 

 

David Croley

Lead Engineer

RenewData

512.351.0198 BlackBerry

512.276.5518 Desk 

[email protected] 

www.renewdata.com <http://www.renewdata.com/> 

 



