Re: Latent Semantic Analysis for Document Categorization

David Starina Thu, 26 Mar 2015 06:06:04 -0700

Hi,

As Chirag said, try LDA. You could also look at an implementation of pLSA;
it is not part of Mahout, but you can find it here:
https://github.com/akopich/dplsa


--David

On Thu, Mar 26, 2015 at 2:01 PM, 3316 Chirag Nagpal <
chiragnagpal_12...@aitpune.edu.in> wrote:

> A better approach I can think of for the aforementioned task is to use
> Latent Dirichlet Allocation.
>
> You can force LDA to learn topics with certain specific words by
> assigning higher probability values to those words in the initial Dirichlet
> distribution.
>
> That way you will be able to discover topics better.
>
> Chirag Nagpal
> Department of Computer Engineering
> Army Institute of Technology, Pune
>
> ________________________________________
> From: Hersheeta Chandankar <hersheetachandan...@gmail.com>
> Sent: Thursday, March 26, 2015 6:25 PM
> To: user@mahout.apache.org
> Subject: Latent Semantic Analysis for Document Categorization
>
> Hi,
>
> I'm working on a document categorization project in which I have some
> crawled text documents on different topics that I want to sort into
> pre-decided categories such as travel, sports, education, etc.
> My current approach is a NaiveBayes classification model built in Mahout,
> which has given a good accuracy of 70%-75%. I would still like to improve
> the accuracy by capturing the semantic dependencies between words in the
> documents.
> I've read about Latent Semantic Analysis (LSA), which creates a
> term-document matrix and applies a mathematical transformation called
> Singular Value Decomposition (SVD) to it.
> My plan was to first apply LSA to the raw documents, then run k-means
> clustering on the LSA output, and then give the clustered output as
> input to the NaiveBayes classifier.
> But when I tried LSA in Mahout, the end result was in numerical format,
> and after clustering it was not accepted by the NaiveBayes classifier.
>
> Is my experimental approach wrong? Has anybody worked on a similar issue?
> Could someone help me with the implementation of LSA or suggest another
> approach for semantic analysis of text documents?
>
> Thanks
> -Hersheeta
>
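
Chirag's suggestion of steering LDA towards chosen words through the Dirichlet
prior can be sketched roughly as below. This is a minimal, standard-library
Python illustration using collapsed Gibbs sampling, not Mahout's LDA
implementation or its API. The function name seeded_lda, the parameters
alpha, beta and boost, and the toy documents are all assumptions made up for
the example.

```python
import random


def seeded_lda(docs, num_topics, seed_words, alpha=0.1, beta=0.01,
               boost=5.0, iters=200, rng_seed=0):
    """Collapsed Gibbs sampling for LDA with a seeded topic-word prior.

    docs is a list of token lists. seed_words maps a topic index to the
    words that topic should favour (hypothetical interface for this sketch).
    Returns (vocab, phi) where phi[k][i] estimates P(vocab[i] | topic k).
    """
    rng = random.Random(rng_seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    size = len(vocab)

    # Asymmetric prior: beta for every word, beta + boost for seed words.
    prior = [[beta] * size for _ in range(num_topics)]
    for k, words in seed_words.items():
        for w in words:
            if w in index:
                prior[k][index[w]] += boost
    prior_sum = [sum(row) for row in prior]

    # Count tables, filled from a random initial topic assignment.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                wi = index[w]
                # Remove the token's current assignment from the counts.
                k = assign[d][n]
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                # Resample its topic; the seeded prior enters here.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][wi] + prior[j][wi])
                    / (topic_total[j] + prior_sum[j])
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1

    # Posterior mean of each topic's word distribution.
    phi = [
        [(topic_word[k][i] + prior[k][i]) / (topic_total[k] + prior_sum[k])
         for i in range(size)]
        for k in range(num_topics)
    ]
    return vocab, phi


if __name__ == "__main__":
    toy_docs = [
        ["flight", "hotel", "beach", "flight"],
        ["match", "goal", "team", "goal"],
        ["hotel", "beach", "tour"],
        ["team", "match", "coach"],
    ]
    seeds = {0: ["flight", "hotel"], 1: ["match", "goal"]}
    vocab, phi = seeded_lda(toy_docs, num_topics=2, seed_words=seeds)
    for k, row in enumerate(phi):
        top = sorted(zip(row, vocab), reverse=True)[:3]
        print(k, [w for _, w in top])
```

A larger boost pulls a topic harder towards its seed words, while a very
large value can swamp what the documents themselves say, so it is a value to
tune rather than a fixed recommendation.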
