Hi,
I'm working on a document categorization project in which I have some
crawled text documents on different topics that I want to categorize into
pre-decided categories like travel, sports, education, etc.
Currently, my approach is to build a Naive Bayes classification
model in Mahout
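
For readers unfamiliar with the approach described above, here is a minimal sketch of multinomial Naive Bayes with add-one smoothing in plain Python. It is not Mahout's implementation or API; the function names (train_nb, classify), the toy documents, and the smoothing value are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_docs, alpha=1.0):
    """labeled_docs: list of (label, token_list). alpha is the smoothing constant."""
    class_docs = Counter()              # number of documents per category
    word_counts = defaultdict(Counter)  # per-category word frequencies
    vocab = set()
    for label, tokens in labeled_docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_docs": class_docs, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha}

def classify(model, tokens):
    """Return the category with the highest log-posterior for the tokens."""
    total_docs = sum(model["class_docs"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_docs"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + model["alpha"] * vocab_size
        score = math.log(n_docs / total_docs)  # log prior
        for t in tokens:
            if t in model["vocab"]:  # words never seen in training are skipped
                score += math.log((counts[t] + model["alpha"]) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data, two documents per category.
train = [
    ("travel", "cheap flight hotel beach".split()),
    ("travel", "hotel booking flight deal".split()),
    ("sports", "football match goal team".split()),
    ("sports", "team wins cricket match".split()),
]
model = train_nb(train)
print(classify(model, "flight and hotel".split()))
```

A real crawl would need tokenization, stop-word handling, and far more training data per category; the sketch only shows the counting and scoring steps.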
Thank you so much Chirag and David for your suggestion.
I'll surely try it.
On Thu, Mar 26, 2015 at 6:31 PM, 3316 Chirag Nagpal
chiragnagpal_12...@aitpune.edu.in wrote:
A better approach I can think of for the aforementioned task is to use
Latent Dirichlet Allocation
You can force LDA to
A better approach I can think of for the aforementioned task is to use Latent
Dirichlet Allocation
You can force LDA to learn topics with certain specific words by assigning
higher probability values to those words in the initial Dirichlet distribution.
That way you will be able to discover
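
One way to picture the suggestion above is a small collapsed Gibbs sampler in which the topic-word Dirichlet prior is asymmetric: seed words get a larger pseudo-count in their intended topic. This is only a sketch under stated assumptions, not Mahout's LDA; the function name seeded_lda, the hyperparameter values, and the toy documents are all hypothetical.

```python
import random

def seeded_lda(docs, seed_words, n_iter=200, alpha=0.1,
               base_beta=0.01, seed_beta=5.0, rng_seed=0):
    """docs: list of token lists; seed_words: one list of seed words per topic.
    Returns (vocab, topic-word counts, document-topic counts)."""
    rng = random.Random(rng_seed)
    vocab = sorted({w for doc in docs for w in doc})
    idx = {w: i for i, w in enumerate(vocab)}
    K, V = len(seed_words), len(vocab)

    # Asymmetric prior: a seed word gets a large pseudo-count in its topic.
    beta = [[base_beta] * V for _ in range(K)]
    for k, words in enumerate(seed_words):
        for w in words:
            if w in idx:
                beta[k][idx[w]] = seed_beta
    beta_sum = [sum(row) for row in beta]

    n_kw = [[0] * V for _ in range(K)]  # topic-word counts
    n_k = [0] * K                       # tokens assigned to each topic
    n_dk = [[0] * K for _ in docs]      # document-topic counts

    def bump(d, k, wi, delta):
        n_kw[k][wi] += delta
        n_k[k] += delta
        n_dk[d][k] += delta

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[d]):
            bump(d, k, idx[w], +1)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = idx[w]
                bump(d, z[d][i], wi, -1)  # remove the token's current assignment
                # Conditional topic weights given all other assignments.
                weights = [
                    (n_dk[d][k] + alpha)
                    * (n_kw[k][wi] + beta[k][wi]) / (n_k[k] + beta_sum[k])
                    for k in range(K)
                ]
                z[d][i] = rng.choices(range(K), weights=weights)[0]
                bump(d, z[d][i], wi, +1)
    return vocab, n_kw, n_dk

# Hypothetical toy corpus: topic 0 is seeded toward travel, topic 1 toward sports.
docs = [
    "flight hotel beach flight".split(),
    "hotel booking flight deal".split(),
    "match team goal team".split(),
    "team wins cricket match".split(),
]
seeds = [["flight", "hotel"], ["match", "team"]]
vocab, n_kw, n_dk = seeded_lda(docs, seeds)
for k, row in enumerate(n_kw):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print(k, [w for _, w in top])
```

The larger seed_beta is relative to base_beta, the harder the sampler is pushed to keep seed words in their assigned topic; on a real corpus that ratio would need tuning.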
Hi,
As Chirag said, try LDA. You can also check out an implementation of pLSA;
it is not part of Mahout, but you can find it here:
https://github.com/akopich/dplsa
--David
On Thu, Mar 26, 2015 at 2:01 PM, 3316 Chirag Nagpal
chiragnagpal_12...@aitpune.edu.in wrote:
A better approach I can think
Hmm, I just ran into this; thanks for the research.
Unless it is Mac-specific, this may cause problems on cluster machines, since
whatever is put into /usr/lib/java may need to be on all nodes. I'm not sure
that is the best solution. Let me know if you run into this on a ’nix-type cluster.
On Mar 19, 2015, at
Finally getting to Yarn. Paul, were you trying to run spark-itemsimilarity with
the spark-submit script? That shouldn’t work; the job is a standalone app and
does not require spark-submit, nor is it likely to work with it.
Were you able to run on Yarn? How?
On Jan 29, 2015, at 9:15 AM, Pat Ferrel
Raghuveer,
I am more confused than before.
You say that the destination is on the second line. That seems to imply
that your data has more than one line per data point. Is this so? That
seems to contradict your previous comments.
On Wed, Mar 25, 2015 at 10:20 PM, Raghuveer
Also, if you can include linking information between documents, you should
be able to substantially improve accuracy. Same goes for behavioral data
like browsing history.
On Thu, Mar 26, 2015 at 6:10 AM, Hersheeta Chandankar
hersheetachandan...@gmail.com wrote:
Thank you so much Chirag and