You generally want to do linguistic pre-processing (finding phrases,
collapsing synonymous forms such as abbreviations, tokenizing, dropping stop
words, removing boilerplate, removing tables) before doing vectorization.
Together, these steps make up pre-processing.
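
For instance, the tokenizing and stop-word steps of that pipeline, sketched
in plain Java (the stop list below is just a stand-in; in practice Mahout's
seq2sparse can run a full Lucene analyzer for you at scale):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Tokenize {
  // Tiny stand-in stop list; a real pipeline would use a much fuller one.
  private static final Set<String> STOP = new HashSet<String>(
      Arrays.asList("a", "an", "the", "of", "and", "to", "in"));

  // Lowercase, split on non-letters, drop empty tokens and stop words.
  public static List<String> tokens(String text) {
    List<String> out = new ArrayList<String>();
    for (String t : text.toLowerCase().split("[^a-z]+")) {
      if (!t.isEmpty() && !STOP.contains(t)) {
        out.add(t);
      }
    }
    return out;
  }
}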

To classify books, you need to recognize that many books are about many
topics.  You may want to segment your books down to the chapter, section or
even paragraph level.
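
A crude way to segment, assuming plain text where paragraphs are separated
by blank lines (just a sketch; real books need smarter chapter/section
detection):

// Treat each blank-line-separated paragraph as its own document.
public static String[] paragraphs(String bookText) {
  return bookText.split("\\n\\s*\\n");
}

Each resulting piece is then vectorized and classified on its own.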



On Wed, Jan 15, 2014 at 10:25 PM, Suresh M <suresh4mas...@gmail.com> wrote:

> Hi,
>
> Can you please tell me what that pre-processing means? Is it
> vectorization (as explained in the Mahout in Action book)?
> Can it be done using Java and the Mahout API?
> And the model that is mentioned, is it a class?
>
>
> On 16 January 2014 11:38, KK R <kirubakumar...@gmail.com> wrote:
>
> > Hi Suresh,
> >
> > Apache Mahout has several classification algorithms which you can use to
> > do the classification.
> >
> > Step 1: Your data may require some pre-processing. If so, it can be done
> > using Hadoop / Hive / Mahout utilities.
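> >
> > For instance (only a sketch; the /books/* paths are invented, and these
> > are the same utilities behind the "mahout seqdirectory" and
> > "mahout seq2sparse" commands):
> >
> > import org.apache.hadoop.util.ToolRunner;
> > import org.apache.mahout.text.SequenceFilesFromDirectory;
> > import org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles;
> >
> > // Raw text files -> SequenceFiles -> TF-IDF vectors.
> > ToolRunner.run(new SequenceFilesFromDirectory(),
> >     new String[] {"-i", "/books/raw", "-o", "/books/seq"});
> > ToolRunner.run(new SparseVectorsFromSequenceFiles(),
> >     new String[] {"-i", "/books/seq", "-o", "/books/vectors",
> >                   "-wt", "tfidf", "-nv", "-ow"});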
> >
> > Step 2: Run a classification algorithm on your training data and build
> > your model using Mahout's classification algorithms.
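> >
> > Roughly, with Mahout's Naive Bayes (again a sketch with invented paths;
> > -el pulls each document's label out of its SequenceFile key):
> >
> > import org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob;
> >
> > // Train a Naive Bayes model on the TF-IDF vectors from step 1.
> > ToolRunner.run(new TrainNaiveBayesJob(),
> >     new String[] {"-i", "/books/vectors/tfidf-vectors",
> >                   "-o", "/books/model", "-li", "/books/labelindex",
> >                   "-el", "-ow"});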
> >
> > Step 3: When the actual data comes, it needs to be classified with the
> > help of the trained model. This can be done sequentially in Java, or
> > MapReduce can be used if the data is huge and scalability is a requirement.
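> >
> > The sequential case could look roughly like this (a sketch; docVector is
> > a hypothetical TF-IDF vector built with the same dictionary that was used
> > at training time):
> >
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.fs.Path;
> > import org.apache.mahout.classifier.naivebayes.NaiveBayesModel;
> > import org.apache.mahout.classifier.naivebayes.StandardNaiveBayesClassifier;
> > import org.apache.mahout.math.Vector;
> >
> > // Load the trained model and score one document's vector.
> > public static int classify(Vector docVector) throws Exception {
> >   Configuration conf = new Configuration();
> >   NaiveBayesModel model =
> >       NaiveBayesModel.materialize(new Path("/books/model"), conf);
> >   StandardNaiveBayesClassifier classifier =
> >       new StandardNaiveBayesClassifier(model);
> >   Vector scores = classifier.classifyFull(docVector);
> >   return scores.maxValueIndex(); // index of the winning label
> > }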
> >
> > Thanks,
> > Kirubakumaresh
> > @http://www.linkedin.com/pub/kirubakumaresh-rajendran/66/411/305
> >
> >
> > On Thu, Jan 16, 2014 at 11:28 AM, Suresh M <suresh4mas...@gmail.com>
> > wrote:
> >
> > > Hi,
> > > Our application will be getting books from different users.
> > > We have to classify them accordingly.
> > > Could someone please tell me how to do that using Apache Mahout and Java?
> > > Is Hadoop necessary for that?
> > >
> > >
> > > --
> > > Thanks & Regards
> > > Suresh
> > >
> >
>
