Dear Suresh,

I am also working on the classification of books.

First, I collect the metadata of my e-books. After that, I move to the
second stage and pre-process each e-book. In pre-processing, I extract the
*book title, chapter titles, sections, subsections, paragraphs,
sub-paragraphs and bold text*, etc., and remove all other formatting. The
output of that step is what I use for classification.
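
To make this concrete, here is a rough sketch in plain Java (no Mahout calls)
of two of the steps discussed in this thread: segmenting a book into chapters,
and tokenizing each chapter while dropping stop words. The class name
BookPreprocessor, the assumption that every chapter starts with a line
beginning "Chapter ", and the tiny stop-word list are all mine, for
illustration only. Real e-books will need format-specific parsing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; not part of Mahout or any other library.
class BookPreprocessor {

    // Tiny illustrative stop-word list; a real list would be much longer.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "in", "is", "to"));

    // Split plain text into chapters, keyed by the chapter heading line.
    // Assumes each chapter starts with a line beginning "Chapter ".
    // Text before the first heading is stored under "front-matter".
    static Map<String, String> splitIntoChapters(String text) {
        Map<String, String> chapters = new LinkedHashMap<>();
        String title = "front-matter";
        StringBuilder body = new StringBuilder();
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.toLowerCase(Locale.ROOT).startsWith("chapter ")) {
                // A new heading closes the previous segment, if it has text.
                if (!body.toString().trim().isEmpty()) {
                    chapters.put(title, body.toString().trim());
                }
                title = trimmed;
                body.setLength(0);
            } else {
                body.append(line).append('\n');
            }
        }
        if (!body.toString().trim().isEmpty()) {
            chapters.put(title, body.toString().trim());
        }
        return chapters;
    }

    // Lower-case the text, split on anything that is not a letter or digit,
    // and drop empty tokens and stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        String lower = text.toLowerCase(Locale.ROOT);
        for (String raw : lower.split("[^\\p{L}\\p{Nd}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }
}
```

Each chapter's token list can then go to whatever vectorization step you
choose, so a book that covers several topics can be classified per chapter
rather than as a whole.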




On Thu, Jan 16, 2014 at 2:09 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> You generally want to do linguistic pre-processing (finding phrases,
> synonymizing certain forms such as abbreviations, tokenizing, dropping stop
> words, removing boilerplate, removing tables) before doing vectorization.
>  Altogether, these form pre-processing.
>
> To classify books, you need to recognize that many books are about many
> topics.  You may want to segment your books down to the chapter, section or
> even paragraph level.
>
>
>
> On Wed, Jan 15, 2014 at 10:25 PM, Suresh M <suresh4mas...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Can you please tell me what pre-processing means? Is it
> > vectorization (as explained in the Mahout in Action book)?
> > Can it be done using Java and the Mahout API?
> > And what is the model? Is it a class?
> >
> >
> >
> >
> > On 16 January 2014 11:38, KK R <kirubakumar...@gmail.com> wrote:
> >
> > > Hi Suresh,
> > >
> > > Apache Mahout has certain classification algorithms which you can use
> > > to do the classification.
> > >
> > > Step 1: Your data may require some pre-processing. If so, it can be done
> > > using Hadoop / Hive / Mahout utilities.
> > >
> > > Step 2: Run a classification algorithm on your training data and build
> > > your model using Mahout classification algorithms.
> > >
> > > Step 3: When the actual data comes, it needs to be classified with the
> > > help of the trained model. This can be done sequentially in Java, or
> > > MapReduce can be used if the data is huge and scalability is a
> > > requirement.
> > >
> > > Thanks,
> > > Kirubakumaresh
> > > @http://www.linkedin.com/pub/kirubakumaresh-rajendran/66/411/305
> > >
> > >
> > > On Thu, Jan 16, 2014 at 11:28 AM, Suresh M <suresh4mas...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > > Our application will be getting books from different users.
> > > > We have to classify them accordingly.
> > > > Could someone please tell me how to do that using Apache Mahout and Java?
> > > > Is Hadoop necessary for that?
> > > >
> > > >
> > > > --
> > > > Thanks & Regards
> > > > Suresh
> > > >
> > >
> >
>



-- 
*Saeed Iqbal KhattaK*
Lecturer (FoIT)  -- University of Central Punjab, Lahore
Tel: +92-42-35880007 - (ext 194)
MS CS, FAST-NUCES, Peshawar
BS IT (Hons), Punjab University College of Information Technology (PUCIT),
University Of The Punjab, Lahore.
http://saeedkhattak.wordpress.com
Cell No # +92-333-9533493
