Dear Suresh, I am also working on the classification of books.
First, I collect the metadata of my e-books. After collecting the metadata, I move to the second stage, which is pre-processing each e-book. In pre-processing, I extract the *book title, chapter titles, sections, subsections, paragraphs, sub-paragraphs and bold text*, etc., and remove all other formatting, which gives me the result.

On Thu, Jan 16, 2014 at 2:09 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> You generally want to do linguistic pre-processing (finding phrases,
> synonymizing certain forms such as abbreviations, tokenizing, dropping stop
> words, removing boilerplate, removing tables) before doing vectorization.
> Altogether, these form pre-processing.
>
> To classify books, you need to recognize that many books are about many
> topics. You may want to segment your books down to the chapter, section or
> even paragraph level.
>
> On Wed, Jan 15, 2014 at 10:25 PM, Suresh M <suresh4mas...@gmail.com> wrote:
>
> > Hi,
> >
> > Can you please tell me what that pre-processing means? Is it
> > vectorization (as explained in the Mahout in Action book)?
> > Can it be done using Java and the Mahout API?
> > And by "the model", do you mean a class?
> >
> > On 16 January 2014 11:38, KK R <kirubakumar...@gmail.com> wrote:
> >
> > > Hi Suresh,
> > >
> > > Apache Mahout has certain classification algorithms which you can use to
> > > do the classification.
> > >
> > > Step 1: Your data may require some pre-processing. If so, it can be done
> > > using Hadoop / Hive / Mahout utilities.
> > >
> > > Step 2: Run a classification algorithm on your training data and build
> > > your model using Mahout classification algorithms.
> > >
> > > Step 3: When the actual data comes in, it needs to be classified with the
> > > help of the trained model. This can be done sequentially in Java, or
> > > MapReduce can be used if the size of the data is huge and scalability is
> > > a requirement.
> > >
> > > Thanks,
> > > Kirubakumaresh
> > > @http://www.linkedin.com/pub/kirubakumaresh-rajendran/66/411/305
> > >
> > > On Thu, Jan 16, 2014 at 11:28 AM, Suresh M <suresh4mas...@gmail.com> wrote:
> > >
> > > > Hi,
> > > > Our application will be receiving books from different users.
> > > > We have to classify them accordingly.
> > > > Could someone please tell me how to do that using Apache Mahout and Java?
> > > > Is Hadoop necessary for that?
> > > >
> > > > --
> > > > Thanks & Regards
> > > > Suresh

--
*Saeed Iqbal KhattaK*
Lecturer (FoIT), University of Central Punjab, Lahore
Tel: +92-42-35880007 (ext 194)
MS CS, FAST-NUCES, Peshawar
BS IT (Hons), Punjab University College of Information Technology (PUCIT), University Of The Punjab, Lahore
http://saeedkhattak.wordpress.com
Cell No # +92-333-9533493
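The pipeline discussed in this thread can be sketched in plain Java 9+ for a concrete picture. It covers four stages: segmenting a book by chapter, linguistic pre-processing, vectorization, and then KK R's Steps 2 and 3 (training and applying a model).

This is a minimal illustration under stated assumptions. It is not Mahout's API, and it is not how any poster here implements it:
- It uses no Mahout or Hadoop classes.
- A small hand-written multinomial Naive Bayes stands in for Mahout's classification algorithms.
- It assumes each chapter begins on a line like "Chapter 3".

The class and method names (BookClassifierSketch, segmentByChapter, preprocess, vectorize, train, classify) are hypothetical. So are the stop-word list, the abbreviation map and the training sentences.

```java
import java.util.*;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the thread's pipeline in plain Java (no Mahout, no Hadoop):
 * segment a book, pre-process each segment, vectorize it, then train/apply Naive Bayes.
 */
public class BookClassifierSketch {
    // Illustrative stop words and abbreviation map (assumptions); real systems use larger lists.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "of", "and", "in", "to", "is", "on");
    static final Map<String, String> SYNONYMS = Map.of("ml", "machine_learning", "db", "database");
    // Assumption: each chapter starts on its own line with "Chapter <number>".
    static final Pattern CHAPTER_START = Pattern.compile("(?m)^(?=Chapter \\d+)");

    /** Splits a book into chapter-level segments; any text before the first heading becomes its own segment. */
    static List<String> segmentByChapter(String book) {
        List<String> segments = new ArrayList<>();
        for (String part : CHAPTER_START.split(book)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) segments.add(trimmed);
        }
        return segments;
    }

    /** Lowercases, tokenizes on non-letters, drops stop words and maps abbreviations to one form. */
    static List<String> preprocess(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (raw.isEmpty() || STOP_WORDS.contains(raw)) continue;
            tokens.add(SYNONYMS.getOrDefault(raw, raw));
        }
        return tokens;
    }

    /** Term-frequency vector: token -> number of occurrences. */
    static Map<String, Integer> vectorize(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) tf.merge(t, 1, Integer::sum);
        return tf;
    }

    // The "model": multinomial Naive Bayes counts collected during training.
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Step 2: adds one labelled training segment to the model. */
    void train(String label, String text) {
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        vectorize(preprocess(text)).forEach((word, c) -> {
            counts.merge(word, c, Integer::sum);
            totalWords.merge(label, c, Integer::sum);
            vocabulary.add(word);
        });
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
    }

    /** Step 3: returns the label with the highest log-posterior (Laplace smoothing; unseen words ignored). */
    String classify(String text) {
        Map<String, Integer> tf = vectorize(preprocess(text));
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            Map<String, Integer> counts = wordCounts.get(label);
            double denom = totalWords.getOrDefault(label, 0) + vocabulary.size();
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                if (!vocabulary.contains(e.getKey())) continue;
                double p = (counts.getOrDefault(e.getKey(), 0) + 1.0) / denom;
                score += e.getValue() * Math.log(p);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String book = "Chapter 1 Basics\nThe DB stores rows in tables.\n"
                + "Chapter 2 Learning\nML models learn from data.\n";
        BookClassifierSketch nb = new BookClassifierSketch();
        nb.train("databases", "A database stores rows, tables and indexes.");
        nb.train("databases", "SQL queries read tables from the DB.");
        nb.train("machine_learning", "ML models learn weights from training data.");
        nb.train("machine_learning", "A classifier model is trained on labelled data.");
        // Each chapter is classified on its own, since one book may cover several topics.
        for (String chapter : segmentByChapter(book)) {
            System.out.println(nb.classify(chapter) + " <- " + chapter.replace('\n', ' '));
        }
    }
}
```

In this sketch, the "model" Suresh asks about is just the per-label word counts held by a trained BookClassifierSketch instance. Classifying each chapter separately follows Ted's suggestion to segment books, because one book can span several topics.

The in-memory maps would not scale to a large collection. That is where Mahout's own vectorization and classifiers come in, plus MapReduce/Hadoop once the data is huge, as KK R notes in Step 3.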