On Jan 16, 2014, at 1:58am, Suresh M <suresh4mas...@gmail.com> wrote:

> Hi,
> 
> Thanks for your reply.
> I now have the table of contents, metadata, title, author, etc. for the
> books.
> Can you please tell me the next steps?
> I read in the Mahout in Action book that there are a few tools available
> for vectorization, e.g. Lucene analyzers and Mahout vector encoders.
> Can you please tell me which is better and how to use it?

I cover some of the issues & approaches to generating text-based features in 
these two blog posts…

http://www.scaleunlimited.com/2013/07/10/text-feature-selection-for-machine-learning-part-1/

http://www.scaleunlimited.com/2013/07/21/text-feature-selection-for-machine-learning-part-2/

-- Ken



> On 16 January 2014 14:49, Saeed Iqbal KhattaK
> <saeediqbalkhat...@gmail.com> wrote:
> 
>> Dear Suresh,
>> 
>> I am also working on classification of books.
>> 
>> First I collect the metadata for my e-books. After that, I move on to the
>> second stage, which is pre-processing each e-book. In pre-processing, I
>> collect information such as *book title, chapter titles, sections,
>> subsections, paragraphs, sub-paragraphs and bold text*, and strip all
>> other formatting. That gives me my result.
>> 
>> 
>> 
>> 
>> On Thu, Jan 16, 2014 at 2:09 PM, Ted Dunning <ted.dunn...@gmail.com>
>> wrote:
>> 
>>> You generally want to do linguistic pre-processing (finding phrases,
>>> synonymizing certain forms such as abbreviations, tokenizing, dropping
>>> stop words, removing boilerplate, removing tables) before doing
>>> vectorization. Altogether, these form pre-processing.
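
A minimal sketch of two of the steps listed above (tokenizing and dropping
stop words), in plain Java with no Mahout or Lucene dependency. The class
name, the tiny stop-word list and the split rule are assumptions made for
illustration only; a Lucene analyzer covers the same ground more thoroughly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class TextPreprocessor {
    // Tiny illustrative stop-word list (an assumption); real lists are much longer.
    static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "are", "is", "of", "the", "to", "in");

    // Lower-case the text, split on anything that is not a letter or digit,
    // and drop empty tokens and stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints the tokens that survive lower-casing and stop-word removal.
        System.out.println(tokenize("The History of the Roman Empire, Vol. 2"));
    }
}
```

The resulting token list is what a vectorizer would then turn into term
counts or hashed features.
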
>>> 
>>> To classify books, you need to recognize that many books are about many
>>> topics.  You may want to segment your books down to the chapter, section
>>> or even paragraph level.
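
A rough sketch of chapter-level segmentation in plain Java, so that each
chapter can be classified on its own. It assumes headings are lines of the
form "Chapter 3" or "CHAPTER 12: Title"; that convention, and the class and
method names, are hypothetical and would need adapting to the actual book
format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class BookSegmenter {
    // Assumed heading convention: "Chapter <number>", optionally followed by a title.
    static final Pattern CHAPTER_HEADING = Pattern.compile("(?i)^chapter\\s+\\d+.*$");

    // Split the book text into segments, starting a new one at each heading line.
    // Text before the first heading (front matter) becomes its own segment.
    static List<String> splitIntoChapters(String bookText) {
        List<String> segments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : bookText.split("\\R")) {
            if (CHAPTER_HEADING.matcher(line.trim()).matches()) {
                flush(current, segments);
            }
            current.append(line).append('\n');
        }
        flush(current, segments);
        return segments;
    }

    // Store the accumulated segment if it has any non-blank content, then reset.
    private static void flush(StringBuilder current, List<String> segments) {
        String segment = current.toString().trim();
        if (!segment.isEmpty()) {
            segments.add(segment);
        }
        current.setLength(0);
    }
}
```

Section- or paragraph-level segmentation would follow the same pattern with
a different boundary rule, for example splitting on blank lines.
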
>>> 
>>> 
>>> 
>>> On Wed, Jan 15, 2014 at 10:25 PM, Suresh M <suresh4mas...@gmail.com>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Can you please tell me what pre-processing means here? Is it
>>>> vectorization (as explained in the Mahout in Action book)?
>>>> Can it be done using Java and the Mahout API?
>>>> And what does "the model" mean? Is it a class?
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 16 January 2014 11:38, KK R <kirubakumar...@gmail.com> wrote:
>>>> 
>>>>> Hi Suresh,
>>>>> 
>>>>> Apache Mahout has certain classification algorithms which you can use
>>>>> to do the classification.
>>>>> 
>>>>> Step 1: Your data may require some pre-processing. If so, it can be
>>>>> done using Hadoop / Hive / Mahout utilities.
>>>>> 
>>>>> Step 2: Run a classification algorithm on your training data and build
>>>>> your model using Mahout classification algorithms.
>>>>> 
>>>>> Step 3: When the actual data comes in, it needs to be classified with
>>>>> the help of the trained model. This can be done sequentially in Java,
>>>>> or MapReduce can be used if the data is huge and scalability is a
>>>>> requirement.
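
To make the train-then-classify flow of Steps 2 and 3 concrete, here is a toy
multinomial naive Bayes classifier in plain Java. It is a stand-in written
for illustration, not Mahout's API: the class, its methods and the sample
labels are all hypothetical, and it expects text that has already been
tokenized in Step 1.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TinyNaiveBayes {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    private final Map<String, Integer> totalTermsPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Step 2: add one labelled, already-tokenized training document to the model.
    void train(String label, List<String> tokens) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = termCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            totalTermsPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Step 3: return the label with the highest log-probability for the new tokens,
    // or null if the model has not been trained.
    String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            double score = Math.log(docsPerLabel.get(label) / (double) totalDocs);
            Map<String, Integer> counts = termCounts.get(label);
            int total = totalTermsPerLabel.getOrDefault(label, 0);
            for (String token : tokens) {
                int count = counts.getOrDefault(token, 0);
                // Add-one (Laplace) smoothing so unseen words do not zero out a label.
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TinyNaiveBayes model = new TinyNaiveBayes();
        model.train("cooking", List.of("recipe", "oven", "bake", "flour"));
        model.train("travel", List.of("train", "hotel", "map", "flight"));
        System.out.println(model.classify(List.of("bake", "flour", "bread")));
    }
}
```

Everything here runs in a single JVM, which is the "sequentially in Java"
option from Step 3; the in-memory maps play the role of the trained model.
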
>>>>> 
>>>>> Thanks,
>>>>> Kirubakumaresh
>>>>> @http://www.linkedin.com/pub/kirubakumaresh-rajendran/66/411/305
>>>>> 
>>>>> 
>>>>> On Thu, Jan 16, 2014 at 11:28 AM, Suresh M <suresh4mas...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> Our application will be getting books from different users.
>>>>>> We have to classify them accordingly.
>>>>>> Could someone please tell me how to do that using Apache Mahout and
>>>>>> Java?
>>>>>> Is Hadoop necessary for that?
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Thanks & Regards
>>>>>> Suresh
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> *Saeed Iqbal KhattaK*
>> Lecturer (FoIT)  -- University of Central Punjab, Lahore
>> Tel: +92-42-35880007 - (ext 194)
>> MS CS, FAST-NUCES, Peshawar
>> BS IT (Hons), Punjab University College of Information Technology (PUCIT),
>> University Of The Punjab, Lahore.
>> http://saeedkhattak.wordpress.com
>> Cell No # +92-333-9533493
>> 

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr




