Re: [Hadoop-Help]About Map-Reduce implementation

Mayur Patil Thu, 14 Feb 2013 01:39:52 -0800

Hello,

   I just read about Pig


 >  Pig
 >  A data flow language and execution environment for exploring very
 >  large datasets.
 >  Pig runs on HDFS and MapReduce clusters.

   What is the actual difference between Pig and Flume when it comes to
   clustering logs?

   Thank you!
--
Cheers,
Mayur.




> Thanks to both of you. You solved my problem so easily. I want to
> ask one more question. For reference, I have:
>
> 1. Hadoop: The Definitive Guide
> 2. Hadoop in Action
>
> Are these sufficient, or do I need more material to study
> your suggested implementation?
> --
> Cheers,
> Mayur
>
>> Hey Mayur,
>>
>> If you are collecting logs from multiple servers, you can use Flume
>> for that.
>>
>> If the logs differ in format, you can read them with TextInputFormat
>> and write them out in whatever format you want for later processing
>> in your project.
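As a rough illustration of the "convert to one format" step described above, here is a minimal Python sketch of a line normalizer of the kind a mapper could apply to each line it receives. The two input layouts, the regular expressions, and the tab-separated output are assumptions made up for this example; nothing in the thread specifies the real log formats. RTF files are markup rather than plain lines, so they would likely need converting to text before a step like this.

```python
import re

# Hypothetical source layouts, invented for illustration only.
# Layout 1: "2013-02-06 08:32:00 ERROR disk full"
TIMESTAMP_FIRST = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>[A-Z]+) (?P<msg>.*)$"
)
# Layout 2: "[WARN] 2013-02-06T08:33:10 - low memory"
LEVEL_FIRST = re.compile(
    r"^\[(?P<level>[A-Z]+)\] (?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) - (?P<msg>.*)$"
)


def normalize(line):
    """Return 'timestamp<TAB>level<TAB>message', or None if no layout matches."""
    line = line.strip()
    for pattern in (TIMESTAMP_FIRST, LEVEL_FIRST):
        match = pattern.match(line)
        if match:
            # Use one timestamp style ("YYYY-MM-DD HH:MM:SS") for every record.
            timestamp = match.group("ts").replace("T", " ")
            return "\t".join((timestamp, match.group("level"), match.group("msg")))
    return None  # Unrecognized lines are left for the caller to skip or log.
```

Returning None for unmatched lines keeps the decision about bad input (drop, count, or write to a side file) with the caller.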
>>
>> 1. First, learn how to set up Hadoop.
>> 2. Then try writing sample Hadoop MapReduce jobs that read from a text
>>    file, process the data, and write the results to another file.
>> 3. Then integrate Flume as your log collection mechanism.
>> 4. Once you are comfortable with the system, you can decide which paths
>>    to follow based on your requirements for storage, compute time,
>>    compute capacity, compression, etc.
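The "sample MapReduce job" step above can be tried locally before any cluster is set up. The following is a minimal Python simulation of the map, sort, and reduce phases, not a Hadoop API example. The sample log lines, the function names, and the assumption that the log level is the third whitespace-separated field are all hypothetical.

```python
from itertools import groupby


def mapper(lines):
    """Emit a (level, 1) pair per log line.

    Assumes the level is the third whitespace-separated field;
    shorter lines are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            yield fields[2], 1


def reducer(pairs):
    """Sum the counts for each key.

    Pairs must arrive sorted by key, which is what the framework's
    shuffle-and-sort phase guarantees to a reducer.
    """
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield key, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort (shuffle) -> reduce in a single process."""
    return dict(reducer(sorted(mapper(lines))))


# Hypothetical input standing in for the contents of a text log file.
sample = [
    "2013-02-06 08:32:00 ERROR disk full",
    "2013-02-06 08:33:10 INFO job started",
    "2013-02-06 08:35:42 ERROR disk full",
]
counts = run_local(sample)
```

With Hadoop Streaming, the same mapper and reducer logic would instead read lines from standard input and print tab-separated key/value pairs to standard output, with the framework doing the sort in between.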
>>
> --------------
>
>> Hi,
>>
>> Please read up on the basics of how Hadoop works.
>>
>> Then get hands-on with MapReduce coding.
>>
>> The tool made for your use case is Flume, but don't look at it until
>> you have completed the two steps above.
>>
>> Good luck, and keep us posted.
>>
>> Regards,
>>
>> Jagat Singh
>>
>> -----------
>> Sent from mobile, short and crisp.
>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <[email protected]> wrote:
>>
>>> Hello,
>>>
>>>     I am new to Hadoop. I am doing a cloud project in which I
>>>     have to use Hadoop for MapReduce. I am going to collect logs
>>>     from 2-3 machines in different locations. The logs are also in
>>>     different formats, such as .rtf, .log, and .txt. Later, I have
>>>     to convert them to one format and gather them in one location.
>>>
>>>     So which Hadoop module do I need to study for this
>>>     implementation? Or do I need to study the whole framework?
>>>
>>>     Seeking guidance.
>>>
>>>     Thank you!
>>> --
>>> Cheers,
>>> Mayur.
>>>
>>
>


