Hello,

Now I am slowly understanding how Hadoop works.

I want to collect the logs from three machines, including the master
itself. My small question is: which mode should I use for this?

                  Standalone Operation
                  Pseudo-Distributed Operation
                  Fully-Distributed Operation

Seeking guidance,

Thank you!
--
Cheers,
Mayur




Hi Mayur,
>>
>> Flume is used for data collection. Pig is used for data processing.
>> For example, if you have a bunch of servers that you want to collect the
>> logs from and push to HDFS, you would use Flume. Then, if you need to
>> run some analysis on that data, you could use Pig to do that.
>>
>> Sent from my iPhone
>>
>> On Feb 14, 2013, at 1:39 AM, Mayur Patil <ram.nath241...@gmail.com>
>> wrote:
>>
>> > Hello,
>> >
>> >   I just read about Pig:
>> >
>> >> Pig
>> >> A data flow language and execution environment for exploring very
>> >> large datasets.
>> >> Pig runs on HDFS and MapReduce clusters.
>> >
>> >   What is the actual difference between Pig and Flume when it comes to
>> >   log clustering?
>> >
>> >   Thank you!
>> > --
>> > Cheers,
>> > Mayur.
>> >
>> >
>> >
>> >> Hey Mayur,
>> >>>
>> >>> If you are collecting logs from multiple servers, then you can use
>> >>> Flume for that.
>> >>>
>> >>> If the logs differ in format, then you can just use TextInputFormat
>> >>> to read them and write them out in whatever other format you want
>> >>> for processing in later parts of your project.
>> >>>
>> >>> The first thing you need to learn is how to set up Hadoop.
>> >>> Then you can try writing sample Hadoop MapReduce jobs that read from a
>> >>> text file, process it, and write the results into another file.
>> >>> Then you can integrate Flume as your log collection mechanism.
>> >>> Once you get a hold on the system, you can decide which paths you want
>> >>> to follow based on your requirements for storage, compute time,
>> >>> compute capacity, compression, etc.
>> >>>
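The first exercise suggested in this thread (a MapReduce job that reads a text file, processes it, and writes the results to another file) can be tried with Hadoop Streaming, which runs any executable as the mapper or reducer over stdin/stdout using tab-separated key/value lines. Below is a minimal sketch in Python, assuming (hypothetically) that each log line contains a level word such as ERROR or INFO; it counts lines per level. The script name log_levels.py and the level list are illustrative only, not taken from the thread.

```python
import sys
from itertools import groupby

# Hypothetical log levels to look for; adjust to match your own log formats.
LEVELS = {"ERROR", "WARN", "INFO", "DEBUG"}


def mapper(lines):
    """Map phase: emit "level<TAB>1" for every non-blank log line."""
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        # Strip common decorations such as "[WARN]" or "ERROR:" before matching.
        tokens = (w.strip("[]:,") for w in words)
        level = next((t for t in tokens if t in LEVELS), "UNKNOWN")
        yield f"{level}\t1"


def reducer(lines):
    """Reduce phase: sum the counts for each key.

    Assumes input lines are sorted by key, which Hadoop's shuffle/sort
    phase (or a local `sort`) guarantees.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


# Run as a script only when a stage is named: "map" or "reduce".
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Locally you can mimic the map, shuffle/sort, and reduce phases with `cat app.log | python log_levels.py map | sort | python log_levels.py reduce`. On a cluster, the same two commands would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options, together with `-input` and `-output`; where the jar lives depends on your Hadoop version and installation.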
>> >> --------------
>> >> --------------
>> >>
>> >>> Hi,
>> >>>
>> >>> Please read the basics of how Hadoop works.
>> >>>
>> >>> Then get hands-on with MapReduce coding.
>> >>>
>> >>> The tool that has been made for what you need is Flume, but don't look
>> >>> at the tool until you complete the above two steps.
>> >>>
>> >>> Good luck, keep us posted.
>> >>>
>> >>> Regards,
>> >>>
>> >>> Jagat Singh
>> >>>
>> >>> -----------
>> >>> Sent from mobile, short and crisp.
>> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <ram.nath241...@gmail.com>
>> wrote:
>> >>>
>> >>>> Hello,
>> >>>>
>> >>>>    I am new to Hadoop. I am doing a cloud project in which I have to
>> >>>>    use Hadoop for MapReduce. I am going to collect logs from 2-3
>> >>>>    machines in different locations. The logs are also in different
>> >>>>    formats, such as .rtf, .log, and .txt.
>> >>>>
>> >>>>    Later, I have to collect them, convert them to one format, and
>> >>>>    gather them in one location.
>> >>>>
>> >>>>    So which module of Hadoop do I need to study for this
>> >>>>    implementation? Or do I need to study the whole framework?
>> >>>>
>> >>>>    Seeking guidance,
>> >>>>
>> >>>>    Thank you!
>>
>
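The "convert them to one format and gather them in one location" step from the original question can be sketched as below. This is a hypothetical illustration, assuming a tab-separated record of host, file name, and message as the common format; the function names normalize and collect are made up for the example. It only reads plain-text .log and .txt files; .rtf files contain markup and would first need a proper RTF-to-text conversion, which this sketch does not attempt.

```python
from pathlib import Path

# Plain-text extensions this sketch can read directly. .rtf is left out on
# purpose: RTF is a markup format and needs a real RTF-to-text converter first.
PLAIN_TEXT = {".log", ".txt"}


def normalize(host, path, lines):
    """Turn raw lines into the common record format host<TAB>file<TAB>message."""
    name = Path(path).name
    for line in lines:
        message = line.strip()
        if message:  # drop blank lines
            # Replace embedded tabs so every record has exactly three columns.
            yield "\t".join((host, name, message.replace("\t", " ")))


def collect(sources, out_path):
    """Write normalized records from (host, path) pairs into one output file.

    Returns the number of records written; unsupported formats are skipped.
    """
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for host, path in sources:
            if Path(path).suffix.lower() not in PLAIN_TEXT:
                continue  # e.g. .rtf: convert to plain text before collecting
            with open(path, encoding="utf-8", errors="replace") as f:
                for record in normalize(host, path, f):
                    out.write(record + "\n")
                    written += 1
    return written
```

The merged file could then be copied into HDFS with `hadoop fs -put`, or, as the replies suggest, Flume could take over the collection side once the basics are in place.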


