Hello,

     I am slowly beginning to understand how Hadoop works.

     I want to collect the logs from three machines,

     including the master itself. My short question is:

     which mode should I set up for this?

   -      Standalone Operation
   -      Pseudo-Distributed Operation
   -      Fully-Distributed Operation

     Seeking guidance,

     Thank you!
--
Cheers,
Mayur




Hi Mayur,
>
> Flume is used for data collection; Pig is used for data processing.
> For example, if you have a bunch of servers that you want to collect
> logs from and push to HDFS, you would use Flume. If you then need to
> run some analysis on that data, you could use Pig to do that.
>
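
To make the split between the two roles concrete, here is a rough local
sketch in plain Python. It does not use Flume or Pig at all: collect() only
stands in for the collection role and count_errors_per_host() for the
processing role. The host names, log lines, and function names are all made
up for illustration.

```python
import tempfile
from pathlib import Path


def collect(sources, sink_dir):
    """Collection step (the role Flume plays): gather every host's lines in one file."""
    sink = Path(sink_dir) / "collected.log"
    with sink.open("w", encoding="utf-8") as out:
        for host, lines in sources.items():
            for line in lines:
                # Tag every event with the host it came from.
                out.write(f"{host}\t{line}\n")
    return sink


def count_errors_per_host(collected_path):
    """Processing step (the role Pig plays): run an analysis over the collected data."""
    counts = {}
    with Path(collected_path).open(encoding="utf-8") as f:
        for record in f:
            host, message = record.rstrip("\n").split("\t", 1)
            if "ERROR" in message:
                counts[host] = counts.get(host, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical hosts and log lines, for illustration only.
    sources = {
        "master": ["INFO started", "ERROR disk full"],
        "slave1": ["ERROR timeout", "ERROR timeout"],
        "slave2": ["INFO ok"],
    }
    with tempfile.TemporaryDirectory() as tmp:
        print(count_errors_per_host(collect(sources, tmp)))
```

In a real setup the first function would be replaced by Flume agents writing
to HDFS, and the second by a Pig script or MapReduce job reading from there.
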
> Sent from my iPhone
>
> On Feb 14, 2013, at 1:39 AM, Mayur Patil <ram.nath241...@gmail.com> wrote:
>
> > Hello,
> >
> >   I just read about Pig:
> >
> >> Pig
> >> A data flow language and execution environment for exploring very
> >> large datasets.
> >> Pig runs on HDFS and MapReduce clusters.
> >
> >   What is the actual difference between Pig and Flume for log
> >   clustering?
> >
> >   Thank you!
> > --
> > Cheers,
> > Mayur.
> >
> >
> >
> >> Hey Mayur,
> >>>
> >>> If you are collecting logs from multiple servers, you can use Flume
> >>> for that.
> >>>
> >>> If the contents of the logs are in different formats, you can just use
> >>> TextInputFormat to read them and then write them out in whatever format
> >>> you want for processing in later parts of your project.
> >>>
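
As a rough illustration of "read as text, write out one common format", here
is a small Python sketch. The two input layouts and the tab-separated output
layout are assumptions made up for this example; real logs will need their
own patterns, and .rtf files would first have to be converted to plain text,
which this sketch does not attempt.

```python
import re

# Two hypothetical input layouts; real logs will need their own patterns.
PATTERNS = [
    # e.g. "2013-02-06 08:32:10 ERROR disk full"
    re.compile(
        r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>[A-Z]+) (?P<msg>.*)$"
    ),
    # e.g. "[ERROR] 2013-02-06 08:32:10 - disk full"
    re.compile(
        r"^\[(?P<level>[A-Z]+)\] (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - (?P<msg>.*)$"
    ),
]


def normalize(line):
    """Return 'timestamp<TAB>level<TAB>message', or None if no pattern matches."""
    line = line.strip()
    for pattern in PATTERNS:
        match = pattern.match(line)
        if match:
            return "\t".join(
                (match.group("ts"), match.group("level"), match.group("msg"))
            )
    return None
```

Lines that match neither pattern come back as None, so the caller can decide
whether to drop them or set them aside for inspection.
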
> >>> The first thing you need to learn is how to set up Hadoop.
> >>> Then you can try writing sample Hadoop MapReduce jobs that read from a
> >>> text file, process it, and write the results into another file.
> >>> Then you can integrate Flume as your log collection mechanism.
> >>> Once you have a handle on the system, you can decide which paths to
> >>> follow based on your requirements for storage, compute time, compute
> >>> capacity, compression, etc.
> >>>
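
For the "sample MapReduce job" step, the following is a minimal local sketch
of the map, sort, and reduce phases in Python, written in the spirit of a
Hadoop Streaming mapper and reducer. It runs without Hadoop; run_job() is
only a stand-in for what the framework does between the two phases. The
tab-separated "timestamp, level, message" layout and all function names are
assumptions for illustration.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map: emit (level, 1) for each 'timestamp<TAB>level<TAB>message' line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:  # skip lines that do not have a level field
            yield fields[1], 1


def reducer(pairs):
    """Reduce: sum the counts for each key; pairs must arrive sorted by key."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


def run_job(lines):
    """Local stand-in for the framework: map, sort (shuffle), then reduce."""
    return dict(reducer(sorted(mapper(lines))))
```

On a real cluster the sort between the two phases is done by Hadoop itself,
and the input and output would be files in HDFS rather than Python lists.
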
> >> --------------
> >> --------------
> >>
> >>> Hi,
> >>>
> >>> Please read up on the basics of how Hadoop works.
> >>>
> >>> Then get hands-on with MapReduce coding.
> >>>
> >>> The tool that has been made for your use case is Flume, but don't look
> >>> at it until you have completed the two steps above.
> >>>
> >>> Good luck, and keep us posted.
> >>>
> >>> Regards,
> >>>
> >>> Jagat Singh
> >>>
> >>> -----------
> >>> Sent from Mobile , short and crisp.
> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <ram.nath241...@gmail.com> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>>    I am new to Hadoop. I am doing a cloud project in which I
> >>>>
> >>>>    have to use Hadoop for MapReduce. I am going to collect
> >>>>
> >>>>    logs from 2-3 machines in different locations.
> >>>>
> >>>>    The logs are also in different formats, such as .rtf, .log, and .txt.
> >>>>
> >>>>    Later, I have to convert them to one format and
> >>>>
> >>>>    gather them in one location.
> >>>>
> >>>>    So I am asking: which module of Hadoop do I need to study
> >>>>
> >>>>    for this implementation? Or do I need to study
> >>>>
> >>>>    the whole framework?
> >>>>
> >>>>    Seeking guidance,
> >>>>
> >>>>    Thank you!
>
