Hello, I am now slowly getting to grips with how Hadoop works.
I want to collect the logs from three machines, including the master itself. My small query is: which mode should I implement for this?

- Standalone Operation
- Pseudo-Distributed Operation
- Fully-Distributed Operation

Seeking guidance,

Thank you!

--
Cheers,
Mayur

Hi Mayur,

> Flume is used for data collection; Pig is used for data processing.
> For example, if you have a bunch of servers that you want to collect
> the logs from and push to HDFS, you would use Flume. If you then need
> to run some analysis on that data, you could use Pig.
>
> Sent from my iPhone
>
> On Feb 14, 2013, at 1:39 AM, Mayur Patil <ram.nath241...@gmail.com> wrote:
>
> > Hello,
> >
> > I just read about Pig:
> >
> >> Pig: A data-flow language and execution environment for exploring
> >> very large datasets. Pig runs on HDFS and MapReduce clusters.
> >
> > What actual difference do Pig and Flume make for log clustering?
> >
> > Thank you!
> > --
> > Cheers,
> > Mayur.

> >> Hey Mayur,
> >>
> >> If you are collecting logs from multiple servers, then you can use
> >> Flume for that.
> >>
> >> If the contents of the logs are in different formats, you can use
> >> TextInputFormat to read them and then write them out in whatever
> >> single format you want for processing in the later parts of your
> >> project.
> >>
> >> The first thing you need to learn is how to set up Hadoop. Then you
> >> can try writing sample Hadoop MapReduce jobs that read from a text
> >> file, process it, and write the results into another file. After
> >> that, you can integrate Flume as your log-collection mechanism.
> >> Once you get hold of the system, you can decide which paths to
> >> follow based on your requirements for storage, compute time,
> >> compute capacity, compression, etc.
> >>
> >> --------------
> >> --------------
> >>
> >>> Hi,
> >>>
> >>> Please read the basics of how Hadoop works.
> >>> Then start your hands-on work with MapReduce coding.
> >>>
> >>> The tool that has been made for you is Flume, but don't look at
> >>> the tool till you complete the above two steps.
> >>>
> >>> Good luck, and keep us posted.
> >>>
> >>> Regards,
> >>> Jagat Singh
> >>>
> >>> -----------
> >>> Sent from mobile, short and crisp.
> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <ram.nath241...@gmail.com> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> I am new to Hadoop. I am doing a project in the cloud in which I
> >>>> have to use Hadoop for MapReduce. I am going to collect logs from
> >>>> 2-3 machines in different locations. The logs are also in
> >>>> different formats, such as .rtf, .log, and .txt. Later, I have to
> >>>> convert them to one format and collect them in one location.
> >>>>
> >>>> So which module of Hadoop do I need to study for this
> >>>> implementation? Or do I need to study the whole framework?
> >>>>
> >>>> Seeking guidance,
> >>>>
> >>>> Thank you!
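The Flume advice in the thread ("collect the logs from a bunch of servers and push to HDFS") can be sketched as a single-agent configuration. This is only a minimal sketch, assuming one agent per machine; the agent name `agent1`, the log path, and the HDFS URL `hdfs://master:9000` are illustrative placeholders, not values from the thread.

```properties
# Hypothetical flume.conf for one machine; names and paths are placeholders.
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

# Source: tail a local log file (exec source).
agent1.sources.src1.type     = exec
agent1.sources.src1.command  = tail -F /var/log/app/app.log
agent1.sources.src1.channels = ch1

# Channel: buffer events in memory between source and sink.
agent1.channels.ch1.type     = memory
agent1.channels.ch1.capacity = 10000

# Sink: write events into HDFS, one directory per day.
agent1.sinks.sink1.type                   = hdfs
agent1.sinks.sink1.channel                = ch1
agent1.sinks.sink1.hdfs.path              = hdfs://master:9000/logs/%Y-%m-%d
agent1.sinks.sink1.hdfs.fileType          = DataStream
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
```

Each agent would be started with something like `flume-ng agent --conf conf --conf-file flume.conf --name agent1`; the same file, with a different source path per machine, could run on all three machines.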
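The replies also suggest writing a simple job that reads mixed-format log lines and writes them out in one uniform format. The record-level transform at the heart of that job can be sketched without any Hadoop dependencies; the tab-separated `host<TAB>line` output format and the crude `.rtf` stripping below are my own illustrative assumptions, not anything specified in the thread.

```java
import java.util.Locale;

// Sketch: normalize log lines from mixed sources (.rtf, .log, .txt)
// into one uniform tab-separated record: <host>\t<cleaned line>.
// The RTF handling is deliberately crude and only illustrative.
public class LogNormalizer {

    // Strip simple RTF control words (e.g. \par, \fs20) and group braces
    // so only the plain text of the line remains.
    static String stripRtf(String line) {
        return line.replaceAll("\\\\[a-zA-Z]+-?\\d*\\s?", "")
                   .replaceAll("[{}]", "")
                   .trim();
    }

    // Produce one canonical record: source host, a tab, then the cleaned line.
    public static String normalize(String host, String fileName, String rawLine) {
        String text = fileName.toLowerCase(Locale.ROOT).endsWith(".rtf")
                ? stripRtf(rawLine)
                : rawLine.trim();
        return host + "\t" + text;
    }

    public static void main(String[] args) {
        System.out.println(normalize("node1", "app.log", "  ERROR disk full  "));
        System.out.println(normalize("node2", "report.rtf", "{\\rtf1 \\par ERROR disk full}"));
    }
}
```

In an actual MapReduce job, a mapper reading with TextInputFormat would call a transform like `normalize` on each value and emit the canonical record, so all three machines' logs end up in one format in one HDFS location.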