Hello, I just read about Pig:

> Pig
> A data flow language and execution environment for exploring very large datasets.
> Pig runs on HDFS and MapReduce clusters.

What is the actual difference between Pig and Flume for log clustering?

Thank you!

--
Cheers,
Mayur.

> Thanks to you both. You solved my problem so easily. I want to
> ask one more question, for reference. I have:
>
> 1. Hadoop: The Definitive Guide
> 2. Hadoop in Action
>
> Is that sufficient, or do I need more material to study
> your suggested implementation?
>
> --
> Cheers,
> Mayur
>
>> Hey Mayur,
>>
>> If you are collecting logs from multiple servers, then you can use Flume
>> for that.
>>
>> If the contents of the logs differ in format, then you can just use
>> TextInputFormat to read them and write them out in any other format you
>> want for processing in later parts of your project.
>>
>> The first thing you need to learn is how to set up Hadoop.
>> Then you can try writing sample Hadoop MapReduce jobs that read from a
>> text file, process the contents, and write the results to another file.
>> Then you can integrate Flume as your log collection mechanism.
>> Once you have a handle on the system, you can decide which paths to
>> follow based on your requirements for storage, compute time,
>> compute capacity, compression, etc.
>>
> --------------
>
>> Hi,
>>
>> Please read up on the basics of how Hadoop works.
>>
>> Then get hands-on with MapReduce coding.
>>
>> The tool made for your use case is Flume, but don't look at it until
>> you have completed the two steps above.
>>
>> Good luck, and keep us posted.
>>
>> Regards,
>>
>> Jagat Singh
>>
>> -----------
>> Sent from Mobile, short and crisp.
>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I am new to Hadoop. I am doing a cloud project in which I
>>> have to use Hadoop for MapReduce. I am going to collect logs
>>> from 2-3 machines in different locations.
>>>
>>> The logs are also in different formats, such as .rtf, .log, and .txt.
>>>
>>> Later, I have to convert them to one format and collect them
>>> in one location.
>>>
>>> So I am asking: which module of Hadoop do I need to study
>>> for this implementation? Or do I need to study the whole
>>> framework?
>>>
>>> Seeking guidance.
>>>
>>> Thank you!
>>> --
>>> Cheers,
>>> Mayur.

--
Cheers,
Mayur.
