Hi Mayur,

Take a look here:
http://hadoop.apache.org/docs/r1.1.1/single_node_setup.html#PseudoDistributed

"Hadoop can also be run on a single-node in a pseudo-distributed mode
where each Hadoop daemon runs in a separate Java process."

So pseudo-distributed mode is still a single-node setup. With 3 nodes,
the only mode that fits is the Fully-Distributed one.

JM

2013/3/8 Mayur Patil:
> Hello,
>
> Thank you sir for your favorable reply.
>
> I am going to use 1 master and 2 worker nodes; 3 nodes in total.
>
> Thank you !!
>
> --
> Cheers,
> Mayur
>
> On Fri, Mar 8, 2013 at 8:30 AM, Jean-Marc Spaggiari wrote:
>>
>> Hi Mayur,
>>
>> Those 3 modes are 3 different ways to use Hadoop. However, the only
>> production mode here is the fully distributed one. The 2 others are
>> more for local testing. How many nodes are you expecting to use Hadoop
>> on?
>>
>> JM
>>
>>
>> 2013/3/7 Mayur Patil:
>> > Hello,
>> >
>> > I am slowly beginning to understand how Hadoop works.
>> >
>> > I want to collect the logs from three machines, including the
>> > Master itself. My small query is: which mode should I implement
>> > for this?
>> >
>> > Standalone Operation
>> > Pseudo-Distributed Operation
>> > Fully-Distributed Operation
>> >
>> > Seeking guidance,
>> >
>> > Thank you !!
>> > --
>> > Cheers,
>> > Mayur
>> >
>> >
>> >>> Hi Mayur,
>> >>>
>> >>> Flume is used for data collection. Pig is used for data processing.
>> >>> For example, if you have a bunch of servers that you want to collect
>> >>> the logs from and push to HDFS, you would use Flume. Now if you need
>> >>> to run some analysis on that data, you could use Pig to do that.
>> >>>
>> >>> Sent from my iPhone
>> >>>
>> >>> On Feb 14, 2013, at 1:39 AM, Mayur Patil wrote:
>> >>>
>> >>> > Hello,
>> >>> >
>> >>> > I just read about Pig
>> >>> >
>> >>> >> Pig
>> >>> >> A data flow language and execution environment for exploring very
>> >>> >> large datasets.
>> >>> >> Pig runs on HDFS and MapReduce clusters.
>> >>> >
>> >>> > What difference does it actually make to use Pig versus Flume
>> >>> > for log clustering?
>> >>> >
>> >>> > Thank you !!
>> >>> > --
>> >>> > Cheers,
>> >>> > Mayur.
>> >>> >
>> >>> >
>> >>> >> Hey Mayur,
>> >>> >>>
>> >>> >>> If you are collecting logs from multiple servers, then you can use
>> >>> >>> Flume for that.
>> >>> >>>
>> >>> >>> If the contents of the logs differ in format, then you can just use
>> >>> >>> the text file input format to read them and write them out in any
>> >>> >>> other format you want for processing later in your project.
>> >>> >>>
>> >>> >>> First, learn how to set up Hadoop.
>> >>> >>> Then try writing sample Hadoop MapReduce jobs that read from a text
>> >>> >>> file, process it, and write the results to another file.
>> >>> >>> Then integrate Flume as your log collection mechanism.
>> >>> >>> Once you have a grip on the system, you can decide which path to
>> >>> >>> follow based on your requirements for storage, compute time,
>> >>> >>> compute capacity, compression, etc.
>> >>> >>>
>> >>> >> --------------
>> >>> >>
>> >>> >>> Hi,
>> >>> >>>
>> >>> >>> Please read the basics of how Hadoop works.
>> >>> >>>
>> >>> >>> Then get hands-on with MapReduce coding.
>> >>> >>>
>> >>> >>> The tool made for your case is Flume, but don't look at the tool
>> >>> >>> until you complete the above two steps.
>> >>> >>>
>> >>> >>> Good luck, keep us posted.
>> >>> >>>
>> >>> >>> Regards,
>> >>> >>>
>> >>> >>> Jagat Singh
>> >>> >>>
>> >>> >>> -----------
>> >>> >>> Sent from Mobile, short and crisp.
>> >>> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" wrote:
>> >>> >>>
>> >>> >>>> Hello,
>> >>> >>>>
>> >>> >>>> I am new to Hadoop. I am doing a cloud project in which I have
>> >>> >>>> to use Hadoop for MapReduce. I am going to collect logs from
>> >>> >>>> 2-3 machines in different locations. The logs are also in
>> >>> >>>> different formats, such as .rtf, .log, and .txt. Later, I have
>> >>> >>>> to convert them to one format and collect them in one location.
>> >>> >>>>
>> >>> >>>> So I am asking: which module of Hadoop do I need to study for
>> >>> >>>> this implementation? Or do I need to study the whole framework?
>> >>> >>>>
>> >>> >>>> Seeking guidance,
>> >>> >>>>
>> >>> >>>> Thank you !!
>> >
>> >
>> > --
>> > Cheers,
>> > Mayur.
>
>
> --
> Cheers,
> Mayur.
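
For the "read a text file, process it, write the results" exercise suggested in the thread, a first attempt might look roughly like the sketch below. It is plain Python that imitates the map, sort, and reduce steps in a single process (the same shape a Hadoop Streaming mapper and reducer would have), not an actual Hadoop job. The log line layout and the names map_line, reduce_pairs, and run_local are assumptions made up for illustration; nothing in the thread specifies them.

```python
import re
from itertools import groupby

# Assumption: each log line contains a severity keyword somewhere,
# e.g. "2013-03-08 08:30:01 INFO started". Adjust to the real log formats.
LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARN|ERROR|FATAL)\b")


def map_line(line):
    """Map step: emit (level, 1) if the line contains a known level."""
    match = LEVEL_RE.search(line)
    if match:
        yield match.group(1), 1


def reduce_pairs(sorted_pairs):
    """Reduce step: sum the counts per key. Input must be sorted by key."""
    for key, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield key, sum(count for _, count in group)


def run_local(lines):
    """Imitate map -> sort -> reduce locally, without a cluster."""
    mapped = [pair for line in lines for pair in map_line(line)]
    return dict(reduce_pairs(sorted(mapped)))
```

Calling run_local(open("some.log")) on a local file (the file name is hypothetical) is enough to try the logic before moving it to a cluster. Lines without a recognizable level are simply skipped.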
