Re: [Hadoop-Help]About Map-Reduce implementation

Jean-Marc Spaggiari Fri, 08 Mar 2013 05:17:42 -0800

Hi Mayur,

Take a look here:
http://hadoop.apache.org/docs/r1.1.1/single_node_setup.html#PseudoDistributed


"Hadoop can also be run on a single-node in a pseudo-distributed mode
where each Hadoop daemon runs in a separate Java process."

So pseudo-distributed mode still means a single node. With 3 nodes
(1 master and 2 workers), Fully-Distributed mode is the only one you
can use.
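
Once the cluster is up, the next step suggested earlier in this thread
is a sample job that reads text files and writes results to another
file. Below is a minimal sketch using Hadoop Streaming with Python. It
is only an illustration: the level names in LEVELS, the script name
logcount.py, and the function names are my assumptions, not something
taken from your logs or from Hadoop itself. It counts log lines per
severity level.

```python
#!/usr/bin/env python
"""Hadoop Streaming sketch: count log lines per severity level."""
import sys
from itertools import groupby

# Assumed level names; adjust to whatever your logs actually contain.
LEVELS = {"DEBUG", "INFO", "WARN", "ERROR", "FATAL"}


def map_line(line):
    """Return (level, 1) for one log line, or ('UNKNOWN', 1) if no level is found."""
    for token in line.split():
        if token in LEVELS:
            return token, 1
    return "UNKNOWN", 1


def mapper(lines):
    """Emit tab-separated 'level<TAB>1' records, skipping blank lines."""
    for line in lines:
        if line.strip():
            level, one = map_line(line)
            yield "%s\t%d" % (level, one)


def reducer(records):
    """Sum counts per level; relies on the input being sorted by key."""
    pairs = (r.rstrip("\n").split("\t", 1) for r in records)
    for level, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (level, sum(int(count) for _, count in group))


if __name__ == "__main__":
    # "logcount.py map" runs the mapper, "logcount.py reduce" the reducer.
    steps = {"map": mapper, "reduce": reducer}
    step = steps.get(sys.argv[1]) if len(sys.argv) > 1 else None
    if step is not None:
        for out in step(sys.stdin):
            print(out)
```

You would submit it with the hadoop-streaming jar that ships with your
install, passing -input, -output, -mapper "logcount.py map",
-reducer "logcount.py reduce" and -file logcount.py. The jar location
depends on your installation, so check where it lives on your master.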

JM

2013/3/8 Mayur Patil <[email protected]>:
> Hello,
>
>   Thank you sir for your favorable reply.
>
>   I am going to use 1 master and 2 worker nodes; 3 nodes in total.
>
>
>   Thank you !!
>
> --
> Cheers,
> Mayur
>
> On Fri, Mar 8, 2013 at 8:30 AM, Jean-Marc Spaggiari
> <[email protected]> wrote:
>>
>> Hi Mayur,
>>
>> Those 3 modes are 3 different ways to use Hadoop; however, the only
>> production mode here is the fully distributed one. The 2 others are
>> more for local testing. How many nodes are you expecting to use Hadoop
>> on?
>>
>> JM
>>
>>
>> 2013/3/7 Mayur Patil <[email protected]>:
>> > Hello,
>> >
>> >   I am slowly starting to understand how Hadoop works.
>> >
>> >   I want to collect the logs from three machines, including the
>> >   master itself. My small question is: which mode should I use
>> >   for this?
>> >
>> >                   Standalone Operation
>> >                   Pseudo-Distributed Operation
>> >                   Fully-Distributed Operation
>> >
>> >      Seeking for guidance,
>> >
>> >      Thank you !!
>> > --
>> > Cheers,
>> > Mayur
>> >
>> >
>> >
>> >
>> >>> Hi mayur,
>> >>>
>> >>> Flume is used for data collection. Pig is used for data processing.
>> >>> For example, if you have a bunch of servers that you want to collect
>> >>> logs from and push to HDFS, you would use Flume. If you then need to
>> >>> run some analysis on that data, you could use Pig to do that.
>> >>>
>> >>> Sent from my iPhone
>> >>>
>> >>> On Feb 14, 2013, at 1:39 AM, Mayur Patil <[email protected]>
>> >>> wrote:
>> >>>
>> >>> > Hello,
>> >>> >
>> >>> >   I just read about Pig
>> >>> >
>> >>> >> Pig
>> >>> >> A data flow language and execution environment for exploring very
>> >>> > large datasets.
>> >>> >> Pig runs on HDFS and MapReduce clusters.
>> >>> >
>> >>> >   What is the actual difference between Pig and Flume when it
>> >>> > comes to log clustering?
>> >>> >
>> >>> >   Thank you !!
>> >>> > --
>> >>> > Cheers,
>> >>> > Mayur.
>> >>> >
>> >>> >
>> >>> >
>> >>> >> Hey Mayur,
>> >>> >>>
>> >>> >>> If you are collecting logs from multiple servers, you can use
>> >>> >>> Flume for that.
>> >>> >>>
>> >>> >>> If the logs differ in format, you can just use TextInputFormat
>> >>> >>> to read them and write them out in whatever format you want for
>> >>> >>> processing in later parts of your project.
>> >>> >>>
>> >>> >>> First, learn how to set up Hadoop.
>> >>> >>> Then try writing sample Hadoop MapReduce jobs that read from a
>> >>> >>> text file, process it, and write the results to another file.
>> >>> >>> Then integrate Flume as your log collection mechanism.
>> >>> >>> Once you have a grip on the system, you can decide which paths
>> >>> >>> to follow based on your requirements for storage, compute time,
>> >>> >>> compute capacity, compression, etc.
>> >>> >>>
>> >>> >> --------------
>> >>> >> --------------
>> >>> >>
>> >>> >>> Hi,
>> >>> >>>
>> >>> >>> Please read the basics of how Hadoop works.
>> >>> >>>
>> >>> >>> Then get hands-on with MapReduce coding.
>> >>> >>>
>> >>> >>> The tool made for your use case is Flume, but don't look at it
>> >>> >>> until you have completed the two steps above.
>> >>> >>>
>> >>> >>> Good luck, keep us posted.
>> >>> >>>
>> >>> >>> Regards,
>> >>> >>>
>> >>> >>> Jagat Singh
>> >>> >>>
>> >>> >>> -----------
>> >>> >>> Sent from Mobile , short and crisp.
>> >>> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <[email protected]>
>> >>> >>> wrote:
>> >>> >>>
>> >>> >>>> Hello,
>> >>> >>>>
>> >>> >>>>    I am new to Hadoop. I am doing a cloud project in which I
>> >>> >>>>    have to use Hadoop for MapReduce. I am going to collect logs
>> >>> >>>>    from 2-3 machines in different locations. The logs are also
>> >>> >>>>    in different formats, such as .rtf, .log and .txt.
>> >>> >>>>
>> >>> >>>>    Later, I have to convert them to one format and gather them
>> >>> >>>>    in one location.
>> >>> >>>>
>> >>> >>>>    So my question is: which module of Hadoop do I need to study
>> >>> >>>>    for this implementation? Or do I need to study the whole
>> >>> >>>>    framework?
>> >>> >>>>
>> >>> >>>>    Seeking for guidance,
>> >>> >>>>
>> >>> >>>>    Thank you !!
>> >
>> >
>> >
>> >
>> > --
>> > Cheers,
>> > Mayur.
>
>
>
>
> --
> Cheers,
> Mayur.
