Cheers guys, I'll try to collate this stuff and slap it in a Wiki page so
other folks new to the project get a decent idea of how it differs. I
think where I'm getting confused, coming from a BI background, is that
people just think of ETL and data storage, and we're easily distracted
when it comes to the other stuff, unlike the science boffs ;)
That's the problem with all these Hadoop projects with the mega corps
behind them, they get all the PR :)
Anyway, I'll try and fashion something out of it. I'm also messing around
with sample data and the OODT stack to get a better idea, but like any of
these systems, it's hard when you don't have a real use case for it.
Tom
On 03/11/13 17:11, Lewis John Mcgibbney wrote:
Yeah exactly... that's what I meant to say ;)
On Sun, Nov 3, 2013 at 4:07 PM, Chris Mattmann <[email protected]> wrote:
Hey Guys,
Lewis's description is pretty spot on.
Basically, Apache Hadoop is a kernel/OS-like set of capabilities for
workflow processing (it used to be only for M/R, but with YARN it now
covers mostly any computational type) and for storage that is distributed,
highly available, and replicated (which is needed on low-cost, unreliable,
shared-nothing hardware).
Apache OODT is a data management and data processing toolkit that can
interoperate with and *leverage* Hadoop as one of the capabilities needed
in building data systems. It can store data to HDFS (using the File
Manager) in standard ingestion and processing use cases, and it can submit
jobs to M/R- or YARN-style workflows and use those as the heavy lifter for
the workflow processor.
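To make that concrete, here's a minimal sketch of an ingest through the
File Manager's XML-RPC client. It assumes a File Manager on its default
port (9000) and the stock "GenericFile" product type from the default
policy, and the class and method names follow the 0.x CAS APIs, so treat
it as illustrative rather than exact:

    import java.net.URL;
    import java.util.Collections;

    import org.apache.oodt.cas.filemgr.structs.Product;
    import org.apache.oodt.cas.filemgr.structs.ProductType;
    import org.apache.oodt.cas.filemgr.structs.Reference;
    import org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient;
    import org.apache.oodt.cas.metadata.Metadata;

    public class IngestSketch {
      public static void main(String[] args) throws Exception {
        // Assumed: File Manager on the default port. The archive behind it
        // could be local disk, or HDFS if a Hadoop-backed store is configured.
        XmlRpcFileManagerClient fm =
            new XmlRpcFileManagerClient(new URL("http://localhost:9000"));

        // Assumed: the "GenericFile" type from the default policy.
        ProductType type = fm.getProductTypeByName("GenericFile");

        // A single flat product pointing at a local file (path made up here).
        Product product = new Product();
        product.setProductName("sample.dat");
        product.setProductType(type);
        product.setProductStructure(Product.STRUCTURE_FLAT);
        product.setProductReferences(Collections.singletonList(
            new Reference("file:///tmp/sample.dat", null, 1024L)));

        // Metadata to catalog alongside the file.
        Metadata met = new Metadata();
        met.addMetadata("ProductName", "sample.dat");

        // Client-side transfer of the file into the archive.
        String productId = fm.ingestProduct(product, met, true);
        System.out.println("Ingested product: " + productId);
      }
    }
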
In short, OODT is the code that you normally write over and over again
when building data systems that combine Hadoop, Oracle, MySQL, WINGS,
THREDDS, Condor, Ganglia, GridFTP or bbFTP, etc. In other words, it is
what you need to build an end-to-end data ingestion, processing, and
dissemination system. OODT makes that "glue code" very easy to configure
and write (via XML and configuration policy/architecture) and provides a
repeatable, easily discernible way to build these systems.
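As a sketch of the "glue" side: once the pipeline itself is declared in
XML policy, the remaining code is often just firing a workflow event with
some metadata. The event name and metadata key below are made up for
illustration, and again the client class and method names follow the 0.x
APIs, so take them as approximate:

    import java.net.URL;

    import org.apache.oodt.cas.metadata.Metadata;
    import org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManagerClient;

    public class TriggerWorkflowSketch {
      public static void main(String[] args) throws Exception {
        // Assumed: Workflow Manager on its default port (9001).
        XmlRpcWorkflowManagerClient wm =
            new XmlRpcWorkflowManagerClient(new URL("http://localhost:9001"));

        // Metadata handed to the workflow tasks (hypothetical key/value).
        Metadata met = new Metadata();
        met.addMetadata("ProductId", "some-product-id");

        // Fires whatever workflows are mapped to this (hypothetical) event
        // name in the workflow XML policy; the tasks could in turn hand the
        // heavy lifting off to M/R or YARN.
        wm.sendEvent("ingestComplete", met);
      }
    }
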
HTH!
Cheers,
Chris
-----Original Message-----
From: Tom Barber <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, November 1, 2013 1:09 AM
To: "[email protected]" <[email protected]>
Subject: Hadoop Similarities
>
>Morning,
>
>Chris will remember me asking on IRC a couple of years ago about how OODT
>differs from Hadoop in terms of features and functionality, to which he
>gave a great page-long explanation of what the differences were. I vowed
>to copy that information off and save it somewhere useful, and of course
>never did; then I asked Sean, who also couldn't dig it up.
>
>So, fine folks of the OODT community, for a novice like me who would be
>interested in "selling" OODT to users if the correct use case came along,
>when someone says "Isn't OODT just a different type of Hadoop?", what do
>I answer?
>
>I'd like to document this type of comparison on the Wiki as well, as I
>think it's useful for people to know and understand.
>
>Cheers
>
>Tom
>
>--
>Tom Barber | Technical Director
>
>meteorite bi
>T: +44 20 8133 3730
>W: www.meteorite.bi | Skype: meteorite.consulting
>A: Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK
--
Lewis
--
Tom Barber | Technical Director
meteorite bi
T: +44 20 8133 3730
W: www.meteorite.bi | Skype: meteorite.consulting
A: Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK