Cheers guys, I'll try to collate this stuff and slap it in a Wiki page so
other folks new to the project get a decent idea of how it differs. I
think where I'm getting confused, coming from a BI background, is that
people just think of ETL and data storage, and we're easily distracted
when it comes to the other stuff, unlike the science boffs ;)
That's the problem with all these Hadoop projects with the mega corps
behind them, they get all the PR :)
Anyway, I'll try and fashion something out of it. I'm also messing around
with sample data and the OODT stack to get a better idea, but like any of
these systems, it's hard when you don't have a real use case for it.
Tom
On 03/11/13 17:11, Lewis John Mcgibbney wrote:
Yeah exactly... that's what I meant to say ;)
On Sun, Nov 3, 2013 at 4:07 PM, Chris Mattmann <[email protected]> wrote:
Hey Guys,
Lewis's description is pretty spot on.
Basically, Apache Hadoop is a kernel/OS-like set of capabilities for
workflow processing (it used to be only for M/R, but with YARN it now
covers mostly any computational type) and for storage that is distributed,
highly available, and replicated (which is needed on low-cost, unreliable,
shared-nothing hardware).
Apache OODT is a data management and data processing toolkit that can
interoperate with and *leverage* Hadoop as one of the capabilities needed
in building data systems. It can store data to HDFS (using the File
Manager) in standard ingestion and processing use cases, and it can submit
jobs to M/R- or YARN-style workflows and use those as the heavy lifter for
the workflow processor.
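To make that concrete, here's a minimal sketch of an ingest through the
File Manager's XML-RPC client. It assumes a File Manager on its default
port (9000) and the stock "GenericFile" product type from the default
policy, and the class and method names follow the 0.x CAS APIs, so treat
it as illustrative rather than exact:

    import java.net.URL;
    import java.util.Collections;

    import org.apache.oodt.cas.filemgr.structs.Product;
    import org.apache.oodt.cas.filemgr.structs.ProductType;
    import org.apache.oodt.cas.filemgr.structs.Reference;
    import org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient;
    import org.apache.oodt.cas.metadata.Metadata;

    public class IngestSketch {
      public static void main(String[] args) throws Exception {
        // Assumed: File Manager on the default port. The archive behind it
        // could be local disk, or HDFS if a Hadoop-backed store is configured.
        XmlRpcFileManagerClient fm =
            new XmlRpcFileManagerClient(new URL("http://localhost:9000"));

        // Assumed: the "GenericFile" type from the default policy.
        ProductType type = fm.getProductTypeByName("GenericFile");

        // A single flat product pointing at a local file (path made up here).
        Product product = new Product();
        product.setProductName("sample.dat");
        product.setProductType(type);
        product.setProductStructure(Product.STRUCTURE_FLAT);
        product.setProductReferences(Collections.singletonList(
            new Reference("file:///tmp/sample.dat", null, 1024L)));

        // Metadata to catalog alongside the file.
        Metadata met = new Metadata();
        met.addMetadata("ProductName", "sample.dat");

        // Client-side transfer of the file into the archive.
        String productId = fm.ingestProduct(product, met, true);
        System.out.println("Ingested product: " + productId);
      }
    }
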
In short, OODT is the code that you normally write over and over again
when building data systems that combine Hadoop, Oracle, MySQL, WINGS,
THREDDS, Condor, Ganglia, GridFTP or bbFTP, etc. In other words, it is
what you need to build an end-to-end data ingestion, processing, and
dissemination system. OODT makes that "glue code" very easy to configure
and write (via XML and configuration policy/architecture) and provides a
repeatable, easily discernible way to build these systems.
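As a sketch of the "glue" side: once the pipeline itself is declared in
XML policy, the remaining code is often just firing a workflow event with
some metadata. The event name and metadata key below are made up for
illustration, and again the client class and method names follow the 0.x
APIs, so take them as approximate:

    import java.net.URL;

    import org.apache.oodt.cas.metadata.Metadata;
    import org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManagerClient;

    public class TriggerWorkflowSketch {
      public static void main(String[] args) throws Exception {
        // Assumed: Workflow Manager on its default port (9001).
        XmlRpcWorkflowManagerClient wm =
            new XmlRpcWorkflowManagerClient(new URL("http://localhost:9001"));

        // Metadata handed to the workflow tasks (hypothetical key/value).
        Metadata met = new Metadata();
        met.addMetadata("ProductId", "some-product-id");

        // Fires whatever workflows are mapped to this (hypothetical) event
        // name in the workflow XML policy; the tasks could in turn hand the
        // heavy lifting off to M/R or YARN.
        wm.sendEvent("ingestComplete", met);
      }
    }
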
HTH!
Cheers,
Chris
-----Original Message-----
From: Tom Barber <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, November 1, 2013 1:09 AM
To: "[email protected]" <[email protected]>
Subject: Hadoop Similarities
>
>Morning,
>
>Chris will remember me asking on IRC a couple of years ago about how OODT
>differs from Hadoop in terms of features and functionality, to which he
>gave a great page-long explanation of what the differences were. I vowed
>to copy that information off and save it somewhere useful, and of course
>never did; then I asked Sean, who also couldn't dig it up.
>
>So, fine folks of the OODT community, for a novice like me who would be
>interested in "selling" OODT to users if the correct use case came along,
>when someone says "Isn't OODT just a different type of Hadoop?", what do
>I answer?
>
>I'd like to document this type of comparison on the Wiki as well, as I
>think it's useful for people to know and understand.
>
>Cheers
>
>Tom
>
>--
>Tom Barber | Technical Director
>
>meteorite bi
>T: +44 20 8133 3730
>W: www.meteorite.bi | Skype: meteorite.consulting
>A: Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK
--
Lewis
--
Tom Barber | Technical Director
meteorite bi
T: +44 20 8133 3730
W: www.meteorite.bi | Skype: meteorite.consulting
A: Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK