Thanks for that, Lewis, very useful. Indeed, my question was never
intended as a pros-vs-cons comparison; I'm just interested to know
where people see the differences, as Hadoop clearly rules the roost in
"Big Data" stuff.
My background is in Business Intelligence, so I come into contact
with plenty of Hadoop + MapReduce PR daily and end up swamped with
that stuff (not that I've found much Hadoop in the wild, just press
fodder). I'm interested because people clearly see a hole in the Hadoop
ecosystem that leaves a gap in the market for the OODT setup, and
should that use case arise I'd like to make sure I'm choosing the
correct tool for the job.
Cheers
Tom
On 03/11/13 14:27, Lewis John Mcgibbney wrote:
Hi Tom,
On Fri, Nov 1, 2013 at 8:09 AM, Tom Barber <[email protected]> wrote:
Morning,
Chris will remember me asking on IRC a couple of years ago how
OODT differs from Hadoop in terms of features and functionality,
to which he gave a great page-long explanation of the differences.
I vowed to copy that information and save it somewhere useful, and
of course never did; then I asked Sean, who also couldn't dig it up.
What a shame. It would have been great to at least see this, if not get
it documented as you mention. Oh well. Community lists are as good as
it gets IMHO, so here we go.
So, fine folks of the OODT community, for a novice like me who
would be interested in "selling" OODT to users if the right
use case came along, when someone says "Isn't OODT just a different
type of Hadoop?", what do I answer?
I am relatively new to OODT. My opinion here is pretty abstract;
however, I have been using Hadoop much longer and therefore hope that
some of what I'm saying contributes to our shared understanding.
OODT
=====
I was attracted to OODT by the modular, component-oriented design
of the project as a whole. It is down to the system designer (the
initial person or team who picks up OODT) to review and select which
parts of the overall project they need to satisfy and accommodate
their data workflow(s). Due to the modular nature of the project,
components can be substituted as the nature and/or characteristics of
the data workflow change over time. A beautiful aspect of OODT is that
many tools and instruments have been built to accommodate the
above-mentioned requirements for data workflows.
Hadoop
======
For me, Hadoop (which I consider a blanket term for a whole stack) is
essentially an operating system, as opposed to OODT, which I've
described as a modular data workflow platform. It provides a
filesystem (HDFS), a data processing platform (MapReduce), and an API
through which we can submit and execute jobs. Additionally, we all know
about the bolt-ons such as workflow monitoring, security and so
forth. In this respect it is down to the engineer to build the data
workflow around or on top of Hadoop using the components provided.
One thing which I think also characterizes Hadoop is that, generally
speaking, data follows 'write-once, read-many' semantics, whereas this
is not necessarily the case with OODT.
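To make the MapReduce part of that description a little more concrete,
here is a minimal, single-process Python sketch of the map, shuffle and
reduce programming model, using the classic word-count example. It is an
illustration only, under the assumption that an in-memory toy is enough to
show the shape of the model: it does not use Hadoop's actual Java API, and
the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`)
are hypothetical. In a real Hadoop job the framework distributes the map and
reduce tasks across the cluster, reads input from HDFS and performs the
shuffle itself.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical map step: emit (word, 1) for every word in one line of input.
def map_phase(line: str) -> Iterator[tuple[str, int]]:
    for word in line.lower().split():
        yield word, 1

# Hypothetical shuffle step: group emitted values by key, which Hadoop
# would do for you between the map and reduce tasks.
def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Hypothetical reduce step: collapse each key's values into one result.
def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    return key, sum(values)

def word_count(lines: Iterable[str]) -> dict[str, int]:
    # The input lines are only read, never modified in place, which mirrors
    # the write-once, read-many pattern mentioned above.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))

if __name__ == "__main__":
    data = ["big data stuff", "Hadoop and OODT", "big data"]
    print(word_count(data))
    # prints {'and': 1, 'big': 2, 'data': 2, 'hadoop': 1, 'oodt': 1, 'stuff': 1}
```

The point for this comparison is that Hadoop supplies the execution model;
deciding what the map and reduce steps are, and how they chain into a wider
data workflow, is left to the engineer.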
I'd like to document this type of comparison on the Wiki as
well, as I think it's useful for people to know and understand.
I'm sure that the above is obvious to many and that I'm merely
stating what is readily apparent; however, this is my experience so
far using OODT and the comparisons I can draw myself.
When I started responding, it was not my aim to engage in a pros vs
cons of each piece of software, so I hope the brief reply above
can act as a contribution to the conversation and we can take this
onwards.
Thanks
Lewis
--
*Tom Barber* | Technical Director
meteorite bi
*T:* +44 20 8133 3730
*W:* www.meteorite.bi | *Skype:* meteorite.consulting
*A:* Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK