Hi Noora,

Welcome to the world of Hadoop. It is a vast ecosystem and can be quite
daunting at first.

Perhaps it will help you navigate things if I summarize a few of the key
technologies, which build on each other:

a) Hadoop DFS (HDFS) - the distributed file system
b) Hadoop MapReduce (MR) - a distributed processing framework where you
write Maps and Reduces.  It is batch oriented, with 30+ sec latency to
start even the smallest jobs, so it is not ideally suited to interactive
operations
c) Sqoop - a tool that runs MR jobs which copy data either from a DB to
HDFS or vice versa.  It supports a variety of formats, such as Avro (a data
format where the schema is embedded)
d) Hive - you didn't mention it, but Hive is a SQL layer that allows you to
run SQL as MR jobs.  A common use is MySQL -> Sqoop -> HDFS -> Hive
e) HBase - a "big table" technology that gives you a column-oriented data
store, where you can GET or PUT by key, or perform a limited set of other
operations, such as scans.
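
To make (b) a little more concrete, here is a minimal sketch in plain Java
of what "writing Maps and Reduces" means. It deliberately does not use the
Hadoop API: the class and method names are made up for illustration, and
the run() method only stands in for the grouping that the framework would
do for you across a cluster.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process illustration of the map/reduce model.
final class WordCountSketch {

    // "Map": turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // "Reduce": combine all values that were emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Stand-in for the framework: run the maps, group by key, run the reduces.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("hive runs sql", "sqoop runs mr jobs")));
    }
}
```

In a real MR job the grouping step happens across many machines, which is
part of why even small jobs are slow to start.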

So what is Gora?
Gora is effectively an Object Relational Mapper. You define the table
definition in Avro format and provide a mapping of how each field is
stored in the backend system, and Gora then takes care of CRUD operations
and mediation with the backend, so the caller does not need to know how to
use the backend API.  Various backends are supported.  Thus I can do
Person p = new Person("Tim") and then, roughly speaking, "gora save Tim",
and Gora will take care of saving my object in (e.g.) HBase.  There are
connectors that allow you to run MR jobs over Gora stores as well.  Gora is
similar to the likes of MyBatis, if you are familiar with that, but it
supports "Hadoop technologies" as backends and provides MR capability,
allowing you to MR across various backends consistently.
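
As a rough illustration of the idea (this is NOT Gora's actual API), here
is a minimal sketch in plain Java. SimpleStore and InMemoryStore are
hypothetical names I made up; the point is only that the caller talks to
one small CRUD interface, and the backend behind it could be swapped
without the calling code changing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a store abstraction; not the Gora API.
final class StoreSketch {

    // The object we want to persist.
    static final class Person {
        final String name;
        Person(String name) { this.name = name; }
    }

    // What the caller sees: CRUD by key, with no backend details.
    interface SimpleStore<K, V> {
        void put(K key, V value);
        V get(K key);
        boolean delete(K key);
    }

    // One possible backend. A different implementation could wrap HBase
    // or another system while keeping the same interface.
    static final class InMemoryStore<K, V> implements SimpleStore<K, V> {
        private final Map<K, V> rows = new HashMap<>();

        @Override public void put(K key, V value) { rows.put(key, value); }
        @Override public V get(K key) { return rows.get(key); }
        @Override public boolean delete(K key) { return rows.remove(key) != null; }
    }

    public static void main(String[] args) {
        SimpleStore<String, Person> store = new InMemoryStore<>();
        Person p = new Person("Tim");
        store.put(p.name, p);                       // the "gora save Tim" step
        System.out.println(store.get("Tim").name);  // read it back by key
    }
}
```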

So is Gora real time or not?  Yes, it is real time for CRUD, but MR-type
jobs are batch operations with reasonably high latency.
Does Gora block?  That depends on the backend.  With HBase updates, for
example, you typically either overwrite or fail the update on a race
condition, and scans are non-blocking.
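
The "overwrite, or fail the update" behaviour can be sketched in plain Java
as below. This is only an analogy using a ConcurrentHashMap, not HBase or
Gora code, and the class and method names are my own: a blind put always
wins, while a conditional update fails (rather than blocks) if somebody
else changed the value first.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical analogy for overwrite vs. conditional update; not HBase code.
final class UpdateSketch {
    private final ConcurrentMap<String, String> rows = new ConcurrentHashMap<>();

    // Blind write: the last writer wins, nothing blocks.
    void overwrite(String key, String value) {
        rows.put(key, value);
    }

    // Conditional write: succeeds only if the stored value is still
    // what the caller last read; otherwise returns false immediately.
    boolean updateIfUnchanged(String key, String expected, String value) {
        return rows.replace(key, expected, value);
    }

    String read(String key) {
        return rows.get(key);
    }

    public static void main(String[] args) {
        UpdateSketch table = new UpdateSketch();
        table.overwrite("tim", "v1");

        String seenByA = table.read("tim");   // writer A reads the row
        table.overwrite("tim", "v2");         // writer B overwrites it meanwhile

        // A's conditional update loses the race and is rejected.
        boolean ok = table.updateIfUnchanged("tim", seenByA, "v3");
        System.out.println(ok + " " + table.read("tim"));
    }
}
```

In both cases the caller gets an answer straight away; it is up to the
application to retry or give up when the conditional update is rejected.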

Perhaps if you explain what you are trying to do, the list can advise you
on whether Gora is a suitable option, or suggest the appropriate Hadoop
list to ask.

I hope this helps,
Tim






On Wed, Apr 30, 2014 at 2:25 PM, Noora <noora.sa...@gmail.com> wrote:

> Hi All,
>
> I want to integrate MySQL and HDFS in my Hadoop project. I searched a lot
> for different ways and found two approaches: real time using "MySQL
> Applier for Hadoop", and "Apache Sqoop" for non-real-time uses.
>
> Then I found that Gora has this ability too, but I could not find any
> information about how it works.
>
> Is Gora real time or not? What is the difference between Gora and MySQL
> Applier or Sqoop? If it is real time, is the DB process blocking or not?
> To integrate Hadoop and MySQL, does it need a NoSQL DB as an
> interface?
>
> Thanks
>
