On Fri, Nov 25, 2011 at 4:46 AM, Isabel Drost <isa...@apache.org> wrote:

> On 24.11.2011 Ted Dunning wrote:
> > Actually, one of the most reliable ways to kill a database is to use it
> as
> > input or output for even a small Hadoop cluster.  Having hundreds of
> > processes all open connections and read at once is fairly abusive.
>
> Though that does not mean that data cannot be synced to hdfs before
> being used in a map/reduce job. Tools like sqoop help with that.


Absolutely.  This is definitely best practice.  I should add that sqoop
is only one way.  Another is to dump to a flat file and copy that into
the cluster.  Still another is to dump the data from the database into
an NFS mount of a MapR cluster file system.
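
For reference, a minimal sqoop sketch of the first route. The host,
database, and table names here are hypothetical; the point is the
explicit cap on parallelism so the database only ever sees a handful of
connections instead of one per mapper:

  # pull the table into HDFS with at most four concurrent readers
  sqoop import \
    --connect jdbc:mysql://db.example.com/sales \
    --username etl -P \
    --table orders \
    --target-dir /data/orders \
    --num-mappers 4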
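
And a sketch of the other two routes, again with hypothetical names and
assuming a MapR cluster exported over NFS at /mapr/my.cluster.com:

  # dump to a flat file on the database side, then push it into HDFS
  mysql -h db.example.com sales \
    -e "SELECT * FROM orders" > /tmp/orders.tsv
  hadoop fs -put /tmp/orders.tsv /data/orders/

  # or, with the MapR NFS mount, a plain copy does the job
  cp /tmp/orders.tsv /mapr/my.cluster.com/data/orders/

Either way the database sees exactly one reader, and the cluster reads
from its own file system.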

It is just sucking down the data in the mappers that is evil.
