Allan: Do you program - by any chance - in Python?
- Sean

Sean Franks | (212) 284-8787 | (908) 310-4200

-----Original Message-----
From: Gwen Shapira [mailto:[email protected]]
Sent: Monday, August 04, 2014 3:39 PM
To: [email protected]
Subject: Re: sqoop import to S3 hits 5 GB limit

Just for completeness - I often configure Sqoop with a high number of
mappers (so if a tasktracker fails it won't lose huge amounts of work) and
then use the fair scheduler to limit the number of concurrent mappers to
something reasonable for the DB.

On Mon, Aug 4, 2014 at 12:11 PM, Allan Ortiz <[email protected]> wrote:
>
> Great! Thanks for the reply, Gwen. I did not know that Sqoop 2 isn't
> ready for prime time yet. For various reasons, I am going to use the
> sqoop-to-HDFS, then copy-to-S3 option. One reason is that we are
> currently doing non-incremental Sqoop imports (so the import time is
> significant), and I've observed that the import run time goes up as the
> number of mappers exceeds the number of cores on my source DB.
>
> Thanks again,
> Allan
>
> ________________________________
> From: "Gwen Shapira" <[email protected]>
> To: [email protected]
> Sent: Sunday, August 3, 2014 12:07:10 PM
> Subject: Re: sqoop import to S3 hits 5 GB limit
>
> Hi,
>
> Sqoop 2 is rather experimental and will not solve your problem.
>
> I'd try to work around the issue by increasing the number of mappers
> until each mapper is writing less than 5 GB worth of data.
>
> If this doesn't work for you, then HDFS->S3 is an option.
>
> Gwen
>
> On Thu, Jul 31, 2014 at 2:32 PM, Allan Ortiz <[email protected]> wrote:
>> I am trying to use Sqoop 1.4.4 to import data from a MySQL DB directly
>> to S3, and I am running into an issue where, if one of the file splits
>> is larger than 5 GB, the import fails.
>>
>> Details for this question are in my SO post - I promise to follow good
>> cross-posting etiquette :)
>>
>> http://stackoverflow.com/questions/25068747/sqoop-import-to-s3-hits-5-gb-limit
>>
>> One of my main questions: should I be using Sqoop 2 rather than Sqoop
>> 1.4.4? Also, should I be sqooping to HDFS, then copying the data over
>> to S3 for permanent storage? Thanks!
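
For reference, a minimal sketch of the setup Gwen describes - a high mapper
count so each split stays well under S3's 5 GB single-PUT limit, with a Fair
Scheduler pool capping how many map tasks hit the database at once. The pool
name, JDBC URL, credentials, table, and paths below are all hypothetical
placeholders:

    # Hypothetical Fair Scheduler allocation (fair-scheduler.xml, MR1-era
    # Hadoop): cap concurrent map tasks in a "sqoop" pool to protect the DB.
    #
    #   <allocations>
    #     <pool name="sqoop">
    #       <maxMaps>8</maxMaps>
    #     </pool>
    #   </allocations>

    # Import with many mappers (so each output file stays small and a failed
    # task loses little work), but with only 8 running at any one time:
    sqoop import \
        -Dmapred.fairscheduler.pool=sqoop \
        --connect jdbc:mysql://dbhost/mydb \
        --username sqoop_user -P \
        --table orders \
        --split-by id \
        --num-mappers 64 \
        --target-dir /staging/orders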

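And a sketch of the HDFS-then-S3 route Allan settled on, assuming the
2014-era s3n:// filesystem and a hypothetical bucket name. Note that s3n also
uploads each file in a single PUT, so the import above still needs to keep
individual files under 5 GB before the copy:

    hadoop distcp /staging/orders s3n://my-bucket/warehouse/orders/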