Hi Matthieu, as Sqoop is import/exporting data as they are, using natural keys is not usually working as expected. You should prepare your data with proper surrogate keys and use Sqoop as a bulk load tool to get all your data from/to database.
Jarcec On Mon, May 20, 2013 at 09:27:52AM -0700, Matthieu Labour wrote: > Jarcec > > Thank you for your response > > May I ask you a tip on best practices with respect to keys. When preparing > the data for sqoop do people end up using natural keys? Or do folks > generate surrogate keys while preparing the data? > > As an example: > > instead of having > user: naturalUserIdentifier, address .... > product: productRef, productProperties... > user_product: natural user Identifier, productRef > > I would like > user: userSurrogateKey, userNaturaIdentifier, address .... > product: productSurrogateKey, productRef, productProperties... > user_product: userSurrogateKey, productSurrogateKey > > Thanks a lot > -matt > > > On Sun, May 19, 2013 at 11:37 PM, Jarek Jarcec Cecho <[email protected]>wrote: > > > Hi Matthieu, > > Sqoop is currently highly specialized EL tool (extract-load) and not a > > generic ETL tool (extract-transform-load). Thus you need to execute custom > > mapreduce/pig/hive job that will separate all three different logical > > tables and prepare data into format that Sqoop can process. > > > > Jarcec > > > > On Fri, May 17, 2013 at 05:44:03PM -0400, Matthieu Labour wrote: > > > Hi > > > > > > I would be grateful for any tips on how to "prepare" the data so they can > > > be exported to a Postgesql Database using sqoop. > > > > > > As an example: > > > > > > Provided some files of events. (user events, product events, > > > productActivity events) > > > > > > [file0001] > > > event:user propertes:{name:"john" ...} > > > event:product properties:{ref:123,color:"blue",... > > > event:productActivity properties:{user:"john", product:"ref", > > action:"buy"} > > > ..... > > > > > > How does one come up with the primary keys and the user_product join > > table > > > ready to be exported? > > > > > > On other words. > > > > > > function(Input:eventfile) => output:[productFile, userFile, > > > user_productFile with auto generated primary keys ] > > > > > > what goes into function? > > > > > > I hope this makes sense! > > > > > > Thank you in advance for any help > > > > > > -matt > >
signature.asc
Description: Digital signature
