Hey man, Currently Sqoop doesn't have any ability to unzip a single column. This is a feature that seems fit for Sqoop2 actually. Until then, I'd try using Hive or Impala and write a custom UDF to help with column extraction.
-Abe On Fri, Aug 28, 2015 at 3:40 AM <[email protected]> wrote: > *Hello Everyone,* > > > > I have the below requirement to do > > > > 1) We have a DB in SQL Server. > > 2) *EVENTS* related transactional data is getting stored in *EVENTS* > table of DB. A part of event related information is being generated as an > *xml* (one for each event) and is being stored in one of the columns (of > type *VARBINARY*) of *EVENTS* table. > > *Note:* Interesting part here is that all xmls are being zipped with *gzip > *compression codec, and only then being stored in one of the columns of > *EVENTS > *table. > > 3) Now that we have decided to *SQOOP* the data to *HDFS* from > *EVENTS* table, to generate some reports on it, we are totally clueless > on how to proceed on this requirement. > > > > I have tried SQOOPing the data as it is into HDFS as a default text > format. All the columns are in human readable format except that VARBINARY > column (where the zipped xml is stored). > > > > So, can someone help me in this? > > Ø How can I unzip that zipped xml data? Can I do it in a single SQOOP > command? > > Ø I have not tried but will it work if I SQOOP to a *sequence* file? > > > > *Please help me…I am struggling to start on this requirement from past > couple of days.* > > > > > > *Thanks* > > *Suman* > > >
