Hello Everyone,

I have the below requirement to do


1)      We have a DB in SQL Server.

2)      EVENTS related transactional data is getting stored in EVENTS table of 
DB. A part of event related information is being generated as an xml (one for 
each event) and is being stored in one of the columns (of type VARBINARY) of 
EVENTS table.

Note: Interesting part here is that all xmls are being zipped with gzip 
compression codec, and only then being stored in one of the columns of EVENTS 
table.

3)      Now that we have decided to SQOOP the data to HDFS from EVENTS table, 
to generate some reports on it, we are totally clueless on how to proceed on 
this requirement.

I have tried SQOOPing the data as it is into HDFS as a default text format. All 
the columns are in human readable format except that VARBINARY column (where 
the zipped xml is stored).

So, can someone help me in this?

Ø  How can I unzip that zipped xml data? Can I do it in a single SQOOP command?

Ø  I have not tried but will it work if I SQOOP to a sequence file?

Please help me...I am struggling to start on this requirement  from past couple 
of days.


Thanks
Suman

Reply via email to