Depending on your use case, and on whether you will keep adding data to this 
table on an ongoing basis, I would probably do one of two things. Either 
JSON-encode all of the data, drop everything into a single column in Hive, and 
use a JSON UDF to create a Hive view that maps the columns you care about; or 
create a single Hive table with ALL of the columns from ALL of the files, and 
populate only the columns that are relevant for a given file.
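
A minimal sketch of the first option, assuming each source record is already 
available as a list of strings. The column names (`col1`..`col3`, `amount`), 
the table `raw_records`, its column `json_data`, and the view `records_view` 
are hypothetical. The view text relies on Hive's built-in `get_json_object` 
UDF and is shown only as an illustration, not as a tested DDL.

```python
import json

# Hypothetical names for the three columns shared by every file.
STANDARD_COLUMNS = ["col1", "col2", "col3"]


def record_to_json(fields, extra_names):
    """Encode one record as a single-line JSON object.

    fields      -- list of string values for one record
    extra_names -- names of the file-specific columns after the first three
    """
    names = STANDARD_COLUMNS + list(extra_names)
    if len(fields) != len(names):
        raise ValueError("expected %d fields, got %d" % (len(names), len(fields)))
    # One JSON object per line, so each record fits a single Hive STRING column.
    return json.dumps(dict(zip(names, fields)), sort_keys=True)


# Illustrative Hive view; every identifier here is an assumption.
HIVE_VIEW_DDL = """
CREATE VIEW records_view AS
SELECT get_json_object(json_data, '$.col1')   AS col1,
       get_json_object(json_data, '$.col2')   AS col2,
       get_json_object(json_data, '$.col3')   AS col3,
       get_json_object(json_data, '$.amount') AS amount
FROM raw_records
"""

if __name__ == "__main__":
    print(record_to_json(["A", "001", "2017-06-02", "12.50"], ["amount"]))
```

Keys that are absent from a given record simply come back as NULL from 
`get_json_object`, which is what makes this layout tolerant of files with 
different column sets.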



From: Nishanth S [mailto:nishanth.2...@gmail.com]
Sent: Friday, June 02, 2017 11:15 AM
To: user@hive.apache.org
Subject: Re: Migrating Variable Length Files to Hive

[External Email]
________________________________
Thanks Ryan. In my case I have around 200 small files on the mainframe. Columns 
are the same within a file but vary in number across files. Now I need to get 
all of this data into a single Hive table. The first three columns are standard 
across all files. Any idea how the schema would look if I use the StingRay 
reader? I am guessing it would be more like 
string, string, string, array<string>?

-Nishanth
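
For illustration, the `string, string, string, array<string>` layout guessed at 
above could be fed from delimited text along these lines. This is a sketch 
under assumptions: the table and column names are hypothetical, and the 
delimiters (`\t` between fields, `|` between array items) are arbitrary 
choices that must match the `ROW FORMAT DELIMITED` clause of the table.

```python
FIELD_DELIM = "\t"  # must match FIELDS TERMINATED BY '\t'
ITEM_DELIM = "|"    # must match COLLECTION ITEMS TERMINATED BY '|'

# Illustrative DDL only; table and column names are assumptions.
HIVE_TABLE_DDL = """
CREATE TABLE mainframe_records (
  col1  STRING,
  col2  STRING,
  col3  STRING,
  extra ARRAY<STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\t'
  COLLECTION ITEMS TERMINATED BY '|'
STORED AS TEXTFILE
"""


def to_hive_line(fields):
    """Turn one variable-length record into a 4-column delimited line."""
    if len(fields) < 3:
        raise ValueError("every record needs the three standard columns")
    for value in fields:
        # Refuse values that would be mistaken for a delimiter by Hive.
        if FIELD_DELIM in value or ITEM_DELIM in value or "\n" in value:
            raise ValueError("value contains a delimiter: %r" % value)
    fixed, extra = fields[:3], fields[3:]
    # Columns 4..n are packed into the single array column.
    return FIELD_DELIM.join(fixed + [ITEM_DELIM.join(extra)])


if __name__ == "__main__":
    print(to_hive_line(["A", "001", "X", "p", "q"]))
```

The trade-off is that array positions carry no column names, so a query has to 
know which index means what for each record type. How Hive reads a record with 
an empty fourth field should be checked before relying on it.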

On Fri, Jun 2, 2017 at 10:51 AM, Ryan Harris 
<ryan.har...@zionsbancorp.com> wrote:
I wrote some custom Python parsing scripts using StingRay Reader 
(http://stingrayreader.sourceforge.net/cobol.html) that read in the copybooks 
and use the results to automatically generate a Hive table schema based on the 
source copybook. The EBCDIC data is then extracted to TAB-separated ASCII 
values to load into Hive.
Some tables had some very sparse column values, so in those cases, I bundled 
the sparse data into a catch-all JSON field in the Hive table.
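
As a rough illustration of that extraction step (not the StingRay-based 
scripts themselves, whose code is not shown in this thread), the sketch below 
decodes a fixed-length EBCDIC record with Python's built-in `cp037` codec, 
writes the dense fields as TAB-separated text, and bundles the sparse fields 
into a trailing JSON column. The record layout and field names are 
hypothetical, and only plain character fields are handled; packed-decimal 
fields would need separate treatment.

```python
import json

# Hypothetical fixed-width layout: (field name, start offset, length in bytes).
LAYOUT = [
    ("rec_type", 0, 2),
    ("account", 2, 6),
    ("note", 8, 10),
    ("flag", 18, 1),
]
# Hypothetical set of rarely populated fields that go into the JSON column.
SPARSE_FIELDS = {"note", "flag"}


def ebcdic_record_to_tsv(raw):
    """Convert one EBCDIC record (bytes) to a TAB-separated text line."""
    dense, sparse = [], {}
    for name, start, length in LAYOUT:
        # cp037 is one common EBCDIC code page; the real one is an assumption.
        value = raw[start:start + length].decode("cp037").strip()
        if name in SPARSE_FIELDS:
            if value:  # keep only the sparse fields that are populated
                sparse[name] = value
        else:
            dense.append(value)
    # The last column is the catch-all JSON object.
    return "\t".join(dense + [json.dumps(sparse, sort_keys=True)])


if __name__ == "__main__":
    sample = "AB123456hello     Y".encode("cp037")
    print(ebcdic_record_to_tsv(sample))
```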

The parser is able to handle both fixed-length records as well as 
variable-length VB-type records.

Let me know if you have any questions regarding Stingray….

From: Nishanth S [mailto:nishanth.2...@gmail.com]
Sent: Friday, June 02, 2017 10:07 AM
To: user@hive.apache.org
Subject: Migrating Variable Length Files to Hive

Hello hive users,

We are looking at migrating files (less than 5 MB of data in total) with 
variable record lengths from a mainframe system to Hive. You could think of 
this as metadata. Each record can have anywhere from 3 to n columns, depending 
on its record type (that is, each record type has a different number of 
columns). What would be the best strategy for migrating this to Hive? I was 
thinking of converting these files into one variable-length CSV file and then 
importing it into a Hive table. The Hive table would consist of 4 columns, with 
the 4th column holding a comma-separated list of the values from columns 4 to 
n. Are there alternative or better approaches? I would appreciate any feedback 
on this.

Thanks,
Nishanth
________________________________
THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL 
and may contain information that is privileged and exempt from disclosure under 
applicable law. If you are neither the intended recipient nor responsible for 
delivering the message to the intended recipient, please note that any 
dissemination, distribution, copying or the taking of any action in reliance 
upon the message is strictly prohibited. If you have received this 
communication in error, please notify the sender immediately. Thank you.


