Wrapping Hive around existing csv files requires manually naming and typing
every column in the creation command.  I have several csv tables and some
of them have a ton of columns.  I would love a way to create Hive tables that
automatically infer the column types by attempting various type conversions or
regex matches on the data (say, the first row).  Even cooler would be if the
first row could be interpreted differently from the rest of the table, as a
set of string labels naming the columns, while the types are automatically
inferred from, say, the *second* row.  My csv files are already in this
format, with the first row naming the columns.

Does this make sense?

Now, I'm sure Hive doesn't support this yet (and I admit it is a somewhat
esoteric desire on my part), but I'm curious how others would approach it.
I'm thinking of writing a separate, standalone program that reads the first
two rows of a csv file and prints the column names and types in the correct
syntax for a Hive CREATE EXTERNAL TABLE statement, which I would then
copy/paste into Hive.  I was just hoping for a simpler solution.
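A minimal sketch of that standalone program, under several assumptions: the
script name csv2hive.py and the functions infer_hive_type, sanitize and
create_table_ddl are hypothetical; types are guessed from the second row only,
using regexes tried in order (BIGINT, then DOUBLE, then BOOLEAN, falling back
to STRING); and the files are comma-delimited with no embedded delimiters
(Python's csv module copes with quoted fields when reading the sample rows,
but the generated ROW FORMAT DELIMITED clause does not).

```python
import csv
import re
import sys

# Regexes tried in order; the first full match picks the Hive type.
# This ordering and these type choices are assumptions, not Hive rules.
_TYPE_PATTERNS = [
    (re.compile(r"[+-]?\d+"), "BIGINT"),  # BIGINT rather than INT avoids overflow
    (re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?"), "DOUBLE"),
    (re.compile(r"(?i)true|false"), "BOOLEAN"),
]


def infer_hive_type(value):
    """Guess a Hive column type from one sample value; fall back to STRING."""
    value = value.strip()
    for pattern, hive_type in _TYPE_PATTERNS:
        if pattern.fullmatch(value):
            return hive_type
    return "STRING"


def sanitize(label):
    """Turn a header label into a lowercase identifier of [a-z0-9_]."""
    name = re.sub(r"\W+", "_", label.strip()).strip("_").lower()
    return name or "col"


def create_table_ddl(csv_path, table, location, delimiter=","):
    """Build a CREATE EXTERNAL TABLE statement from a CSV's first two rows."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)  # row 1: column labels
        sample = next(reader)  # row 2: values used for type inference
    columns, seen = [], set()
    # zip() stops at the shorter row, so a ragged sample row drops columns.
    for i, (label, value) in enumerate(zip(header, sample)):
        name = sanitize(label)
        if name in seen:  # disambiguate duplicate labels by position
            name = f"{name}_{i}"
        seen.add(name)
        columns.append(f"  `{name}` {infer_hive_type(value)}")
    body = ",\n".join(columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n{body}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"LOCATION '{location}'\n"
        'TBLPROPERTIES ("skip.header.line.count"="1");'
    )


if __name__ == "__main__":
    if len(sys.argv) == 4:
        print(create_table_ddl(sys.argv[1], sys.argv[2], sys.argv[3]))
    else:
        print("usage: csv2hive.py FILE.csv TABLE HDFS_DIR", file=sys.stderr)
```

Running `python csv2hive.py data.csv my_table /path/to/dir` prints a statement
to paste into Hive.  The skip.header.line.count table property is honored by
newer Hive releases (0.13 and later) so the label row is not read as data; on
older releases the header row would have to be stripped from the files
instead.  Inferring from a single row is fragile (a column holding 17 in row 2
may hold 17.5 later), so sampling more rows and widening the type is a
natural refinement.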

Thoughts?

Thanks.

________________________________________________________________________________
Keith Wiley     kwi...@keithwiley.com     keithwiley.com    music.keithwiley.com

"You can scratch an itch, but you can't itch a scratch. Furthermore, an itch can
itch but a scratch can't scratch. Finally, a scratch can itch, but an itch can't
scratch. All together this implies: He scratched the itch from the scratch that
itched but would never itch the scratch from the itch that scratched."
                                           --  Keith Wiley
________________________________________________________________________________
