loading datafiles in s3

Kennon Lee Mon, 27 Jun 2011 17:50:49 -0700

Hello,
We're using hive on amazon elastic mapreduce to process logs on s3, and I
had a couple basic questions. Apologies if they've been answered already-- I
gathered most info from the hive tutorial on amazon (
http://aws.amazon.com/articles/2855), as well as from skimming the hive wiki
pages, but I'm still very new to all of this. So, questions:


1) Is it possible to partition on directories that do not have the "key="
prefix? Our logs are organized like s3://bucketname/dir/YYYY/MM/DD/HH/*.bz2
and so ideally we could partition on that structure instead of adding "dt="
to every directory name. I found an old thread discussing this (
http://search-hadoop.com/m/SGTqLox5Il/partition+directory/v=threaded)
but couldn't find the actual syntax.
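
For illustration, one commonly used approach is to register each existing
directory as a partition with an explicit location, using Hive's
ALTER TABLE ... ADD PARTITION ... LOCATION statement. The sketch below only
generates those statements for the YYYY/MM/DD/HH layout described above. The
table name "logs" and the partition columns "dt" and "hr" are hypothetical,
and it assumes the table was created as an external table partitioned by
those two string columns.

```python
from datetime import datetime, timedelta


def add_partition_statements(table, base, start, hours):
    """Build one ALTER TABLE statement per hourly directory.

    table : hypothetical Hive table name, assumed partitioned by (dt, hr)
    base  : S3 prefix above the YYYY/MM/DD/HH directories
    start : datetime of the first hour to register
    hours : number of consecutive hourly directories to register
    """
    statements = []
    for offset in range(hours):
        t = start + timedelta(hours=offset)
        # The directory keeps its existing name; no "dt=" prefix is needed
        # because the location is given explicitly.
        location = base + "/" + t.strftime("%Y/%m/%d/%H") + "/"
        statements.append(
            "ALTER TABLE " + table
            + " ADD PARTITION (dt='" + t.strftime("%Y-%m-%d") + "',"
            + " hr='" + t.strftime("%H") + "')"
            + " LOCATION '" + location + "';"
        )
    return statements


if __name__ == "__main__":
    for stmt in add_partition_statements(
        "logs", "s3://bucketname/dir", datetime(2011, 6, 27, 23), 2
    ):
        print(stmt)
```

The printed statements could then be saved to a file and run with the Hive
CLI. Each new hour of logs needs its own statement, so this would have to be
rerun as data arrives.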

2) How does hive handle tab-delimited files where rows sometimes have
different column counts? For instance, if we are parsing an event log that
contains multiple events, some of which have more columns associated with
them:

event_a user_id apple 300
event_b user_id cat

If I define my Hive table to have 4 columns, how will Hive react to the
event_b row?
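
To make the question concrete, here is a small Python sketch of the behavior
I would expect from a delimited-text reader: missing trailing columns become
NULL (None here) and extra fields are dropped. This is an assumption about
how Hive's default text format treats short rows, not Hive code, and
parse_row is a hypothetical helper.

```python
def parse_row(line, num_columns):
    """Split a tab-delimited line into exactly num_columns values.

    Short rows are padded with None (standing in for NULL);
    fields beyond num_columns are ignored.
    """
    fields = line.rstrip("\n").split("\t")
    fields = fields[:num_columns]
    return fields + [None] * (num_columns - len(fields))


if __name__ == "__main__":
    rows = [
        "event_a\tuser_id\tapple\t300",
        "event_b\tuser_id\tcat",
    ]
    for row in rows:
        print(parse_row(row, 4))
```

Under that assumption, the event_b row would load with its fourth column
set to NULL rather than causing an error.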

Thanks!
