Re: Ignore subdirectories when querying external table

2011-08-29 Thread Sam William
Dave, where do you specify the classpath before starting the Hive shell when you introduce a custom class like this? Sam On Aug 19, 2011, at 1:22 PM, Dave wrote: I solved my own problem. For anyone who's curious: It turns out that subclassing an InputFormat allows one to override

Re: Ignore subdirectories when querying external table

2011-08-19 Thread Dave
I solved my own problem. For anyone who's curious: It turns out that subclassing an InputFormat allows one to override the listStatus method, which returns the list of files for Hive (or mapreduce in general) to process. All I had to do was subclass org.apache.hadoop.mapred.TextInputFormat and
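
For readers who want to see the shape of that filtering step, here is a minimal sketch. It does not use the Hadoop classes, so it can run on its own: Entry is a hypothetical stand-in for Hadoop's FileStatus, and SubdirFilter.filesOnly is a hypothetical helper, not anything from Dave's code. In a real TextInputFormat subclass, the same loop would presumably run over the array returned by super.listStatus(job).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hadoop's FileStatus: a path plus a directory flag.
final class Entry {
    final String path;
    final boolean isDirectory;

    Entry(String path, boolean isDirectory) {
        this.path = path;
        this.isDirectory = isDirectory;
    }
}

// Hypothetical helper modelling what an overridden listStatus could do.
final class SubdirFilter {
    // Returns only the plain files from a directory listing, dropping
    // every subdirectory so that it is never handed on for processing.
    static List<Entry> filesOnly(List<Entry> listing) {
        List<Entry> kept = new ArrayList<>();
        for (Entry e : listing) {
            if (!e.isDirectory) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```

Filtering on the directory flag alone is the simplest choice; a filter on path names would be needed if some subdirectories should still be read.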

Re: Ignore subdirectories when querying external table

2011-08-19 Thread Sam William
On similar lines, I want to have Hive include subdirs. That is, I have an external table partitioned by month (data for each month under a folder). Under the current month I want to keep adding folders daily. Is this possible without having to subclass InputFormat? On Aug 19,

Ignore subdirectories when querying external table

2011-08-18 Thread Dave
Hi, I have a partitioned external table in Hive, and in the partition directories there are other subdirectories that are not related to the table itself. Hive seems to want to scan those directories, as I am getting an error message when trying to do a SELECT on the table: Failed with exception