On Mon, Jan 24, 2011 at 5:58 PM, Avram Aelony <avramael...@eharmony.com> wrote:
> Hi,
>
> I really like the virtual column feature in 0.7 that allows me to request
> INPUT__FILE__NAME and see the names of files that are being acted on.
>
> Because I can see the files that are being read, I see that I am spending
> time querying many, many very large files, most of which I do not need to
> process because these extra files are in the same s3 bucket location that
> contains the files I need.
>
> The files I do need to process represent only a subset of all files in the
> bucket. Nevertheless, the files I am interested in are quite large, and large
> enough to make copying to hdfs unwieldy.
>
> Since I know the files I want to process by name before the scan of all
> files, can I be more efficient and only process a selection of files from a
> bucket, avoiding those I don't?
>
> I guess I am still looking for something like
> https://issues.apache.org/jira/browse/HIVE-951
>
> I tried sending this message to the dev list initially, but since I haven't
> seen a response yet, perhaps this list is more appropriate.
>
> Any suggestions or update on HIVE-951?
>
> Thanks,
> Avram

We do have the symlink input format (SymlinkTextInputFormat). It is a little more work than HIVE-951 but accomplishes roughly the same thing.

Edward
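
For anyone who wants to try the symlink route, here is a rough sketch of the extra work involved. It assumes the usual setup: the table's location holds small text files, each listing one data-file path per line, and the table is declared with SymlinkTextInputFormat so only the listed files are read. The bucket, table name, column, and the helpers `build_manifest` and `symlink_table_ddl` are made up for illustration, and whether the listed paths resolve against S3 in your version is something to verify rather than assume.

```python
# Hive class names for the symlink input format and its usual output format.
SYMLINK_INPUT_FORMAT = "org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat"
IGNORE_KEY_OUTPUT_FORMAT = "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"


def build_manifest(data_files):
    """Return the manifest text: one data-file URI per line, blanks dropped.

    Hypothetical helper; save the result as a file inside the table location.
    """
    lines = [uri.strip() for uri in data_files if uri.strip()]
    return "\n".join(lines) + "\n"


def symlink_table_ddl(table, columns, location):
    """Return a CREATE EXTERNAL TABLE statement that reads via the manifest.

    Hypothetical helper; `columns` is a list of (name, hive_type) pairs and
    `location` is the directory that holds the manifest file(s).
    """
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        f"STORED AS INPUTFORMAT '{SYMLINK_INPUT_FORMAT}'\n"
        f"OUTPUTFORMAT '{IGNORE_KEY_OUTPUT_FORMAT}'\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Example paths only; substitute the files you actually want scanned.
    wanted = [
        "s3n://example-bucket/logs/part-0001.gz",
        "s3n://example-bucket/logs/part-0007.gz",
    ]
    print(build_manifest(wanted))
    print(symlink_table_ddl("wanted_logs", [("line", "STRING")],
                            "/user/hive/symlinks/wanted_logs"))
```

The manifest is the only thing that changes when the selection of files changes; the table definition stays the same.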