RE: Loading data from ranges of ordered subdirs

2013-06-10 Thread Rodrick Megraw
a repeated basis and with many data sets. I will read up on it for sure.

> Date: Mon, 10 Jun 2013 17:02:37 -0400
> Subject: Re: Loading data from ranges of ordered subdirs
> From: pradeep...@gmail.com
> To: user@pig.apache.org
>
> There are two possibilities that come to mind

Re: Loading data from ranges of ordered subdirs

2013-06-10 Thread Pradeep Gollakota
There are two possibilities that come to mind.
1. Write a custom LoadFunc in which you can handle these regular expressions. *Not the ideal solution*
2. Use HCatalog. The example they have in their documentation seems to fit your use case perfectly. (http://incubator.apache.org/hcatalog/docs/r0.

Loading data from ranges of ordered subdirs

2013-06-10 Thread Rodrick Megraw
Let's say I have my input data from the past 12 months organized into subdirs by date:

/data/2012-06-10
/data/2012-06-11
...
/data/2013-06-09

And now say that I want to run a Pig script to process data from a range of dates within the last 12 months, say 2012-11-07 through 2013-05-26. The regex
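
For illustration only, and not something proposed in the thread: one way to sidestep a hand-written pattern is to generate the list of directories outside Pig and pass it in as a single glob. The sketch below assumes the /data/YYYY-MM-DD layout described above and relies on Hadoop-style globs accepting {a,b,c} alternation in a load path. The function name date_range_glob is hypothetical.

```python
from datetime import date, timedelta


def date_range_glob(base, start, end):
    """Build a glob such as /data/{2012-11-07,2012-11-08,...} covering
    every day from start to end inclusive (hypothetical helper)."""
    if end < start:
        raise ValueError("end date precedes start date")
    days = (end - start).days
    # isoformat() yields YYYY-MM-DD, matching the subdirectory names.
    names = [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]
    return base + "/{" + ",".join(names) + "}"


# The date range from the question above.
pattern = date_range_glob("/data", date(2012, 11, 7), date(2013, 5, 26))
```

The resulting string could then be handed to the script as a parameter (for example with pig -param) and used in the LOAD statement. Whether one long alternation like this performs acceptably on a given cluster is untested here.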