Re: Way of determining the source of data

2012-02-05 Thread Daniel Dai
Check https://cwiki.apache.org/confluence/display/PIG/FAQ#FAQ-Q%3AIloaddatafromadirectorywhichcontainsdifferentfile.HowdoIfindoutwherethedatacomesfrom%3F

On Thu, Feb 2, 2012 at 5:11 PM, Ranjan Bagchi wrote:
> Hi,
>
> I've a bunch of [for example] apache logfiles that I'm searching through.  I

Re: Way of determining the source of data

2012-02-03 Thread Yulia Tolskaya
You can use MultiStorage from Piggybank, like so: https://cwiki.apache.org/confluence/display/PIG/FAQ#FAQ-Q%3AIloaddatafrom

Just beware of this bug if you are using Pig 0.9.1 or Pig 0.8: https://issues.apache.org/jira/browse/PIG-2462

Yulia

On 2/2/12 8:11 PM, "Ranjan Bagchi" wrote:
>Hi,
>
>I'v

Way of determining the source of data

2012-02-02 Thread Ranjan Bagchi
Hi,

I've a bunch of [for example] apache logfiles that I'm searching through. I can process them with:

logs = load 's3://bucket/directory/*' USING LogLoader as (remoteAddr, remoteLogname, user, time :chararray, method, uri :chararray, proto, status, bytes, referer, userAgent);

Is there any w
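
The general idea behind the question, keeping the source path alongside each record as it is read, can be sketched outside Pig in plain Python. This is only an illustration under assumptions: `load_with_source` is a hypothetical helper, not a Pig or Piggybank API, and it reads local files matched by a glob rather than an S3 bucket.

```python
import glob
import os
import tempfile


def load_with_source(pattern):
    """Yield (source_path, line) for every line of every file matching pattern.

    Hypothetical helper for illustration only; not part of Pig.
    """
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                # Keep the path next to the record so later steps can
                # filter or group by the file the record came from.
                yield path, line.rstrip("\n")


if __name__ == "__main__":
    # Build two small sample log files in a temporary directory.
    with tempfile.TemporaryDirectory() as directory:
        for name, body in (("a.log", "GET /\n"), ("b.log", "POST /x\n")):
            with open(os.path.join(directory, name), "w", encoding="utf-8") as f:
                f.write(body)
        for source, record in load_with_source(os.path.join(directory, "*.log")):
            print(os.path.basename(source), record)
```

In Pig the same effect would need the loader itself to attach the path to each tuple, since a plain glob load does not say which file a record came from.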