Thanks for your suggestion. I wouldn't want to run a MapReduce job just to get the file into a single tuple. Also, I can't be sure that the lines within the group would come back in the same order as they appear in the file.
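
As a rough sketch, the suggested approach would look something like the following. The field names are assumptions made for illustration, and it assumes no line contains a comma, since PigStorage(',') would otherwise split that line into several fields.

```pig
-- Sketch of the -tagFile approach (option documented for PigStorage in r0.14.0).
-- Assumed names: filename, line, lines. Assumes no commas inside the lines.
a = LOAD '/files' USING PigStorage(',', '-tagFile')
    AS (filename:chararray, line:chararray);

-- GROUP triggers a reduce phase. Each bag holds the lines of one file,
-- but Pig does not guarantee that they keep their original file order.
g = GROUP a BY filename;
b = FOREACH g GENERATE group AS filename, a.line AS lines;
```

Rebuilding each file as one chararray from such a bag would still need a line number or offset to sort on, which PigStorage does not provide.
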
Thanks

On 10 March 2015 at 06:39, Arvind S <arvind18...@gmail.com> wrote:

> while loading file you can attempt to use
> PigStorage(',','-tagFile')
> then regex on each line of the file .. then group by file name
>
> https://pig.apache.org/docs/r0.14.0/api/org/apache/pig/builtin/PigStorage.html
>
> *Cheers !!*
> Arvind
>
> On Fri, Mar 6, 2015 at 2:26 AM, Daniel Dai <da...@hortonworks.com> wrote:
>
> > Didn't realize any, but it should be pretty easy to write a customized
> > Loader/InputFormat for that.
> >
> > Daniel
> >
> > On 3/5/15, 6:18 AM, "Ronald Green" <green.ron...@gmail.com> wrote:
> >
> > >Hi,
> > >
> > >I'm looking for a loader function that will let me read each file as a
> > >record on its own so I'll be able to treat each as a single record/field.
> > >For example:
> > >
> > >a = load '/files' USING TheLoader() as (file:chararray);
> > >b = foreach a GENERATE REGEX_EXTRACT(file,'...');
> > >
> > >PigStorage and TextLoader return each line in the file as a record/tuple.
> > >
> > >Do you know any other loader that allows to get an entire file as a
> > >record?
> > >
> > >Thanks,
> > >Ron