Thanks Jan for your reply. This is helpful.

Vivek

On Thu, Apr 4, 2013 at 12:11 AM, Jan Dolinár <dolik....@gmail.com> wrote:

> Hello Vivek,
>
> GenericUDTF has method initialize() which is only called once per task. So
> if you read your files in this method and store the structures in memory
> then the overhead is relatively small (reading 15MB per mapper is
> negligible compared to several GB of processed data).
>
> Best regards,
> Jan
>
>
> On Wed, Apr 3, 2013 at 10:35 PM, vivek thakre <vivek.tha...@gmail.com> wrote:
>
>> Hello,
>>
>> I want to write a functionality using UDTF. The functionality involves
>> reading 7 different text files and create lookup structures such as Map,
>> Set, List, Map of String and List etc to be used in the logic.
>>
>> These files are small size average 15 MB.
>>
>> I can add these files in distributed cache and access them in UDTF, read
>> the files, and create the necessary lookup data structures, but this would
>> mean that the files will be opened, read and closed every time the UDTF is
>> invoked.
>>
>> Is there a way that I can just read the files once, create the data
>> structures needed, put them in distributed cache and access them from UDTF?
>>
>> I don't think creating hive tables from these files and doing a map side
>> join is possible, as the functionality that I want to implement is fairly
>> complex and I am not sure if it can be done just using hive query and join
>> without using UDTF.
>>
>> Thanks in advance.
>>
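
For reference, the pattern Jan describes (read the files once in initialize(), keep the structures in memory, then do per-row lookups without touching the files again) might look roughly like the sketch below. This is plain Java that deliberately does not extend Hive's GenericUDTF, so that it can run on its own. The class name LookupOnceSketch, its methods, the load counter, and the tab-separated file layout are all hypothetical stand-ins. In a real UDTF the loading would presumably go inside the initialize() override and the lookups inside process().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for a Hive UDTF. All names here are hypothetical.
class LookupOnceSketch {
    // In-memory lookup structure, filled once and reused for every row.
    private final Map<String, String> lookup = new HashMap<>();
    // Counts how often the file was read, to show it happens only once.
    private int loadCount = 0;

    // Plays the role of initialize(): called once, reads the file into memory.
    // Assumes one "key<TAB>value" pair per line (an illustrative format).
    void initialize(Path lookupFile) {
        try {
            for (String line : Files.readAllLines(lookupFile, StandardCharsets.UTF_8)) {
                String[] kv = line.split("\t", 2);
                if (kv.length == 2) {
                    lookup.put(kv[0], kv[1]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        loadCount++;
    }

    // Plays the role of process(): called per row, uses memory only.
    String process(String key) {
        return lookup.getOrDefault(key, "UNKNOWN");
    }

    int loadCount() {
        return loadCount;
    }

    // Demo helper: writes a small temporary lookup file.
    static Path writeTempLookup(String... lines) {
        try {
            Path p = Files.createTempFile("lookup", ".tsv");
            Files.write(p, Arrays.asList(lines), StandardCharsets.UTF_8);
            p.toFile().deleteOnExit();
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTempLookup("us\tUnited States", "cz\tCzech Republic");
        LookupOnceSketch udtf = new LookupOnceSketch();
        udtf.initialize(file);                 // one file read
        for (String key : new String[] {"us", "cz", "xx"}) {
            System.out.println(key + " -> " + udtf.process(key)); // no file I/O
        }
        System.out.println("file loads: " + udtf.loadCount());
    }
}
```

The same idea would extend to the other structures mentioned in the thread (Set, List, Map of String to List): build each one during the single initialization step and keep it in an instance field.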