Lookup objects in distributed cache

vivek thakre Wed, 03 Apr 2013 13:36:10 -0700

Hello,

I want to implement some functionality as a UDTF. It involves reading 7
different text files and creating lookup structures such as Map, Set,
List, Map of String and List, etc. to be used in the logic.


These files are small, about 15 MB each on average.

I can add these files to the distributed cache and access them in the UDTF,
read the files, and create the necessary lookup data structures, but this
would mean that the files will be opened, read, and closed every time the
UDTF is invoked.

Is there a way to read the files just once, create the needed data
structures, put them in the distributed cache, and access them from the UDTF?
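
For reference, a minimal sketch of the "read once" idea. The distributed
cache ships files to the task nodes rather than in-memory objects, so one
common workaround is to parse each file once per task JVM and keep the
result in a static field. Everything named here is hypothetical: the
LookupCache class, the loadCount counter, and the assumption that each file
holds tab-separated key/value lines and can be opened by name from the
task's working directory. In a Hive UDTF, a helper like this could be
called from GenericUDTF.initialize() or lazily from process().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses each lookup file at most once per JVM and
// hands the same read-only structure to every later caller.
final class LookupCache {
    // File name -> parsed lookup structure (Map of String to List of String).
    private static final Map<String, Map<String, List<String>>> CACHE = new HashMap<>();

    // Counts actual file reads; only here to make the caching visible.
    static int loadCount = 0;

    private LookupCache() {}

    // Returns the cached structure, reading the file only on the first call.
    static synchronized Map<String, List<String>> get(String fileName) {
        Map<String, List<String>> cached = CACHE.get(fileName);
        if (cached == null) {
            cached = Collections.unmodifiableMap(load(Paths.get(fileName)));
            CACHE.put(fileName, cached);
        }
        return cached;
    }

    // Assumed file format: one "key<TAB>value" pair per line.
    // Lines without a tab are skipped.
    private static Map<String, List<String>> load(Path path) {
        Map<String, List<String>> result = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue;
                }
                String key = line.substring(0, tab);
                String value = line.substring(tab + 1);
                result.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        loadCount++;
        return result;
    }
}
```

With this shape, the cost of opening and parsing a file is paid once per
task JVM rather than once per invocation. It does not share the parsed
structures across nodes; each JVM still builds its own copy.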

I don't think creating Hive tables from these files and doing a map-side
join is possible, as the functionality that I want to implement is fairly
complex, and I am not sure it can be done with just a Hive query and join,
without a UDTF.

Thanks in advance.
