Hello, I want to implement some functionality as a Hive UDTF. It involves reading 7 different text files and building lookup structures from them (Map, Set, List, a Map of String to List, etc.) to be used in the logic.
These files are small, about 15 MB each on average. I can add them to the distributed cache and access them from the UDTF, reading each file and building the necessary lookup structures, but this would mean that the files are opened, read and closed every time the UDTF is invoked. Is there a way to read the files just once, build the data structures I need, put them in the distributed cache, and access them from the UDTF? I don't think creating Hive tables from these files and doing a map-side join is an option: the functionality I want to implement is fairly complex, and I am not sure it can be done with Hive queries and joins alone, without a UDTF. Thanks in advance.
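A minimal sketch of one way to get "read once" behavior, under assumptions: keep the parsed structures in a static field that is filled lazily on first use, so the files are read at most once per task JVM instead of once per row. The class name `LookupCache`, the file names `codes.txt`, `blocked.txt` and `groups.txt`, and their tab-separated layout are hypothetical stand-ins for the 7 real files. It is plain Java with no Hive dependency so the caching pattern can be tried in isolation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: loads the lookup files once per JVM and shares the result.
public final class LookupCache {

    /** Read-only bundle of the lookup structures built from the side files. */
    public static final class Lookups {
        public final Map<String, String> codeToName;          // codes.txt: key<TAB>value (hypothetical)
        public final Set<String> blockedIds;                  // blocked.txt: one id per line (hypothetical)
        public final Map<String, List<String>> groupMembers;  // groups.txt: group<TAB>member (hypothetical)

        Lookups(Map<String, String> m, Set<String> s, Map<String, List<String>> g) {
            this.codeToName = Collections.unmodifiableMap(m);
            this.blockedIds = Collections.unmodifiableSet(s);
            this.groupMembers = Collections.unmodifiableMap(g);
        }
    }

    private static volatile Lookups instance;
    /** Counts how many times the files were actually read (handy for checking the caching). */
    static final AtomicInteger LOAD_COUNT = new AtomicInteger();

    private LookupCache() {}

    /** Returns the shared lookups, reading the files from {@code dir} only on the first call. */
    public static Lookups get(Path dir) {
        Lookups result = instance;
        if (result == null) {                      // fast path: already loaded, no locking
            synchronized (LookupCache.class) {
                result = instance;
                if (result == null) {              // re-check under the lock (double-checked locking)
                    result = load(dir);
                    instance = result;
                }
            }
        }
        return result;
    }

    private static Lookups load(Path dir) {
        LOAD_COUNT.incrementAndGet();
        try {
            Map<String, String> codes = new HashMap<>();
            for (String line : Files.readAllLines(dir.resolve("codes.txt"), StandardCharsets.UTF_8)) {
                String[] kv = line.split("\t", 2);
                if (kv.length == 2) codes.put(kv[0], kv[1]);
            }
            Set<String> blocked = new HashSet<>();
            for (String line : Files.readAllLines(dir.resolve("blocked.txt"), StandardCharsets.UTF_8)) {
                String id = line.trim();
                if (!id.isEmpty()) blocked.add(id);    // skip blank lines
            }
            Map<String, List<String>> groups = new HashMap<>();
            for (String line : Files.readAllLines(dir.resolve("groups.txt"), StandardCharsets.UTF_8)) {
                String[] kv = line.split("\t", 2);
                if (kv.length == 2) groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
            groups.replaceAll((k, v) -> Collections.unmodifiableList(v)); // make inner lists read-only too
            return new Lookups(codes, blocked, groups);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real `GenericUDTF` subclass, the call to `LookupCache.get(...)` would go in `initialize()` (or behind a first-call check in `process()`), never in per-row code. The sketch assumes files added with `ADD FILE` are localized into the task's working directory, so a path like `Paths.get(".")` would find them; check this against your Hive/Hadoop version. Note that the parsed structures live in each task JVM's memory: the distributed cache only ships the raw files, so each JVM still parses them once.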