I set the heap size using HADOOP_CLIENT_OPTS all the way to 16g and still no luck.
I tried the table join route, but the relation is not an equality, so it would be a theta join, which Hive does not support. Basically, I am doing a geographic intersection against 6,000 points, so the WHERE clause has 6,000 points in it (I use a custom UDF for the intersection). To avoid the problem, I ended up writing another version of the UDF that reads the point list from an HDFS file.

I'm sure it's a low priority, but I bet there are some inefficiencies in the query string handling that could be fixed. When I traced the code, it was doing all kinds of StringBuffer and String += concatenation.

Regards,
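
P.S. In case it helps anyone hitting the same limit, here is a rough sketch of the file-based idea. It is not my actual UDF: the class name PointList, the "lon,lat" one-pair-per-line file format, and the bounding-box check are all assumptions made up for illustration, and the Hive UDF wrapper is left out. It reads from a plain java.io.Reader so it runs on its own; in a real UDF the reader would presumably be opened on the HDFS file once, when the UDF is initialized, rather than once per row.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not the original UDF): holds points loaded once from a file
// so the query text does not have to carry thousands of literals.
final class PointList {
    private final List<double[]> points = new ArrayList<>();

    // Assumed file format: one "lon,lat" pair per line; blank lines are skipped.
    static PointList load(Reader source) throws IOException {
        PointList list = new PointList();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                String[] parts = line.split(",");
                list.points.add(new double[] {
                    Double.parseDouble(parts[0].trim()),
                    Double.parseDouble(parts[1].trim())
                });
            }
        }
        return list;
    }

    int size() {
        return points.size();
    }

    // Stand-in for the real intersection test (an assumption): returns true
    // if any loaded point falls inside the given bounding box.
    boolean intersectsBox(double minLon, double minLat, double maxLon, double maxLat) {
        for (double[] p : points) {
            if (p[0] >= minLon && p[0] <= maxLon && p[1] >= minLat && p[1] <= maxLat) {
                return true;
            }
        }
        return false;
    }
}
```

With something like this, the query only has to name the file, and the 6,000 points never pass through the query string at all.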