I raised the client heap size via HADOOP_CLIENT_OPTS all the way up to 16g
and still had no luck.

I tried going down the table join route, but the relation is not an
equality, so it would be a theta join, which Hive does not support.
Basically, I am doing a geographic intersection against 6,000 points, so
the WHERE clause has 6,000 points in it (I use a custom UDF for the
intersection).

To avoid the problem I ended up writing another version of the UDF that
reads the point list from an HDFS file.
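
For anyone wanting to try the same workaround, here is a minimal sketch of
the "load the point list once, then test each row against it" pattern. It is
not the actual UDF: the class name, the one-"x,y"-pair-per-line file format,
and the simple distance test standing in for the real intersection are all
assumptions made for illustration. It is plain Java so it runs without a
cluster; in a real UDF the Reader would presumably wrap the stream returned
by Hadoop's FileSystem.open(Path).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the UDF from this thread.
final class PointListSketch {
    private final List<double[]> points = new ArrayList<>();

    // Parses one "x,y" pair per line (assumed format); blank lines are skipped.
    static PointListSketch load(Reader source) {
        PointListSketch sketch = new PointListSketch();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                String[] parts = line.split(",");
                sketch.points.add(new double[] {
                    Double.parseDouble(parts[0].trim()),
                    Double.parseDouble(parts[1].trim())
                });
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sketch;
    }

    int size() {
        return points.size();
    }

    // Stand-in for the real intersection test: true if (x, y) lies within
    // `radius` of any loaded point, using plain Euclidean distance.
    boolean nearAny(double x, double y, double radius) {
        double r2 = radius * radius;
        for (double[] p : points) {
            double dx = p[0] - x;
            double dy = p[1] - y;
            if (dx * dx + dy * dy <= r2) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PointListSketch pts = load(new StringReader("1.0,2.0\n3.5,4.5\n"));
        System.out.println(pts.size());
        System.out.println(pts.nearAny(1.1, 2.0, 0.5));
        System.out.println(pts.nearAny(10.0, 10.0, 0.5));
    }
}
```

The point of the design is that the query text then only carries a file
path, so its length no longer grows with the number of points.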

I'm sure it's a low priority, but I bet there are some inefficiencies in
the query string handling that could be fixed.  When I traced the code, it
was doing a lot of StringBuffer appends and String += concatenation.
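
To illustrate why that pattern can hurt on very long query strings, here is
a generic sketch. It is not Hive's actual code; the class and method names
are made up. Both methods build the same string, but the += version copies
everything built so far on each iteration, while the StringBuilder version
appends into a growable buffer.

```java
// Hypothetical illustration only; none of this is taken from Hive.
final class ConcatSketch {
    // Repeated += : each step allocates a new String and copies the old one,
    // so the total work grows roughly quadratically with the output length.
    static String joinWithPlus(String[] parts) {
        String out = "";
        for (String p : parts) {
            if (!out.isEmpty()) {
                out += " OR ";
            }
            out += p;
        }
        return out;
    }

    // StringBuilder: appends into one growable buffer, so the total work is
    // roughly linear in the output length.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            if (sb.length() > 0) {
                sb.append(" OR ");
            }
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a = 1", "b = 2", "c = 3"};
        System.out.println(joinWithPlus(parts));
        System.out.println(joinWithBuilder(parts));
    }
}
```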

Regards,
