Hmmm. So beeline blew up *before* the query was even submitted to the execution engine? One would think 16G would be plenty for an 8M row SQL statement.
Some suggestions if you feel like going further down the rabbit hole:

1. Confirm your beeline Java process is indeed running with the expanded memory (ps -ef | grep beeline and look for the last -Xmx setting on the line).
2. Try the hive-cli (or even the Python one), or "beeline -u jdbc:hive2://" (local beeline - maybe that's different).
3. Chop your 6K points down to 3K or something smaller to see just where the breaking point is. Does 1K points even work? I.e., determine how close to the edge you are.

Cheers, Stephen.

PS. I had never heard of a "theta" join before, so I searched for it and found this: https://cwiki.apache.org/confluence/display/Hive/Theta+Join and this: https://issues.apache.org/jira/browse/HIVE-556 (looks like the Jira came first), and it's still in "open" status, I see. Well, you're not alone, if that's any solace! Maybe ping that Jira and see if Edward or Brock (or others) have any news on the topic, as supporting theta joins sounds like the proper solution to this whole rigamarole you find yourself in.

On Fri, Sep 2, 2016 at 6:12 AM, Adam <work....@gmail.com> wrote:
> I set the heap size using HADOOP_CLIENT_OPTS all the way to 16g and still
> no luck.
>
> I tried to go down the table join route but the problem is that the
> relation is not an equality, so it would be a theta join, which is not
> supported in Hive.
> Basically what I am doing is a geographic intersection against 6,000
> points, so the where clause has 6,000 points in it (I use a custom UDF for
> the intersection).
>
> To avoid the problem I ended up writing another version of the UDF that
> reads the point list from an HDFS file.
>
> It's a low priority, I'm sure, but I bet there are some inefficiencies in
> the query string handling that could be fixed. When I traced the code it
> was doing all kinds of StringBuffer and String += type stuff.
>
> Regards,
>
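For anyone following along, a minimal sketch of the HDFS-backed UDF approach Adam describes might look roughly like the following. This is not his actual code: the class name, the two-argument signature, the plain-text point file, and the exact string match are all illustrative assumptions (his real UDF presumably does a true geographic intersection via a spatial library).

// Hypothetical usage from HiveQL (jar, function, and column names are made up):
//   ADD JAR my-geo-udf.jar;
//   CREATE TEMPORARY FUNCTION hdfs_point_match AS 'example.HdfsPointMatchUDF';
//   SELECT * FROM events WHERE hdfs_point_match(geom_wkt, '/data/points.txt');

package example;  // illustrative package name

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.UDF;

public class HdfsPointMatchUDF extends UDF {
  // The point list is loaded once per task and cached across rows.
  private Set<String> points;

  public Boolean evaluate(String pointText, String hdfsPath) throws IOException {
    if (pointText == null || hdfsPath == null) {
      return null;
    }
    if (points == null) {
      points = new HashSet<String>();
      FileSystem fs = FileSystem.get(new Configuration());
      BufferedReader reader =
          new BufferedReader(new InputStreamReader(fs.open(new Path(hdfsPath))));
      try {
        String line;
        while ((line = reader.readLine()) != null) {
          points.add(line.trim());
        }
      } finally {
        reader.close();
      }
    }
    // A real geographic intersection would call into a spatial library here;
    // an exact string match keeps the sketch short.
    return points.contains(pointText.trim());
  }
}

The appeal of this shape is that the 6,000 points never appear in the SQL text at all, so the client-side string handling that blew up beeline is never exercised, and the file is parsed once per task rather than once per row.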