> I've been experimenting with 'select *' and 'select * limit X' in beeline and watching the hive-server2 log to understand when an M/R job is triggered and when not. It seems like whenever I set a limit, the job is avoided, but with no limit, it is run.
https://issues.apache.org/jira/browse/HIVE-10156

It's sitting on my back-burner (I know the fix, but I'm working on the LLAP branch).

> hive.limit.optimize.fetch.max
>
> That defaults to 50,000 and as I understand it, whenever I set limit to above that number, a job should be triggered. But I can set limit to something very high (e.g. 10M) and no job runs.

That config belongs to a different optimization - the global limit case, which works as follows: run the query with a 50k-row sample of the input, then, if it doesn't produce enough rows, re-run the query with the full input data-set.

You will notice errors on your JDBC connections with that optimization turned on (like HIVE-9382) and will see the log line "Retry query with a different approach" in the HS2 logs.

So I suggest not turning on the global limit optimization if you're on JDBC/ODBC.

Cheers,
Gopal
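
A minimal sketch of the settings involved, assuming a beeline session against HS2 (the two limit-optimization properties are the ones named in the message; hive.fetch.task.conversion is the separate setting that lets a plain limited select skip M/R, and the defaults shown are assumptions from Hive's documentation, not stated in the thread):

    -- Global limit optimization (sample-then-retry behavior described
    -- above); leaving it off avoids the JDBC retry errors:
    set hive.limit.optimize.enable=false;

    -- Row cap for the sample pass of the global limit case:
    set hive.limit.optimize.fetch.max=50000;

    -- Fetch-task conversion is what allows 'select * ... limit X'
    -- to be answered without launching an M/R job:
    set hive.fetch.task.conversion=more;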