Thanks for the quick answer Gopal, and also for the details on that param.
I indeed use JDBC in production, so will stay away from it.

Just want to make sure I understand the behavior once that bug is fixed: a
'select *' with no limit will run without a M/R job and stream the results
instead.  Is that correct?

That may incidentally solve another bug I'm seeing: when you use JDBC
templates to set the limit (setMaxRows in Spring in my setup), the M/R job
is not avoided, and no limit clause appears in the hive-server2 log.
Instead, the M/R job gets launched; I'm not sure whether the JDBC framework
subsequently applies the limit once the job finishes.  I haven't spotted
this issue in JIRA, and I'd be happy to file it if that's useful to you.
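
For reference, here is a minimal sketch of the workaround I have in mind:
put the limit into the SQL text itself instead of relying on setMaxRows, so
that hive-server2 actually sees a limit clause.  It uses plain java.sql
rather than Spring.  The class and method names (LimitedQuery, withLimit,
countRows) and the table name are made up for illustration, and it assumes
the incoming query does not already carry a LIMIT clause.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class LimitedQuery {

    // Hypothetical helper: append an explicit LIMIT so the limit is part of
    // the SQL text sent to the server, not only a client-side row cap.
    // Assumes the query has no LIMIT clause of its own.
    static String withLimit(String sql, int maxRows) {
        if (maxRows <= 0) {
            throw new IllegalArgumentException("maxRows must be positive");
        }
        String trimmed = sql.trim();
        // Drop a trailing semicolon so the LIMIT lands inside the statement.
        if (trimmed.endsWith(";")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1).trim();
        }
        return trimmed + " LIMIT " + maxRows;
    }

    // Runs the rewritten query on an existing connection and counts the rows
    // returned. Opening the connection is left to the caller.
    static int countRows(Connection conn, String sql, int maxRows) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(withLimit(sql, maxRows))) {
            int rows = 0;
            while (rs.next()) {
                rows++;
            }
            return rows;
        }
    }

    public static void main(String[] args) {
        // Only shows the rewritten SQL; no database is needed for this part.
        System.out.println(withLimit("select * from my_table", 100));
    }
}
```

Whether this avoids the M/R job over JDBC the same way it does in beeline is
something I still need to confirm against the hive-server2 log.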

Thanks!
Adam

On Tue, Jul 21, 2015 at 7:20 PM, Gopal Vijayaraghavan <gop...@apache.org>
wrote:

>
> > I've been experimenting with 'select *' and 'select * limit X' in
> >beeline and watching the hive-server2 log to understand when a M/R job is
> >triggered and when not.  It seems like whenever I set a limit, the job is
> >avoided, but with no limit, it is run.
>
> https://issues.apache.org/jira/browse/HIVE-10156
>
>
> It's sitting on my back-burner (I know the fix, but I'm working on the
> LLAP branch).
>
> > hive.limit.optimize.fetch.max
> >
> > That defaults to 50,000 and as I understand it, whenever I set limit to
> >above that number, a job should be triggered.  But I can set limit to
> >something very high (e.g. 10M) and no job runs.
>
> That config belongs to a different optimization - the global limit case,
> which works as follows.
>
> Run query with a 50k row sample of the input, then if it doesn't produce
> enough rows, re-run the query with the full input data-set.
>
> You will notice errors on your JDBC connections with that optimization
> turned on (like HIVE-9382) and will get the following log line "Retry
> query with a different approach..." in the HS2 logs.
>
> So I suggest not turning on the Global Limit optimization, if you're on
> JDBC/ODBC.
>
> Cheers,
> Gopal
>
>
>
>
