Hi Emil,
For either query there will be no MapReduce job. The query engine
understands that in both cases it does not need to do any computation and
just needs to fetch all the data from the files.

The fetch size should be honored in both cases. I hope you are using
HiveServer2.
For your testing, you can also try connecting from Excel using Cloudera's
ODBC driver with the required parameters. For each batch that Hive returns,
you should see something like "returning results for id <hash>" in the
Hive log.
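A minimal Java sketch of the JDBC side, assuming HiveServer2 and the Hive JDBC driver (hive-jdbc) on the classpath. The statement's fetch size controls how many rows the driver requests from the server per round trip; the client code simply iterates `rs.next()` and groups rows into chunks for processing. The JDBC URL, the table name, the hypothetical `consumeInBatches` helper, and the choice to keep only the first column are illustrative assumptions, not part of the original setup. Depending on how the cluster handles authentication, `getConnection` may also need a user name and password.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class HiveFetchSketch {

    /**
     * Hypothetical helper: reads every row and hands rows to the sink in
     * groups of at most batchSize. The JDBC driver pulls rows from the
     * server according to the statement's fetch size; this method only
     * groups them for the caller. Returns the total number of rows read.
     */
    static long consumeInBatches(ResultSet rs, int batchSize,
                                 Consumer<List<String>> sink) throws SQLException {
        List<String> batch = new ArrayList<>(batchSize);
        long total = 0;
        while (rs.next()) {
            batch.add(rs.getString(1)); // assumption: only the first column is kept
            total++;
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batch = new ArrayList<>(batchSize); // fresh list so the sink may keep the old one
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // final, possibly shorter, batch
        }
        return total;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            System.err.println("usage: HiveFetchSketch <jdbc-url> <table>");
            return;
        }
        String url = args[0];   // assumed HiveServer2 URL, e.g. jdbc:hive2://<host>:10000/default
        String table = args[1]; // placeholder table name
        int fetchSize = 30_000;
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(fetchSize); // rows requested from the server per round trip
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM " + table)) {
                long n = consumeInBatches(rs, fetchSize,
                        batch -> System.out.println("got " + batch.size() + " rows"));
                System.out.println("total rows: " + n);
            }
        }
    }
}
```

Memory use on the client then stays roughly bounded by one fetch batch plus one processing chunk, instead of growing with the size of the table.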

On Wed, Aug 19, 2015 at 2:54 PM, Emil Berglind wrote:

> I have a small Java app that I wrote that uses JDBC to run a hive query.
> The Hive table that I'm running it against has 30+ million rows, and I want
> to pull them all back to verify the data. If I run a simple "SELECT * FROM
> <table>" and set a fetch size of 30,000 then the fetch size is not honored
> and it seems to want to bring back all 30+ million rows at once, which is
> definitely not going to work. If I set a LIMIT on the SQL, like "SELECT *
> FROM <table> LIMIT 9999999", then it honors the fetch size just fine.
> However, when I set the LIMIT on there, it does not run as a MapReduce job
> but rather seems to stream the data back. Is this how it's supposed to
> work? I'm new to the Hadoop ecosystem and I'm really just trying to figure
> out what the best way to bring this data back in chunks is. Maybe I'm going
> about this all wrong?
>
