I wrote a small Java app that uses JDBC to run a Hive query. The Hive table
I'm querying has 30+ million rows, and I want to pull them all back to
verify the data. If I run a simple "SELECT * FROM <table>" with a fetch
size of 30,000, the fetch size is not honored: it seems to try to bring
back all 30+ million rows at once, which is definitely not going to work.
If I add a LIMIT to the SQL, like "SELECT * FROM <table> LIMIT 9999999",
the fetch size is honored just fine. However, with the LIMIT in place the
query does not run as a MapReduce job; it seems to stream the data back
instead. Is this how it's supposed to work? I'm new to the Hadoop ecosystem
and I'm really just trying to figure out the best way to bring this data
back in chunks. Maybe I'm going about this all wrong?
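
For reference, here is a minimal sketch of the pattern described above, not
the actual app. The class name HiveChunkedRead and the helpers readAll and
countRows are made up for illustration, and the caller is assumed to supply
an open Connection from a Hive JDBC driver. Note that per the JDBC API,
setFetchSize is only a hint to the driver, so whether rows really arrive in
batches of that size depends on the driver and server.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class; names are illustrative only.
class HiveChunkedRead {

    /** Walks the cursor one row at a time and returns how many rows it saw. */
    static long countRows(ResultSet rs) throws SQLException {
        long rows = 0;
        while (rs.next()) {
            // Verify or process the current row here, e.g. rs.getString(1).
            rows++;
        }
        return rows;
    }

    /**
     * Runs "SELECT * FROM <table>" with the given fetch-size hint.
     * The table name is concatenated as-is, so it must come from a
     * trusted source.
     */
    static long readAll(Connection conn, String table, int fetchSize)
            throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            // Hint only: asks the driver to fetch this many rows per round trip.
            stmt.setFetchSize(fetchSize);
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM " + table)) {
                return countRows(rs);
            }
        }
    }
}
```

Calling something like HiveChunkedRead.readAll(conn, "my_table", 30000)
would correspond to the no-LIMIT case above; "my_table" is a placeholder.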
