Dear Hive mailing list,

in my setup, network throughput from the HiveServer2 to the client seems to be 
the bottleneck and I’m seeking a way do increase throughput. Let me elaborate 
my use case:

I’m using Hive version 1.1.0 that is bundeled with Clouders 5.5.1.

I want to fetch a huge amount of data from our Hive cluster. By huge I mean 
something around 100 million rows. The Hive table I’m querying is an external 
table whose data is stored in .avro. On HDFS, the data I want to fetch (i.e. 
the aforementioned 100 million rows) is about 5GB in size. A cleverer filtering 
strategy (to reduce the amount of data) is no option, sadly, since I need all 
the data.

I was able to reduce the time the MapReduce job takes to an agreeable interval 
fiddling around with `mapreduce.input.fileinputformat.split.maxsize`. The part 
that is taking ages comes after MapReduce. I’m observing that the Hadoop 
namenode that is hosting the HiveServer2 is merely sending data with around 3 
MB/sec. Our network is capable of much more. Playing around with `fetchSize` 
did not increase throughput. 

As I identified network throughput to be the bottleneck, I restricted my 
efforts to trying to increase it. For this, I simply run the query I’d normally 
run through JDBC (from Clojure/Java) via `beeline` and dumping the output to 
`/dev/null`. My `beeline` query looks something like that:

beeline \
    -u jdbc:hive2://srv:10000/db \
    -n user -p password \
    --outputformat=csv2 \
    --incremental=true \
    --hiveconf mapreduce.input.fileinputformat.split.maxsize=33554432 \
    -e 'SELECT <a lot of columns> FROM `db`.`table` WHERE (year=2016 AND 
month=6 AND day=1 AND hour=10)' > /dev/null

I already tried playing around with additional `—hiveconf`s:

    --hiveconf hive.exec.compress.output=true \
    --hiveconf mapred.output.compression.type=BLOCK \
    --hiveconf 
mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec \

without success. 

In all cases, Hive is able only to utilize a tiny fraction of the bandwidth 
that is available. Is there a possibility to increase network throughput?

Thank you in advance!

Yours

David Nies
Entwickler Business Intelligence
 
ADITION technologies AG
 
Oststraße 55, D-40211 Düsseldorf
Schwarzwaldstraße 78b, D-79117 Freiburg im Breisgau
 
T +49 211 987400 30
F +49 211 987400 33
E david.n...@adition.com <mailto:david.n...@adition.com>
 
Technischen Support erhalten Sie unter der +49 1805 2348466
(Festnetzpreis: 14 ct/min; Mobilfunkpreise: maximal 42 ct/min)
 
Abonnieren Sie uns auf XING oder besuchen Sie uns unter www.adition.com 
<http://www.adition.com/>.
 
Vorstände: Andreas Kleiser, Jörg Klekamp, Dr. Lutz Lowis, Marcus Schlüter 
Aufsichtsratsvorsitzender: Joachim Schneidmadl
Eingetragen beim Amtsgericht Düsseldorf unter HRB 54076 
UStIDNr.: DE 218 858 434

Reply via email to