> is hosting the HiveServer2 is merely sending data with around 3 MB/sec.
>Our network is capable of much more. Playing around with `fetchSize` did
>not increase throughput.
...
> --hiveconf mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec \

Your current setup is CPU bound in HiveServer2, and compression generally
makes that worse.

The fetch size does help, but only by keeping the system from doing
synchronized operations too frequently (pausing every 50 rows is too often;
the default is now 10000 rows).
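
To put rough numbers on that: the number of fetch calls is just the row
count divided by the fetch size, rounded up. Below is a minimal Java sketch,
assuming a plain JDBC client. The class name, the helper names and the
1,000,000-row figure are illustrative only, and the JDBC URL and query are
whatever you pass in. `Statement.setFetchSize` is the standard JDBC call.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class FetchSizeSketch {
    // Number of fetch calls needed to drain `rows` rows, `fetchSize` rows at a time.
    static long roundTrips(long rows, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        return (rows + fetchSize - 1) / fetchSize; // ceiling division
    }

    // Hypothetical helper: runs a query with the given fetch size, discards
    // the rows (like redirecting to /dev/null) and returns how many were read.
    static long drain(String jdbcUrl, String sql, int fetchSize) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(fetchSize); // hint to the driver: rows per fetch
            long n = 0;
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    n++;
                }
            }
            return n;
        }
    }

    public static void main(String[] args) throws SQLException {
        long rows = 1_000_000L; // assumed result size, for illustration only
        System.out.println("fetchSize=50    -> " + roundTrips(rows, 50) + " fetches");
        System.out.println("fetchSize=10000 -> " + roundTrips(rows, 10000) + " fetches");
        // Optional: pass a JDBC URL and a query to time a real drain.
        if (args.length == 2) {
            System.out.println(drain(args[0], args[1], 10000) + " rows read");
        }
    }
}
```

Once the fetch size is in the thousands the number of pauses is already
small, which is consistent with further tuning not moving the throughput.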

>    -e 'SELECT <a lot of columns> FROM `db`.`table` WHERE (year=2016 AND
>month=6 AND day=1 AND hour=10)' > /dev/null

Quick q - are year/month/day/hour partition columns? If so, there might be
a very different fix to this problem.

> In all cases, Hive is able only to utilize a tiny fraction of the
>bandwidth that is available. Is there a possibility to increase network
>throughput?

A series of work items is in progress to fix large row-set performance in
HiveServer2:

https://issues.apache.org/jira/browse/HIVE-11527

https://issues.apache.org/jira/browse/HIVE-12427

It would be great if you could attach a profiler to your HiveServer2 and
see which functions are hot. That will help fix those codepaths as part of
the joint effort with the ODBC driver teams.
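
If a full profiler is not at hand, repeated thread dumps are a crude
substitute. Here is a sketch, assuming the dumps come from the JDK's
`jstack <pid>` and are fed in on stdin. The class name and the parsing are
my own illustration, and it only counts the top frame of each RUNNABLE
thread, so treat the output as a hint rather than a real profile.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

public class TopFrameCounter {
    // Counts the top stack frame of every RUNNABLE thread in jstack-style text.
    static Map<String, Integer> topFrames(String dump) {
        Map<String, Integer> counts = new HashMap<>();
        boolean runnable = false; // current thread is in state RUNNABLE
        boolean counted = false;  // top frame of current thread already recorded
        for (String raw : dump.split("\n")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // Thread header line, e.g. "main" #1 prio=5 ...: a new thread begins.
                runnable = false;
                counted = false;
            } else if (line.startsWith("java.lang.Thread.State:")) {
                runnable = line.contains("RUNNABLE");
            } else if (runnable && !counted && line.startsWith("at ")) {
                counts.merge(line.substring(3), 1, Integer::sum);
                counted = true;
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Read all of stdin; several concatenated dumps are fine.
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        // Print frames, most frequent first.
        topFrames(sb.toString()).entrySet().stream()
            .sorted((a, b) -> b.getValue() - a.getValue())
            .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}
```

Take a few dozen dumps while the query is streaming, concatenate them and
pipe the result into this. Frames that keep showing up at the top are the
ones worth reporting on the JIRAs above.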

Cheers,
Gopal

