> is hosting the HiveServer2 is merely sending data with around 3 MB/sec.
> Our network is capable of much more. Playing around with `fetchSize` did
> not increase throughput.
...
> --hiveconf
> mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
> \
The current implementation you have is CPU-bound in HiveServer2, and compression generally makes it worse.

The fetch size does help, but it only prevents the system from doing synchronized operations too frequently (pausing every 50 rows is too often; the default is now 10000 rows).

> -e 'SELECT <a lot of columns> FROM `db`.`table` WHERE (year=2016 AND
> month=6 AND day=1 AND hour=10)' > /dev/null

Quick q - are year/month/day/hour partition columns? If so, there might be a very different fix to this problem.

> In all cases, Hive is able only to utilize a tiny fraction of the
> bandwidth that is available. Is there a possibility to increase network
> throughput?

A series of work items is in progress to fix large row-set performance in HiveServer2:

https://issues.apache.org/jira/browse/HIVE-11527
https://issues.apache.org/jira/browse/HIVE-12427

It would be great if you could attach a profiler to your HiveServer2 and see which functions are hot; that will help fix those codepaths as part of the joint effort with the ODBC driver teams.

Cheers,
Gopal
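
To put the fetch-size point in numbers, here is a minimal sketch of how many fetch round trips a client needs at 50 rows per fetch versus 10000. The result size of 10 million rows and the helper name `fetch_round_trips` are assumptions made up for illustration; nothing here is measured from the cluster in this thread.

```python
def fetch_round_trips(total_rows: int, fetch_size: int) -> int:
    """Return how many fetch calls are needed to drain total_rows rows."""
    if fetch_size <= 0:
        raise ValueError("fetch_size must be positive")
    # Integer ceiling division: a partial last batch still costs one call.
    return (total_rows + fetch_size - 1) // fetch_size


# Hypothetical result size, chosen only to make the comparison concrete.
TOTAL_ROWS = 10_000_000

for size in (50, 10_000):
    calls = fetch_round_trips(TOTAL_ROWS, size)
    print(f"fetchSize={size}: {calls} fetch calls")
```

Under these assumptions the larger fetch size cuts the number of pauses from 200000 to 1000. That only removes the per-batch overhead, though. If the per-row work inside HiveServer2 is what burns the CPU, raising the fetch size further would not be expected to change throughput much, which would match what you observed.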