Hello all, it has been a long time. I have been forced to use Avro and create a table with over 5k columns. It's helluva slow. I warned folks that all the best practices say "don't make a table with more than 1k or 2k columns" (Impala, Hive, Cloudera docs). No one listened to me, so now the table is a mess. In Impala, SHOW STATS and REFRESH on the table take ages.
Spark SQL might take an hour going back and forth fetching the metadata (HiveServer2 / Hive Thrift with an Oracle-backed metastore type setup). I have literally tried upping my Spark heap to something like 10GB. Does anyone have any tips for this insanity? Client or server side? Client would be easier because, as you can guess, if I can't stop folks from making a 5k-column table, I won't be able to get a server setting changed without selling my left leg. Also note this is on Cloudera, so it's probably not Hive 3.x, it's whatever version they are backporting.

Thanks,
Edward
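Not an answer, but for anyone hitting the same wall: a sketch of client-side Spark properties that are sometimes suggested for heavy metastore traffic. Property names assume Spark 2.x as shipped on CDH; check them against your actual version before relying on any of this.

```
# spark-defaults.conf (or pass via --conf) -- client-side only, no server changes.
# Assumed Spark 2.x on CDH; verify these properties exist in your version.

# Bigger driver heap, since the wide-table schema objects live driver-side:
spark.driver.memory                        10g

# Skip re-inferring the case-sensitive schema from data files on each query
# (Spark 2.x; default is INFER_AND_SAVE, which can be costly on wide tables):
spark.sql.hive.caseSensitiveInferenceMode  NEVER_INFER

# Push partition pruning into the metastore instead of pulling all
# partition metadata to the client:
spark.sql.hive.metastorePartitionPruning   true
```

None of this shrinks the 5k-column schema itself; it only cuts down how often and how expensively the client touches it.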