Hello all, it has been a long time. I have been forced to use Avro and create a table with over 5k columns. It's helluva slow. I warned folks that all the best practices say "don't make a table with more than 1k or 2k columns" (Impala, Hive, Cloudera docs). No one listened to me, so now the table is a mess. In Impala, SHOW STATS and REFRESH on the table take ages.
Spark SQL might take an hour going back and forth fetching the metadata (HiveServer2 / Hive Thrift with an Oracle-backed metastore type setup). I have literally tried upping my Spark heap to something like 10GB. Does anyone have any tips for this insanity? Client or server side? Client would be easier because, as you can guess, if I can't stop folks from making a 5k-column table, I won't be able to get a server setting changed without selling my left leg. Also note this is on Cloudera, so it's probably not Hive 3.x, it's whatever version they are backporting.

Thanks,
Edward
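Not an answer, but for anyone hitting the same wall: a sketch of client-side Spark properties that are sometimes suggested for heavy metastore traffic. Property names assume Spark 2.x as shipped on CDH; check them against your actual version before relying on any of this.

```
# spark-defaults.conf (or pass via --conf) -- client-side only, no server changes.
# Assumed Spark 2.x on CDH; verify these properties exist in your version.

# Bigger driver heap, since the wide-table schema objects live driver-side:
spark.driver.memory                        10g

# Skip re-inferring the case-sensitive schema from data files on each query
# (Spark 2.x; default is INFER_AND_SAVE, which can be costly on wide tables):
spark.sql.hive.caseSensitiveInferenceMode  NEVER_INFER

# Push partition pruning into the metastore instead of pulling all
# partition metadata to the client:
spark.sql.hive.metastorePartitionPruning   true
```

None of this shrinks the 5k-column schema itself; it only cuts down how often and how expensively the client touches it.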