> I kept hearing about vectorization, but later found out it was going to work
> if I used ORC.
Yes, it's a tautology - if you cared about performance, you'd use ORC, because ORC is the fastest format.

And doing performance work to support folks who don't quite care about it is not exactly "see a need, fill a need".

> Literally years have come and gone and we are talking like 3.x is going to
> vectorize text.

Literally years have gone by since the feature came into Hive.

Though it might have crept up on you - if Vectorization had been enabled by default, it would've been immediately obvious.

HIVE-9937 is so old that I'd say the first line towards Text vectorization came in Q1 2015.

In the current master, you can get a huge boost out of it - if you want, you can run BI over 100 TB of text.

https://www.slideshare.net/Hadoop_Summit/llap-building-cloudfirst-bi/27

> … where some not negligible part of the features ONLY work with ORC.

You've got it backwards - ORC was designed to support those features.

Parquet could be following ORC closely, but at least the Java implementation hasn't.

Cheers,
Gopal