Re: VARCHAR or STRING fields in Hive

Gopal Vijayaraghavan Mon, 16 Jan 2017 18:38:47 -0800

> Sounds like VARCHAR and CHAR types were created for Hive to have ANSI SQL 
> Compliance. Otherwise they seem to be practically the same as String types.


They are nearly identical in storage, but both are slower on the CPU in 
actual use (CHAR has additional padding code in the hot path).

There is no constant (literal) form for those two types, so a string 
operation such as = 'NONE' gets promoted to

UDFToString(varcharcol) = 'NONE'

This cast on the column turns off all ORC/Parquet index pushdowns. If you run 
an EXPLAIN and notice something similar, expect a significant performance 
loss.
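
A minimal sketch of how one might check for this, assuming two hypothetical 
ORC tables that differ only in the column type (the table and column names 
are made up for illustration). Whether the cast shows up in the plan, and how 
it is printed, may depend on the Hive version, so treat this as something to 
try rather than a guaranteed result:

```sql
-- Hypothetical tables: same data, different string type on the column.
CREATE TABLE events_varchar (status VARCHAR(10)) STORED AS ORC;
CREATE TABLE events_string  (status STRING)      STORED AS ORC;

-- Compare the filter predicate in the two plans. In the VARCHAR plan, look
-- for the column wrapped in a cast such as UDFToString(status); in the
-- STRING plan the column should be compared to the constant directly.
EXPLAIN SELECT count(*) FROM events_varchar WHERE status = 'NONE';
EXPLAIN SELECT count(*) FROM events_string  WHERE status = 'NONE';
```

If the predicate in the first plan wraps the column in a cast, that is the 
case described above.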

In general, I see a 2-3x performance degradation with CHAR/VARCHAR when 
doing constant filter operations, plus other issues when joining operands of 
different sizes (Varchar(3) x Varchar(4) would go this route).
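
The join case can be inspected the same way. This is only a sketch with 
hypothetical table names; the point is to look at how the join keys are 
expressed in the plan when the two sides have different declared lengths:

```sql
-- Hypothetical tables whose join keys have different VARCHAR lengths.
CREATE TABLE codes_short (code VARCHAR(3)) STORED AS ORC;
CREATE TABLE codes_long  (code VARCHAR(4)) STORED AS ORC;

-- Check the plan for conversions applied to the join keys.
EXPLAIN
SELECT count(*)
FROM codes_short s
JOIN codes_long l ON s.code = l.code;
```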

The default String types are faster purely because they are the destination 
type for any up-conversion or constant-folding conversions.

Cheers,
Gopal
