> Sounds like VARCHAR and CHAR types were created for Hive to have ANSI SQL
> Compliance. Otherwise they seem to be practically the same as String types.
They are essentially identical to STRING in storage, but both are slower on the
CPU in actual use (CHAR has additional padding code in the
Sounds like VARCHAR and CHAR types were created for Hive to have ANSI SQL
Compliance. Otherwise they seem to be practically the same as String types.
HTH

Dr Mich Talebzadeh

LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
Thanks Elliot for the insight.
Another issue is that Spark does not support the CHAR type, though it does
support VARCHAR. Often one uses Spark as well on these tables.
This should not really matter. I tend to define CHAR(N) as VARCHAR(N), as
the assumption is that the table ingested into Parquet, say, is
Internally it looks as though Hive simply represents CHAR/VARCHAR values
using a Java String and so I would not expect a significant change in
execution performance. The Hive JIRA suggests that these types were added
to 'support for more SQL-compliant behavior, such as SQL string comparison
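The SQL-compliant comparison behaviour the JIRA refers to can be sketched as follows. This is a hedged illustration (table and values are hypothetical): in Hive, trailing pad spaces are generally not significant when comparing CHAR values, whereas STRING comparisons are exact:

```sql
-- Hypothetical demo of ANSI-style CHAR comparison vs STRING comparison.
CREATE TABLE cmp_demo (c CHAR(5), s STRING) STORED AS TEXTFILE;
INSERT INTO cmp_demo VALUES ('ab', 'ab   ');

-- CHAR: trailing pad spaces are ignored in comparison, so this matches
-- even though the stored value is padded to the declared length.
SELECT * FROM cmp_demo WHERE c = 'ab';

-- STRING: trailing spaces are significant, so 'ab   ' does not equal 'ab'.
SELECT * FROM cmp_demo WHERE s = 'ab';
```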
thanks both.
STRING has a max length of 2 GB, so in MapReduce with a 128 MB block size we
are talking about 16 blocks. With VARCHAR(30) we are talking about 1 block.
I have not really experimented with this; however, I assume a table of 100k
rows with VARCHAR columns will have a smaller footprint
How is that efficient storage-wise? As far as I can see it is in HDFS, and
storage is based on your block size.
Am I missing something here?
On Jan 16, 2017 9:07 PM, "Mich Talebzadeh" wrote:
Coming from a DBMS background I tend to treat the columns in Hive similarly to
an RDBMS table. For example, if a table is created in Hive as Parquet I will
use VARCHAR(30) for a column that was defined as VARCHAR(30) at the source.
If a column is defined as TEXT in the RDBMS table I use STRING in Hive with a
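That mapping convention can be sketched as a Hive DDL fragment. The source table and column names here are made up for illustration:

```sql
-- Hypothetical source table in an RDBMS:
--   CREATE TABLE customers (name VARCHAR(30), notes TEXT);
--
-- One possible Hive mapping following the convention described above:
CREATE TABLE customers (
  name  VARCHAR(30),  -- preserve the source length declaration
  notes STRING        -- TEXT has no fixed length, so STRING is the closest fit
) STORED AS PARQUET;
```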