Re: VARCHAR or STRING fields in Hive

2017-01-16 Thread Gopal Vijayaraghavan
> Sounds like VARCHAR and CHAR types were created for Hive to have ANSI SQL Compliance. Otherwise they seem to be practically the same as String types.
They are relatively identical in storage, except both are slower on the CPU in actual use (CHAR has additional padding code in the
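The padding difference mentioned in this reply can be illustrated outside Hive. The following is a minimal Python sketch, assuming the documented Hive behaviour that CHAR(n) values are space-padded to length n and VARCHAR(n) values are truncated at n. The function names are hypothetical and this is not Hive's implementation.

```python
def as_char(value: str, n: int) -> str:
    """Mimic CHAR(n): truncate to n characters, then right-pad with spaces."""
    return value[:n].ljust(n)


def as_varchar(value: str, n: int) -> str:
    """Mimic VARCHAR(n): truncate to n characters, no padding."""
    return value[:n]


print(repr(as_char("hive", 6)))     # 'hive  '
print(repr(as_varchar("hive", 6)))  # 'hive'
```

The extra padding step in as_char is the kind of per-value work the reply attributes to CHAR.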

Re: VARCHAR or STRING fields in Hive

2017-01-16 Thread Mich Talebzadeh
Sounds like VARCHAR and CHAR types were created for Hive to have ANSI SQL Compliance. Otherwise they seem to be practically the same as String types. HTH

Re: VARCHAR or STRING fields in Hive

2017-01-16 Thread Mich Talebzadeh
Thanks Elliot for the insight. Another issue is that Spark does not support "CHAR" types. It supports VARCHAR. Often one uses Spark as well on these tables. This should not really matter. I tend to define CHAR(N) to be VARCHAR(N) as the assumption is that the table ingested into Parquet say is
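The habit described here, declaring CHAR(N) columns as VARCHAR(N), can be applied mechanically to a column list. This is a minimal Python sketch under that assumption; the helper name char_to_varchar and the sample column list are hypothetical.

```python
import re

# Matches CHAR(n) as a whole word, so VARCHAR(n) is left untouched.
_CHAR_TYPE = re.compile(r"\bCHAR\s*\(\s*(\d+)\s*\)", re.IGNORECASE)


def char_to_varchar(ddl: str) -> str:
    """Rewrite every CHAR(n) in a DDL fragment as VARCHAR(n)."""
    return _CHAR_TYPE.sub(r"VARCHAR(\1)", ddl)


print(char_to_varchar("code CHAR(3), name VARCHAR(30)"))
# code VARCHAR(3), name VARCHAR(30)
```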

Re: VARCHAR or STRING fields in Hive

2017-01-16 Thread Elliot West
Internally it looks as though Hive simply represents CHAR/VARCHAR values using a Java String and so I would not expect a significant change in execution performance. The Hive JIRA suggests that these types were added to 'support for more SQL-compliant behavior, such as SQL string comparison

Re: VARCHAR or STRING fields in Hive

2017-01-16 Thread Mich Talebzadeh
Thanks both. String has a max length of 2GB, so in MapReduce with a 128MB block size we are talking about 16 blocks. With VARCHAR(30) we are talking about 1 block. I have not really experimented with this, however, I assume a table of 100k rows with VARCHAR columns will have a smaller footprint
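The figure of 16 blocks is simply the quoted maximum STRING length divided by the block size. A short Python sketch of that arithmetic follows. It assumes binary units, and it describes only the worst case of a single maximal value, not the space a short string actually occupies, which other replies in this thread describe as largely the same for all three types.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

max_string_bytes = 2 * GIB      # upper bound on a STRING value, as stated in the message
block_size_bytes = 128 * MIB    # block size used in the message

blocks_for_max_string = max_string_bytes // block_size_bytes
print(blocks_for_max_string)  # 16
```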

Re: VARCHAR or STRING fields in Hive

2017-01-16 Thread sreebalineni .
How is that efficient storage-wise? As far as I can see, the data is in HDFS and storage is based on your block size. Am I missing something here?
On Jan 16, 2017 9:07 PM, "Mich Talebzadeh" wrote: Coming from DBMS background I tend to treat the columns in Hive similar

VARCHAR or STRING fields in Hive

2017-01-16 Thread Mich Talebzadeh
Coming from a DBMS background, I tend to treat the columns in Hive similar to an RDBMS table. For example, if a table is created in Hive as Parquet, I will use VARCHAR(30) for a column that has been defined as VARCHAR(30) at source. If a column is defined as TEXT in the RDBMS table, I use STRING in Hive with a
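The mapping described in this message (VARCHAR(30) stays VARCHAR(30), TEXT becomes STRING) can be sketched as a small Python helper that builds a Hive CREATE TABLE statement. The function names, table name, and column names are hypothetical, and only the two type rules stated in the message are covered.

```python
def to_hive_type(source_type: str) -> str:
    """Map a source RDBMS column type to a Hive type (assumed rules only)."""
    normalized = source_type.strip().upper()
    if normalized == "TEXT":
        return "STRING"
    return normalized  # e.g. VARCHAR(30) is kept as VARCHAR(30)


def create_table_ddl(table: str, columns: list[tuple[str, str]]) -> str:
    """Build a Hive CREATE TABLE statement stored as Parquet."""
    cols = ",\n".join(f"  {name} {to_hive_type(t)}" for name, t in columns)
    return f"CREATE TABLE {table} (\n{cols}\n) STORED AS PARQUET"


ddl = create_table_ddl("customers", [("name", "varchar(30)"), ("notes", "text")])
print(ddl)
```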