Hui Huang created HIVE-18265:
--------------------------------

             Summary: desc formatted/extended or show create table can not 
fully display the result when field or table comment contains tab character
                 Key: HIVE-18265
                 URL: https://issues.apache.org/jira/browse/HIVE-18265
             Project: Hive
          Issue Type: Bug
          Components: CLI
    Affects Versions: 3.0.0
            Reporter: Hui Huang
            Assignee: Hui Huang
             Fix For: 3.0.0


Here are some examples:

create table test_comment (id1 string comment 'full_\tname1', id2 string 
comment 'full_\tname2', id3 string comment 'full_\tname3') stored as textfile;

When executing `show create table test_comment`, the console shows the 
following output:
{quote}
createtab_stmt
CREATE TABLE `test_comment`(
  `id1` string COMMENT 'full_
  `id2` string COMMENT 'full_
  `id3` string COMMENT 'full_
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://xxx/user/huanghui/warehouse/huanghuitest.db/test_comment'
TBLPROPERTIES (
  'transient_lastDdlTime'='1513095570')
{quote}

The output of `desc formatted test_comment` is truncated in a similar way:
{quote}
col_name        data_type       comment
\# col_name             data_type               comment

id1                     string                  full_
id2                     string                  full_
id3                     string                  full_

\# Detailed Table Information
(ignore)...
{quote}

When executing `desc extended test_comment`, the problem is more obvious:
{quote}
col_name        data_type       comment
id1                     string                  full_
id2                     string                  full_
id3                     string                  full_

Detailed Table Information      Table(tableName:test_comment, 
dbName:huanghuitest, owner:huanghui, createTime:1513095570, lastAccessTime:0, 
retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id1, type:string, 
comment:full_    name1), FieldSchema(name:id2, type:string, comment:full_
{quote}
*The rest of the content is lost.*

The content is not really lost; it just cannot be displayed correctly. Hive 
stores the result in a LazyStruct, and LazyStruct uses '\t' as the field 
separator, so a tab inside a comment is treated as a field boundary:

{code:java}
// LazyStruct.java#parse()
// Go through all bytes in the byte[]
    while (fieldByteEnd <= structByteEnd) {
      if (fieldByteEnd == structByteEnd || bytes[fieldByteEnd] == separator) {
        // Reached the end of a field?
        if (lastColumnTakesRest && fieldId == fields.length - 1) {
          fieldByteEnd = structByteEnd;
        }
        startPosition[fieldId] = fieldByteBegin;
        fieldId++;
        if (fieldId == fields.length || fieldByteEnd == structByteEnd) {
          // All fields have been parsed, or bytes have been parsed.
          // We need to set the startPosition of fields.length to ensure we
          // can use the same formula to calculate the length of each field.
          // For missing fields, their starting positions will all be the same,
          // which will make their lengths to be -1 and uncheckedGetField will
          // return these fields as NULLs.
          for (int i = fieldId; i <= fields.length; i++) {
            startPosition[i] = fieldByteEnd + 1;
          }
          break;
        }
        fieldByteBegin = fieldByteEnd + 1;
        fieldByteEnd++;
{code}
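
To make the effect easier to see, here is a minimal sketch of the same kind of tab splitting. It is a simplified stand-in and not Hive's actual LazyStruct code: the class `TabSplitDemo` and the method `splitFields` are hypothetical names used only for illustration, and the sketch assumes the row has three expected columns (col_name, data_type, comment) and that anything after the last expected field is dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of tab-delimited row parsing.
// This is NOT Hive's LazyStruct; it only illustrates the truncation.
class TabSplitDemo {

    // Splits `row` on '\t' and keeps at most `fieldCount` fields.
    // Bytes after the last expected field are silently dropped.
    static List<String> splitFields(String row, int fieldCount) {
        List<String> fields = new ArrayList<>();
        int begin = 0;
        while (fields.size() < fieldCount) {
            int end = row.indexOf('\t', begin);
            if (end < 0) {
                // No more separators: the remainder is the final field.
                fields.add(row.substring(begin));
                break;
            }
            fields.add(row.substring(begin, end));
            begin = end + 1;
        }
        return fields;
    }

    public static void main(String[] args) {
        // A column row as in `desc formatted`: the comment itself contains a tab.
        String columnRow = "id1\tstring\tfull_\tname1";
        System.out.println(splitFields(columnRow, 3)); // [id1, string, full_]

        // A shortened stand-in for the `desc extended` detail row.
        String detailRow = "Detailed Table Information\tTable(comment:full_\tname1, comment:full_\tname2)";
        // Three fields are kept; "name2)" after the next tab is dropped.
        System.out.println(splitFields(detailRow, 3));
    }
}
```

Under these assumptions the comment column ends at `full_`, which matches the `desc formatted` output above. The detail row keeps exactly one embedded tab before it is cut off, which matches the `desc extended` output, where `full_    name1` is still visible but everything after the second `full_` is missing.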



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
