Alan Jackoway created IMPALA-7056:
-------------------------------------

             Summary: Changing Text Delimiter Does Not Work
                 Key: IMPALA-7056
                 URL: https://issues.apache.org/jira/browse/IMPALA-7056
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog, Docs
    Affects Versions: Impala 2.12.0
            Reporter: Alan Jackoway


The wording on 
https://impala.apache.org/docs/build/html/topics/impala_alter_table.html makes 
it seem like you can change the delimiter of text tables after they are created.

I did the following to simulate a table that needed to switch between comma and 
pipe delimited:
{code}
hadoop fs -mkdir /user/alanj
hadoop fs -mkdir /user/alanj/test_delim
echo "A,B|C" > delim.txt
hadoop fs -put delim.txt /user/alanj/test_delim
{code}

Then created in impala and tried to change delimiters:
{code:sql}
> create external table default.alanj_test_delim(A string, B string) ROW FORMAT 
> DELIMITED FIELDS TERMINATED BY "," LOCATION '/user/alanj/test_delim';
> select * from default.alanj_test_delim;
Query: select * from default.alanj_test_delim
+---+-----+
| a | b   |
+---+-----+
| A | B|C |
+---+-----+
> alter table default.alanj_test_delim set SERDEPROPERTIES 
> ('serialization.format'='|', 'field.delim'='|');
> select * from default.alanj_test_delim;
+---+-----+
| a | b   |
+---+-----+
| A | B|C |
+---+-----+
> show create table default.alanj_test_delim;
+----------------------------------------------------------------------------------------------------------------------+
| result                                                                        
                                       |
+----------------------------------------------------------------------------------------------------------------------+
| CREATE EXTERNAL TABLE default.alanj_test_delim (                              
                                       |
|   a STRING,                                                                   
                                       |
|   b STRING                                                                    
                                       |
| )                                                                             
                                       |
| ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'                                 
                                       |
| WITH SERDEPROPERTIES ('field.delim'='|', 'serialization.format'='|')          
                                       |
| STORED AS TEXTFILE                                                            
                                       |
| LOCATION 'hdfs://namenode:8020/user/alanj/test_delim'                         
         |
| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='false', 'numFiles'='0', 
'numRows'='-1', 'rawDataSize'='-1', 'totalSize'='0') |
+----------------------------------------------------------------------------------------------------------------------+
{code}

So it shows the right serdeproperties, but impala doesn't actually use them to 
read the data.

If you then insert data (as the docs suggest), it writes that data with the new 
delimiter:
{code:sql}
> insert into default.alanj_test_delim values('D', 'E,F');
> select * from alanj_test_delim;
+-----+-----+
| a   | b   |
+-----+-----+
| A,B | C   |
| D   | E,F |
+-----+-----+
# hadoop fs -cat 
/user/alanj/test_delim/a54bb0ec14646492-a738811400000000_1498283208_data.0.
D|E,F
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to