Issue with Inserting/Selecting Data From a ROW FORMAT SERDE table

Aniket Daoo Tue, 18 Sep 2012 08:15:50 -0700

Hi,

I have a ROW FORMAT SERDE table created using the following DDL.


CREATE external TABLE multivalset_u6
(
col1 string,
col2 string,
col3 string,
col4 string,
col5 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES
(
"input.regex" = "(.*)\\t(.*)\\t~([0-9]{6})~(.*)~(.*)",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
)
STORED AS TEXTFILE
LOCATION '/user/admin/u6/parsed/';

The LOCATION above contains the file that this table reads. I need the 
original file to be parsed and stored as a tab-delimited file with 5 columns.

Sample row from original file:
                02-15-2012-11:34:56  873801356593332362   ~3261961~1~10.0

Sample row from the expected parsed file:
       02-15-2012-11:34:56  873801356593332362   3261961       1      10.0

To do this, I tried to create a table with 5 columns at another location 
and insert the data from multivalset_u6 into it. While doing so, I got the 
following message on the console.

Ended Job = job_201209171421_0029 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201209171421_0029_m_000002 (and more) from job job_201209171421_0029
Exception in thread "Thread-47" java.lang.RuntimeException: Error while reading from task log url
        at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
        at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
        at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: http://10.40.35.54:9103/tasklog?taskid=attempt_201209171421_0029_m_000000_0&start=-8193
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
        at java.net.URL.openStream(URL.java:1010)
        at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
        ... 3 more
Counters:
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
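
For context, the target table and insert described above might look roughly 
like the sketch below. The table name multivalset_u6_parsed and its location 
are placeholders chosen for illustration, not the exact statements that were 
run.

```sql
-- Hypothetical target table: 5 string columns stored as tab-delimited text.
-- The name and LOCATION are assumptions; the location must differ from the
-- source table's '/user/admin/u6/parsed/' directory.
CREATE EXTERNAL TABLE multivalset_u6_parsed
(
col1 STRING,
col2 STRING,
col3 STRING,
col4 STRING,
col5 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/admin/u6/parsed_tsv/';

-- Copy the rows parsed by the RegexSerDe table into the delimited table.
INSERT OVERWRITE TABLE multivalset_u6_parsed
SELECT col1, col2, col3, col4, col5
FROM multivalset_u6;
```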

I have observed that SELECT * FROM multivalset_u6 returns all the columns as 
expected. However, a SELECT on individual columns, such as 
SELECT col1, col2, col3, col4, col5 FROM multivalset_u6, produces a similar 
error message.

Am I missing something here? Is there a way to work around this?

Thanks,
Aniket
