Hi,
I have complex log files (gzip-compressed ".gz", 200 GB) on HDFS.
Log file format:
127.0.0.1 [2012Avg08] "a=abc&b=adf&c=aadfad"
I think the DDL would look something like this:
CREATE TABLE log_tb (ip STRING, dt STRING, kv Map<STRING, STRING>)
ROW FORMAT SERDE "??"
STORED AS SEQUENCEFILE;
I want to be able to run a query like the one below.
SELECT kv['b']
FROM log_tb
LIMIT 10;
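To make the expected result concrete, here is a rough Python sketch (not Hive code) of the parsing I have in mind for one log line. The regular expression and the name parse_line are only my own illustration, and the sketch assumes every line looks exactly like the sample above, with no URL-encoding inside the quoted part.

```python
import re

# Assumed layout of one line: <ip> [<date>] "<k=v&k=v...>"
LINE_RE = re.compile(r'^(\S+) \[([^\]]+)\] "([^"]*)"$')

def parse_line(line):
    """Split one log line into (ip, dt, kv); return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ip, dt, query = m.groups()
    # Break "a=abc&b=adf" into a dict, skipping pieces that have no '='.
    kv = dict(pair.split("=", 1) for pair in query.split("&") if "=" in pair)
    return ip, dt, kv

row = parse_line('127.0.0.1 [2012Avg08] "a=abc&b=adf&c=aadfad"')
print(row[2]["b"])  # the value I would expect kv['b'] to return for this line
```

So for the sample line, the ip column would be 127.0.0.1, dt would be 2012Avg08, and kv would hold the three key/value pairs.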
1) How do I parse complex log files like this (gzip-compressed ".gz", 200 GB)?
2) If I need a SerDe, which SerDe should I use?
3) Is there an existing SerDe (input/output) I can use, or do I need to write a user-defined class?
4) If I partition the table for these log files, what should the DDL and DML look like? Please share sample SQL (DDL, DML).
Thanks.