Rémy SAISSY created HIVE-7125:
---------------------------------

             Summary: Support strings in the DELIMITED BY statement
                 Key: HIVE-7125
                 URL: https://issues.apache.org/jira/browse/HIVE-7125
             Project: Hive
          Issue Type: Improvement
          Components: Query Processor
    Affects Versions: 0.13.0
            Reporter: Rémy SAISSY

Hi,

I am working with a dataset that looks like this:

dataset.txt:
salut|;les|;|amiches
comment|;|allez|;|vous

The delimiter in this dataset is not a single character such as | or ; but a string, |;| in this case.

So I tried to create an external table with this delimiter:

hive> create external table ds (f1 string, f2 string, f3 string) row format delimited fields terminated by '|;|' location '/user/remy/dataset';

But I got this error:

MismatchedTokenException(5!=301)
	at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
	at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
	at org.apache.hadoop.hive.ql.parse.HiveParser.tableRowFormatFieldIdentifier(HiveParser.java:31433)
	at org.apache.hadoop.hive.ql.parse.HiveParser.rowFormatDelimited(HiveParser.java:30386)
	at org.apache.hadoop.hive.ql.parse.HiveParser.tableRowFormat(HiveParser.java:30662)
	at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:4683)
	at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2144)
	at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
	at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:373)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:291)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:944)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1009)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:880)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:870)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 1:102 mismatched input '|' expecting StringLiteral near 'by' in table row format's field separator

The workaround was to run a MapReduce job that preprocesses the data and replaces the delimiter with a single, unused character (my client uses a three-character delimiter to ensure that the sequence does not appear anywhere else in the CSV).

However, it would be nice to be able to use such a delimiter directly in an external table.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
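
The preprocessing workaround mentioned in the report (rewriting the three-character delimiter to a single unused character before loading) might look roughly like the sketch below. This is not the reporter's actual job. The function names and the choice of Ctrl-A (\x01) as the replacement character are assumptions made for illustration, and the code only shows the per-line rewrite that a map-only step would perform.

```python
# Minimal sketch of the delimiter-rewriting step, under stated assumptions:
# - MULTI_CHAR_DELIMITER is the "|;|" string from the report;
# - SINGLE_CHAR_DELIMITER (Ctrl-A) is an assumed "unused" character;
# - rewrite_delimiter / rewrite_lines are hypothetical helper names.

MULTI_CHAR_DELIMITER = "|;|"
SINGLE_CHAR_DELIMITER = "\x01"


def rewrite_delimiter(line, old=MULTI_CHAR_DELIMITER, new=SINGLE_CHAR_DELIMITER):
    """Replace every occurrence of the multi-character delimiter in one line."""
    # Refuse to continue if the replacement character is already in the data,
    # because the rewritten line would then split into too many fields.
    if new in line:
        raise ValueError("replacement delimiter already occurs in the data")
    return line.replace(old, new)


def rewrite_lines(lines):
    """Lazily rewrite an iterable of lines (for example, a file object)."""
    for line in lines:
        yield rewrite_delimiter(line)
```

With the data rewritten this way, the table could presumably be declared with a single-character terminator such as '\001', which the parser does accept. The check for a pre-existing replacement character is a deliberate design choice: failing loudly is safer than silently producing rows with shifted columns.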