OK, thank you. But there's no -e parameter listed on the page http://phoenix.apache.org/bulk_dataload.html. And why doesn't the -g,--ignore-errors parameter work here? If some lines end with a backslash, why fail instead of just ignoring them?
There's always some error in txt files, so why not ignore it, and how? Also, if I use the -e parameter, what character should I use? It seems I must find a special character, but I don't know which one is correct. Actually, I don't want to use any escape character at all. Is there a special option like "escape off" or something similar, so I can load anything without treating any character as an escape character? Some other products, like Greenplum, do have such a setting when bulk-loading txt files: escape: 'OFF'

-----------------------------------------------------------------
Quoted from http://phoenix.apache.org/bulk_dataload.html:

The following parameters can be used with the MapReduce loader.

Parameter             Description
-i,--input            Input CSV path (mandatory)
-t,--table            Phoenix table name (mandatory)
-a,--array-delimiter  Array element delimiter (optional)
-c,--import-columns   Comma-separated list of columns to be imported
-d,--delimiter        Input delimiter, defaults to comma
-g,--ignore-errors    Ignore input errors
-o,--output           Output path for temporary HFiles (optional)
-s,--schema           Phoenix schema name (optional)
-z,--zookeeper        Zookeeper quorum to connect to (optional)
-it,--index-table     Index table name to load (optional)

--------------------------------
From: Gabriel Reid <[email protected]>
Subject: Re: Error with lines ended with backslash when Bulk Data Loading
Date: 2016-12-09 02:06 (+0800)
List: [email protected]

Hi,

Backslash is the default escape character used for parsing CSV data when running a bulk import, so it has a special meaning. You can supply a different (custom) escape character with the -e or --escape flag on the command line, so that parsing your CSV files that include backslashes like this will run properly.

- Gabriel

----- Original Message -----
From: "rubysina" <[email protected]>
To: "user" <[email protected]>
Subject: Error with lines ended with backslash when Bulk Data Loading
Date: 2016-12-08 16:11

Hi, I'm new to Phoenix SQL and here's a little problem.
I'm following this page: http://phoenix.apache.org/bulk_dataload.html

I just found that the MapReduce importer cannot load a file whose lines end with a backslash, even with the -g parameter, i.e. ignore-errors:

    java.io.IOException: EOF whilst processing escape sequence

But it's OK if the line contains a backslash that is not at the end of the line, and there's no problem when using psql.py to load the same file. Why? How? Thank you.

-----------------------------------------------------------------------------------------------
For example:

create table a(a char(100) primary key)

echo \\>a.csv
cat a.csv
\
hdfs dfs -put a.csv
...JsonBulkLoadTool -g -t a -i a.csv
-- error

16/12/08 15:44:21 INFO mapreduce.Job: Task Id : attempt_1481093434027_0052_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: EOF whilst processing escape sequence
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:202)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:74)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: java.io.IOException: EOF whilst processing escape sequence
        at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
        at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
        at com.google.common.collect.Iterators.getNext(Iterators.java:890)
        at com.google.common.collect.Iterables.getFirst(Iterables.java:781)
        at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:109)
        at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:91)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:161)
        ... 9 more

echo \\a>a.csv
cat a.csv
\a
hdfs dfs -rm a.csv
hdfs dfs -put a.csv
...JsonBulkLoadTool -g -t a -i a.csv
-- success

echo \\>a.csv
cat a.csv
\
psql.py -t A zoo a.csv
CSV Upsert complete. 1 rows upserted
-- success

Thank you.
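[Editor's note] The escape behaviour discussed in this thread can be illustrated outside Phoenix with Python's csv module (a sketch only; Phoenix's bulk loader actually uses Apache commons-csv, which raises the "EOF whilst processing escape sequence" error above when a backslash escape character sits at end-of-line). The idea behind the -e flag is to pick an escape character that never occurs in the data, which in practice behaves like the "escape off" setting asked about:

```python
import csv
import io

# A CSV file whose only line is a single backslash, like `echo \\>a.csv`.
data = "\\\n"

# With no escape character configured, the backslash is plain data --
# the "escape off" behaviour the original poster is asking for.
rows = list(csv.reader(io.StringIO(data)))
print(rows)  # [['\\']]  (one field holding one backslash)

# Choosing an escape character that never occurs in the data (the idea
# behind Phoenix's -e/--escape flag; \x01 here is just an illustrative
# pick) gives the same result: the backslash passes through untouched
# instead of starting an escape sequence at end-of-line.
rows = list(csv.reader(io.StringIO(data), escapechar="\x01"))
print(rows)  # [['\\']]
```

Either way the row parses to a single one-character backslash field, which is also what psql.py produced in the successful run above.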
