Hi, I'm new to Phoenix SQL and have run into a small problem.
I'm following this page: http://phoenix.apache.org/bulk_dataload.html
I've found that the MapReduce importer cannot load a file whose lines end with a backslash. Even with the -g (ignore-errors) parameter it fails with "java.io.IOException: EOF whilst processing escape sequence".
A line that contains a backslash anywhere other than at the end loads fine, and psql.py loads the same file without any problem.
Why does this happen, and how can I work around it?
Thanks.
-----------------------------------------------------------------------------------------------
For example:
create table a(a char(100) primary key)
echo \\>a.csv
cat a.csv
\
hdfs dfs -put a.csv
...JsonBulkLoadTool -g -t a -i a.csv
-- error
16/12/08 15:44:21 INFO mapreduce.Job: Task Id : attempt_1481093434027_0052_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: EOF whilst processing escape sequence
    at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:202)
    at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:74)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: java.io.IOException: EOF whilst processing escape sequence
    at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
    at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
    at com.google.common.collect.Iterators.getNext(Iterators.java:890)
    at com.google.common.collect.Iterables.getFirst(Iterables.java:781)
    at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:109)
    at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:91)
    at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:161)
    ... 9 more
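By the way, I think the error can be reproduced with Apache Commons CSV alone, which is what the mapper uses according to the stack trace. Here's a minimal standalone sketch (my guess at the relevant parser setup, not the actual Phoenix code): with no escape character the lone backslash is taken literally, but with '\' configured as the escape character the lexer starts an escape sequence at the backslash, hits end of input, and throws the same exception.

import java.io.StringReader;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class TrailingBackslashRepro {
    public static void main(String[] args) throws Exception {
        // one field whose value is a single backslash, right before end of line
        String line = "\\\n";

        // no escape character configured: the backslash is just data
        try (CSVParser parser = new CSVParser(new StringReader(line), CSVFormat.DEFAULT)) {
            for (CSVRecord record : parser) {
                System.out.println("no escape: parsed " + record); // succeeds
            }
        }

        // '\' configured as the escape character: the lexer reads the backslash,
        // looks for the escaped character, hits end of input, and throws
        // java.io.IOException: EOF whilst processing escape sequence
        // (wrapped in a RuntimeException by the record iterator, as in the trace)
        try (CSVParser parser = new CSVParser(new StringReader(line),
                CSVFormat.DEFAULT.withEscape('\\'))) {
            for (CSVRecord record : parser) {
                System.out.println("with escape: parsed " + record);
            }
        }
    }
}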
echo \\a>a.csv
cat a.csv
\a
hdfs dfs -rm a.csv
hdfs dfs -put a.csv
...JsonBulkLoadTool -g -t a -i a.csv
-- success
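That result is consistent with the escape-character theory: followed by another character, the backslash forms a complete escape sequence, so the lexer never reaches end of input mid-sequence. The same standalone setup as above doesn't throw either (again, just my assumption about how the parser is configured):

// backslash followed by another character: the escape sequence completes,
// so no exception, unlike the trailing-backslash case above
try (CSVParser parser = new CSVParser(new StringReader("\\a\n"),
        CSVFormat.DEFAULT.withEscape('\\'))) {
    for (CSVRecord record : parser) {
        System.out.println("parsed " + record); // succeeds
    }
}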
echo \\>a.csv
cat a.csv
\
psql.py -t A zoo a.csv
CSV Upsert complete. 1 rows upserted
-- success
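So my guess is that psql.py and the MapReduce tool don't configure the CSV escape character the same way, though I haven't checked the source. If the escape character really is the culprit, would overriding it via the -e/--escape option listed on the bulk_dataload page (default is a backslash) be a reasonable workaround, using some character that never occurs in the data? Something like (untested):

...JsonBulkLoadTool -g -t a -i a.csv -e '|'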
Thank you.