OK, thank you. But there's no -e parameter listed on the page http://phoenix.apache.org/bulk_dataload.html. And why doesn't the -g,--ignore-errors parameter work here? If some lines end with a backslash, why fail instead of just ignoring them?
There's always some error in txt files, so why not ignore it, and how? Also, if I use the -e parameter, what character should I use? It seems I must find a special character, but I don't know which one is correct. Actually, I don't want to use any escape character at all. Is there a special option like "escape off" or something similar, so I can load anything without treating any character as an escape character? Some other products, like Greenplum, do have such a setting when bulk-loading txt files: escape: 'OFF'

-----------------------------------------------------------------
Quoted from http://phoenix.apache.org/bulk_dataload.html:

The following parameters can be used with the MapReduce loader.

Parameter             Description
-i,--input            Input CSV path (mandatory)
-t,--table            Phoenix table name (mandatory)
-a,--array-delimiter  Array element delimiter (optional)
-c,--import-columns   Comma-separated list of columns to be imported
-d,--delimiter        Input delimiter, defaults to comma
-g,--ignore-errors    Ignore input errors
-o,--output           Output path for temporary HFiles (optional)
-s,--schema           Phoenix schema name (optional)
-z,--zookeeper        Zookeeper quorum to connect to (optional)
-it,--index-table     Index table name to load (optional)

--------------------------------
From: Gabriel Reid <[email protected]>
Subject: Re: Error with lines ended with backslash when Bulk Data Loading
Date: 2016-12-09 02:06 (+0800)
List: [email protected]

Hi,

Backslash is the default escape character used for parsing CSV data when running a bulk import, so it has a special meaning. You can supply a different (custom) escape character with the -e or --escape flag on the command line, so that parsing your CSV files that include backslashes like this will run properly.

- Gabriel

----- Original Message -----
From: "rubysina" <[email protected]>
To: "user" <[email protected]>
Subject: Error with lines ended with backslash when Bulk Data Loading
Date: 2016-12-08 16:11

Hi, I'm new to Phoenix SQL and here's a little problem.
I'm following this page: http://phoenix.apache.org/bulk_dataload.html

I just found that the MapReduce importer cannot load a file whose lines end with a backslash, even with the -g parameter, i.e. ignore-errors:

    java.io.IOException: EOF whilst processing escape sequence

But it's OK if the line contains a backslash that is not at the end of the line, and there's no problem when using psql.py to load the same file. Why? How? Thank you.

-----------------------------------------------------------------------------------------------
For example:

create table a(a char(100) primary key)

echo \\>a.csv
cat a.csv
\
hdfs dfs -put a.csv
...JsonBulkLoadTool -g -t a -i a.csv
-- error

16/12/08 15:44:21 INFO mapreduce.Job: Task Id : attempt_1481093434027_0052_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: EOF whilst processing escape sequence
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:202)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:74)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: java.io.IOException: EOF whilst processing escape sequence
        at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
        at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
        at com.google.common.collect.Iterators.getNext(Iterators.java:890)
        at com.google.common.collect.Iterables.getFirst(Iterables.java:781)
        at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:109)
        at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:91)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:161)
        ... 9 more

echo \\a>a.csv
cat a.csv
\a
hdfs dfs -rm a.csv
hdfs dfs -put a.csv
...JsonBulkLoadTool -g -t a -i a.csv
-- success

echo \\>a.csv
cat a.csv
\
psql.py -t A zoo a.csv
CSV Upsert complete. 1 rows upserted
-- success

Thank you.
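[Editor's note] The escape behaviour discussed in this thread can be illustrated outside Phoenix with Python's csv module (a sketch only; Phoenix's bulk loader actually uses Apache commons-csv, which raises the "EOF whilst processing escape sequence" error above when a backslash escape character sits at end-of-line). The idea behind the -e flag is to pick an escape character that never occurs in the data, which in practice behaves like the "escape off" setting asked about:

```python
import csv
import io

# A CSV file whose only line is a single backslash, like `echo \\>a.csv`.
data = "\\\n"

# With no escape character configured, the backslash is plain data --
# the "escape off" behaviour the original poster is asking for.
rows = list(csv.reader(io.StringIO(data)))
print(rows)  # [['\\']]  (one field holding one backslash)

# Choosing an escape character that never occurs in the data (the idea
# behind Phoenix's -e/--escape flag; \x01 here is just an illustrative
# pick) gives the same result: the backslash passes through untouched
# instead of starting an escape sequence at end-of-line.
rows = list(csv.reader(io.StringIO(data), escapechar="\x01"))
print(rows)  # [['\\']]
```

Either way the row parses to a single one-character backslash field, which is also what psql.py produced in the successful run above.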
