Actually, it turns out that the line causing my problem really was missing a 
column. I checked the behavior of StringToArrayConverter in 
org.apache.phoenix.util.csv, and it does not misinterpret an empty last 
column as a missing one.

So the fault is on my end.
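
In case it helps anyone else who hits this: before blaming the loader, it is 
worth counting the fields on each line of the input. Below is a minimal 
sketch of such a check (a hypothetical standalone helper, not Phoenix code), 
assuming a tab-delimited file whose path and expected column count are 
passed as arguments:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sanity check, not part of Phoenix: report lines whose
// field count differs from what the target table expects.
public class FieldCountCheck {
    public static void main(String[] args) throws IOException {
        String path = args[0];                      // e.g. /tmp/data.tsv
        int expected = Integer.parseInt(args[1]);   // e.g. 27
        try (BufferedReader in = Files.newBufferedReader(Paths.get(path))) {
            String line;
            int lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                // The -1 limit is important: split("\t") with no limit
                // silently drops trailing empty fields, so a valid row
                // ending in <tab><newline> would also look short.
                int fields = line.split("\t", -1).length;
                if (fields != expected) {
                    System.out.printf("line %d has %d fields (expected %d)%n",
                            lineNo, fields, expected);
                }
            }
        }
    }
}

Note the -1 limit on split(): without it, Java discards trailing empty 
strings, which would reproduce exactly the symptom described below.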

Thanks

From: Cox, Jonathan A
Sent: Wednesday, March 30, 2016 3:36 PM
To: 'user@phoenix.apache.org'
Subject: Problem Bulk Loading CSV with Empty Value at End of Row

I am using the CsvBulkLoadTool to ingest a tab-separated file that can 
contain empty columns. The problem is that the loader incorrectly interprets 
an empty last column as a non-existent column (instead of as a null entry).

For example, imagine I have a comma-separated CSV with the following format:
key,username,password,gender,position,age,school,favorite_color

Now, let's say my CSV file contains the following row, where the gender field 
is empty. This row loads correctly:
*#Ssj289,joeblow,sk29ssh, ,CEO,102,MIT,blue<new line>

However, if the empty field happens to be the last one (favorite_color), the 
loader complains that only 7 of the 8 required columns are present:
*#Ssj289,joeblow,sk29ssh,female ,CEO,102,MIT, <new line>

This error aborts the load of the entire CSV file. Any pointers on how I 
could modify the source so that Phoenix interprets <delimiter><newline> as an 
empty/null last column?
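
For reference, the bulk loader parses each record with Apache Commons CSV, 
and as far as I can tell that parser keeps a trailing empty field. A minimal 
standalone sketch to confirm this (the two-line input is made up for 
illustration):

import java.io.IOException;
import java.io.StringReader;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

// Made-up two-line input: same columns, but the second row ends in a
// bare delimiter. Commons CSV itself keeps the trailing empty field.
public class TrailingDelimiterDemo {
    public static void main(String[] args) throws IOException {
        String csv = "a,b,c\na,b,\n";
        try (CSVParser parser = CSVFormat.DEFAULT.parse(new StringReader(csv))) {
            for (CSVRecord record : parser) {
                // Both records should print size 3; the last value of the
                // second record is the empty string, not a missing column.
                System.out.println("size=" + record.size()
                        + " last=\"" + record.get(record.size() - 1) + "\"");
            }
        }
    }
}

If this prints size=3 for both records, the trailing delimiter is being 
preserved as an empty field at the parsing layer.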

Thanks,
Jon
(actual error is pasted below)


java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalArgumentException: CSV record does not have enough values (has 26, but needs 27)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: CSV record does not have enough values (has 26, but needs 27)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:197)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:72)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: CSV record does not have enough values (has 26, but needs 27)
        at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:74)
        at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:44)
        at org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:166)
        ... 10 more
16/03/30 15:01:01 INFO mapreduce.Job: Job job_local1507432235_0
