Hi Vishal,

Yes, it is a known issue that Drill error reporting needs some TLC. A better 
error message would say something like 
"NumberFormatException: Column foo, value "this is not a number"". Feel free to 
file a JIRA ticket to remind us to fix this particular case. Please explain the 
context so we have a good shot at reproducing the issue.


You said that the logs, at trace level, provided no information. Which version 
of Drill are you using? In the latest version (and, I think, in 1.16), there is 
a log message each time the reader opens a file:

package org.apache.drill.exec.store.easy.text.reader;


public class CompliantTextBatchReader ...

  private void openReader(TextOutput output) throws IOException {
    logger.trace("Opening file {}", split.getPath());


Given this, you should see a series of "Opening file" messages when you enable 
trace-level logging for the above class.

As Charles noted, CSV reads columns as text. Let's assume that you do have a 
CAST or other conversion. Then, the number format exception says that you are 
trying to convert a column from text to a number, and that the value does not 
actually contain a number.
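
For context, this is the same kind of exception plain Java raises when it is 
asked to parse text as a number. The sketch below is not Drill code; the class 
and method names are made up for illustration. It only shows why a single bad 
value is enough to fail the whole conversion:

```java
final class ParseDemo {
  // Hypothetical helper: returns null when the text parses as an int,
  // otherwise the message carried by the NumberFormatException.
  static String tryParse(String text) {
    try {
      Integer.parseInt(text);
      return null;
    } catch (NumberFormatException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    // A clean value parses; a non-numeric value produces an error message.
    System.out.println(tryParse("42"));
    System.out.println(tryParse("this is not a number"));
  }
}
```

In plain Java the exception message includes the offending input, which is the 
piece of information the Drill error currently drops.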

Again, it would be better if the error message told us which column has the 
problem. Until then, if the number of columns in question is small, you can run 
a query to find non-numeric values. It would be nice if Drill had an 
isNumber() function. (Another Jira feature request you can file.)

Since I can't find one, we can roll our own with a regex. Something like:

SELECT foo FROM yourTable WHERE NOT regexp_matches(foo, '\d+')

If the number is a float or decimal, add the proper pattern.

Caveat: I didn't try the above regex; there may be some fiddly bits with 
back-slashes.
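
One way to check candidate patterns before putting them in a query is to try 
them in plain Java first. This is only a sketch: it assumes Drill's regex 
functions follow java.util.regex semantics, and the class name, method names, 
and the decimal pattern are my own choices, not anything from Drill:

```java
import java.util.regex.Pattern;

final class NumericCheck {
  // Digits only, as in the query above.
  static final Pattern INTEGER = Pattern.compile("\\d+");
  // Assumed pattern for signed decimals: optional sign, digits,
  // optional fractional part. Adjust for exponents, blanks, etc.
  static final Pattern DECIMAL = Pattern.compile("[+-]?\\d+(\\.\\d+)?");

  // matches() requires the whole string to match, not just a substring.
  static boolean isInteger(String s) {
    return s != null && INTEGER.matcher(s).matches();
  }

  static boolean isDecimal(String s) {
    return s != null && DECIMAL.matcher(s).matches();
  }

  public static void main(String[] args) {
    String[] samples = {"123", "-3.14", "12a", "this is not a number", ""};
    for (String s : samples) {
      System.out.println("'" + s + "' integer=" + isInteger(s)
          + " decimal=" + isDecimal(s));
    }
  }
}
```

Note that the backslashes are doubled here only because this is Java source. 
Whether a SQL string literal in Drill needs the same doubling is exactly the 
fiddly bit to verify.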

Then, you can add file metadata (AKA "implicit") columns to give you the 
information you want:

SELECT filename, foo FROM ...


If that finds the bad data, and it is something you must handle, you can add an 
IF function to handle the data.

Thanks,
- Paul

 

    On Friday, February 14, 2020, 7:44:59 AM PST, Vishal Jadhav (BLOOMBERG/ 731 
LEX) <[email protected]> wrote:  
 
 While running a SELECT statement that converts CSV files to Parquet, I get a 
NumberFormatException. I am running Drill in embedded mode. 
Is there a way to find out which CSV file, or which row in that file, is 
causing the issue?
I checked the logs with trace verbosity, but I am not able to find the 'data' 
which has the issue. 

Error: SYSTEM ERROR: NumberFormatException

Fragment 1:5

Please, refer to logs for more information.

Thanks!
- Vishal

  
