Hi Vishal,
Yes, it is a known issue that Drill's error reporting needs some TLC. Obviously,
a better solution would be for the error to say something like
"NumberFormatException: Column foo, value 'this is not a number'". Feel free to
file a JIRA ticket to remind us to fix this particular case. Please explain the
context so we have a good shot at reproducing the issue.
You said that the logs, at trace level, provided no information. Which version
of Drill are you using? If it is the latest (and, I think, 1.16), there is a log
message each time the reader opens a file:
    package org.apache.drill.exec.store.easy.text.reader;
    public class CompliantTextBatchReader ...
      private void openReader(TextOutput output) throws IOException {
        logger.trace("Opening file {}", split.getPath());
Given this, you should see a series of "Opening file" messages when you enable
trace-level logging for the above class.
As Charles noted, CSV reads columns as text, so let's assume that you do have a
CAST or other conversion. Then the NumberFormatException says that you are
trying to convert a column from text to a number, and that some value in that
column does not actually contain a number.
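To make that concrete, here is a small plain-Java sketch. It is not Drill's
actual cast code, only an illustration of the same exception: Integer.parseInt
throws NumberFormatException for a non-numeric string, and the hypothetical
castToInt helper rethrows it with the column name and value, which is the kind
of message suggested above.

```java
public class CastDemo {
    // Hypothetical helper: converts a text value to an int, like a CAST,
    // but reports the column and value that the bare exception omits.
    static int castToInt(String column, String value) {
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            throw new NumberFormatException(
                "Column " + column + ", value \"" + value + "\"");
        }
    }

    public static void main(String[] args) {
        System.out.println(castToInt("foo", "42"));
        try {
            castToInt("foo", "this is not a number");
        } catch (NumberFormatException e) {
            // Prints: NumberFormatException: Column foo, value "this is not a number"
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```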
Again, it would be better if the error message told us which column has the
problem. In the meantime, if only a few columns are converted, you can run a
query to find the non-numeric values. It would be nice if Drill had an
isNumber() function. (Another JIRA feature request you can file.)
Since I can't find one, we can roll our own with a regex. Something like:
SELECT foo FROM yourTable WHERE NOT regexp_matches(foo, '\d+')
If the number is a float or decimal, adjust the pattern to allow a sign, a
decimal point and, if needed, an exponent.
Caveat: I didn't try the above regex; there may be some fiddly bits with
backslashes.
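One way to sort out the regex before putting it in a query is to try it in
plain Java. This is a minimal sketch that assumes (I believe, but have not
verified) that Drill's regexp_matches uses java.util.regex and requires the
whole value to match, so results here should carry over. Java string literals
need doubled backslashes ("\\d"), while the SQL literal should take a single
one ('\d'). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class IsNumber {
    // Optional sign followed by digits only.
    static final Pattern INTEGER = Pattern.compile("[-+]?\\d+");
    // Optional sign, digits with optional fraction (or a bare fraction),
    // then an optional exponent: "3.14", ".5", "1e10", "-2.5E-3".
    static final Pattern DECIMAL =
        Pattern.compile("[-+]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][-+]?\\d+)?");

    // matches() succeeds only if the entire string fits the pattern.
    static boolean isInteger(String s) {
        return s != null && INTEGER.matcher(s).matches();
    }

    static boolean isDecimal(String s) {
        return s != null && DECIMAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"42", "-7", "3.14", "1e10", "", "this is not a number"};
        for (String s : samples) {
            System.out.printf("%-24s integer=%b decimal=%b%n",
                "\"" + s + "\"", isInteger(s), isDecimal(s));
        }
    }
}
```

In the query, the decimal version would become something like
regexp_matches(foo, '[-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?'). Note that a
pattern check cannot tell whether a value is too large for the target type.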
Then, you can add file metadata (AKA "implicit") columns to give you the
information you want:
SELECT filename, foo FROM ... WHERE NOT regexp_matches(foo, '\d+')
If that finds the data, and it is something you must handle, you can wrap the
conversion in a conditional (such as a CASE expression) to handle the data.
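In Drill that might look something like (untested)
CASE WHEN regexp_matches(foo, '[-+]?\d+') THEN CAST(foo AS INT) END, which
yields NULL for values that fail the check. The same guard-then-convert idea
in a plain-Java sketch follows; the class and method names are hypothetical.
The try/catch also covers values that pass the pattern but overflow an int, a
case the regex alone does not catch.

```java
import java.util.regex.Pattern;

public class ConditionalCast {
    // Whole-string integer check, the same idea as the regexp_matches filter.
    private static final Pattern INTEGER = Pattern.compile("[-+]?\\d+");

    // Analogue of CASE WHEN <looks numeric> THEN CAST(value AS INT) END:
    // bad values become null instead of failing the whole conversion.
    static Integer toIntOrNull(String value) {
        if (value == null || !INTEGER.matcher(value).matches()) {
            return null;
        }
        try {
            return Integer.valueOf(value);
        } catch (NumberFormatException e) {
            // Digits only, but too large for an int (e.g. "99999999999").
            return null;
        }
    }

    public static void main(String[] args) {
        String[] column = {"42", "-7", "", "n/a", "99999999999"};
        for (String v : column) {
            System.out.println("\"" + v + "\" -> " + toIntOrNull(v));
        }
    }
}
```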
Thanks,
- Paul
On Friday, February 14, 2020, 7:44:59 AM PST, Vishal Jadhav (BLOOMBERG/ 731
LEX) wrote:
While running a SELECT statement to convert CSV files to Parquet, I get a
NumberFormatException. I am running Drill in embedded mode. Is there a way to
find out which CSV file, or which row in that file, is causing the issue?
I checked the logs with trace verbosity, but was not able to find the data that
has the issue.
Error: SYSTEM ERROR: NumberFormatException
Fragment 1:5
Please, refer to logs for more information.
Thanks!
- Vishal