Hi Vishal,

I think you can add the following to $DRILL_HOME/conf/logback.xml to enable the 
needed logging:

  <logger name="org.apache.drill.exec.store.easy.text.reader.CompliantTextBatchReader"
          additivity="false">
    <level value="trace" />
    <appender-ref ref="FILE" />
  </logger>


Note that if you use a config directory separate from your install (using the 
--site flag to launch Drill), then modify the logback.xml in your custom location instead.

To file a JIRA ticket, just go to Drill's home page [1], click on Community, 
then Community Resources, then the first entry under Developer Resources: JIRA, 
which is [2].

Make sure the Drill project is selected. Then just fill in the type 
(Improvement), title, your version number and a description. There are many 
other fields, but we mostly don't use them.

It would be super helpful if you could include a few lines of a CSV file that 
exhibit the problem (once you track it down using logging).


Thanks,
- Paul


[1] http://drill.apache.org/
 
[2] https://issues.apache.org/jira/browse/DRILL/

    On Tuesday, February 18, 2020, 5:21:26 AM PST, Vishal Jadhav (BLOOMBERG/ 731 LEX) <[email protected]> wrote:  
 
 Hello Paul,
Yes, I agree that a better error message would be the right solution. I am on 
Drill 1.17. Regarding the logs: do I need to add or modify anything specific in 
logback.xml to produce the trace?
I can file a JIRA with the instructions. What is the process for it?
- Vishal

From: [email protected] At: 02/14/20 17:47:26
To: Vishal Jadhav (BLOOMBERG/ 731 LEX), [email protected]
Subject: Re: data issue

Hi Vishal,

Yes, it is a known issue that Drill error reporting needs some TLC. Obviously, 
a better solution would be for the error to say something like 
"NumberFormatException: Column foo, value 'this is not a number'". Feel 
free to file a JIRA ticket to remind us to fix this particular case. Please 
explain the context so we have a good shot at reproducing the issue.
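To make the request concrete, here is a minimal Java sketch of the kind of message I mean: a conversion helper that catches the NumberFormatException and rethrows it with the column name and the offending value attached. ContextualCast and toLong are hypothetical names for illustration, not Drill APIs; Drill's real conversion code is organized differently.

```java
// Hypothetical helper, not part of Drill: shows the error context we would
// like Drill to report when a text-to-number conversion fails.
final class ContextualCast {

  private ContextualCast() {}

  // Parses a text value as a long. On failure, rethrows with the column
  // name and the offending value in the message, keeping the original cause.
  static long toLong(String column, String value) {
    try {
      return Long.parseLong(value);
    } catch (NumberFormatException e) {
      NumberFormatException wrapped = new NumberFormatException(
          "Column " + column + ", value '" + value + "'");
      wrapped.initCause(e);
      throw wrapped;
    }
  }

  public static void main(String[] args) {
    System.out.println(toLong("foo", "42"));
    try {
      toLong("foo", "this is not a number");
    } catch (NumberFormatException e) {
      // Prints: NumberFormatException: Column foo, value 'this is not a number'
      System.out.println("NumberFormatException: " + e.getMessage());
    }
  }
}
```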


You said that the logs, at trace level, provided no information. Which version 
of Drill are you using? If it is the latest (and, I think, 1.16), there is a log 
message each time the reader opens a file:

package org.apache.drill.exec.store.easy.text.reader;


public class CompliantTextBatchReader ...

  private void openReader(TextOutput output) throws IOException {
    logger.trace("Opening file {}", split.getPath());


Given this, you should see a series of "Opening file" messages when you enable 
trace-level logging for the above class.
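Since several minor fragments can read files in parallel, the last "Opening file" line in the log is not necessarily the culprit. One way to narrow it down is to keep only the lines logged by the fragment named in the error (Fragment 1:5 in your case) and take the last file that fragment opened. The sketch below does that in plain Java. It assumes that each log line includes the thread name in square brackets and that fragment threads are named with a ":frag:<major>:<minor>" suffix; I have not checked this against your log, so adjust the tag to whatever your log actually shows. FragmentFileFinder is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Sketch only: finds the last "Opening file" message logged by one fragment.
// ASSUMPTION: each log line contains the thread name, and fragment threads
// are named "<queryId>:frag:<major>:<minor>" inside square brackets.
final class FragmentFileFinder {

  private static final String MARKER = "Opening file ";

  private FragmentFileFinder() {}

  static Optional<String> lastFileOpened(List<String> logLines, int major, int minor) {
    // The trailing ']' keeps frag:1:5 from also matching frag:1:50.
    String fragmentTag = ":frag:" + major + ":" + minor + "]";
    String lastPath = null;
    for (String line : logLines) {
      if (!line.contains(fragmentTag)) {
        continue; // logged by a different fragment, or not a fragment thread
      }
      int idx = line.indexOf(MARKER);
      if (idx >= 0) {
        lastPath = line.substring(idx + MARKER.length()).trim();
      }
    }
    return Optional.ofNullable(lastPath);
  }

  public static void main(String[] args) {
    // Made-up sample lines; real ones also carry timestamps and query IDs.
    List<String> sample = Arrays.asList(
        "[q1:frag:1:4] TRACE CompliantTextBatchReader - Opening file /data/a.csv",
        "[q1:frag:1:5] TRACE CompliantTextBatchReader - Opening file /data/b.csv",
        "[q1:frag:1:4] TRACE CompliantTextBatchReader - Opening file /data/c.csv");
    System.out.println(lastFileOpened(sample, 1, 5).orElse("not found")); // /data/b.csv
  }
}
```

On a real log you would read the lines with Files.readAllLines and pass 1 and 5 for "Fragment 1:5".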

As Charles noted, CSV reads columns as text, so let's assume that you do have a 
CAST or other conversion. Then the NumberFormatException says that you are 
trying to convert a column from text to a number, and some value in that column 
does not actually contain a number.
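For intuition, plain Java's Long.parseLong rejects the kinds of text that commonly break a CSV-to-number conversion: empty fields, stray spaces, decimals, thousands separators, and placeholders such as N/A. This is an analogy only; Drill's CAST is its own code and may treat some of these inputs differently. ParseProbe is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Analogy only: uses Long.parseLong to show which kinds of CSV text values
// cannot be converted to an integer. Drill's own CAST logic may differ.
final class ParseProbe {

  private ParseProbe() {}

  // Returns the values that Long.parseLong rejects, in input order.
  static List<String> unparseable(List<String> values) {
    List<String> bad = new ArrayList<>();
    for (String v : values) {
      try {
        Long.parseLong(v);
      } catch (NumberFormatException e) {
        bad.add(v);
      }
    }
    return bad;
  }

  public static void main(String[] args) {
    List<String> sample = Arrays.asList("42", "-7", "", " 12", "3.5", "1,000", "N/A");
    // Prints every sample value except "42" and "-7", in brackets.
    for (String v : unparseable(sample)) {
      System.out.println("[" + v + "]");
    }
  }
}
```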

Again, it would be better if the error message told us which column has the 
problem. Until then, if the number of columns in question is small, you can run 
a query to find the non-numeric values. It would be nice if Drill had an 
isNumber() function. (Another JIRA feature request you can file.)

Since I can't find one, we can roll our own with a regex. Something like:

SELECT foo FROM yourTable WHERE NOT regexp_matches(foo, '\d+')

If the column holds floating-point or decimal values, adjust the pattern to match 
(for example, to allow a sign and a decimal point).

Caveat: I didn't try the above regex; there may be some fiddly bits with 
backslashes.
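If you want to try candidate patterns before running them in Drill, a small Java check helps. Drill runs on the JVM, so I would expect java.util.regex syntax to apply, but treat that as an assumption. Note the backslash issue: in Java source the regex \d+ must be written "\\d+", while in the SQL literal it appears once. Matcher.matches() requires the whole value to match, so "12abc" is rejected. The decimal pattern is just one possible choice for the float/decimal case; NumericPatterns is a hypothetical name.

```java
import java.util.regex.Pattern;

// Sketch for testing candidate patterns locally. In Java source each regex
// backslash must be doubled: the regex \d+ is written "\\d+".
final class NumericPatterns {

  // Whole-string match of one or more digits (unsigned integers).
  static final Pattern INTEGER = Pattern.compile("\\d+");

  // One possible decimal pattern: optional sign, digits, optional fraction.
  // It does not accept exponents (1e5) or a leading dot (.5).
  static final Pattern DECIMAL = Pattern.compile("[-+]?\\d+(\\.\\d+)?");

  private NumericPatterns() {}

  static boolean isInteger(String s) {
    return s != null && INTEGER.matcher(s).matches();
  }

  static boolean isDecimal(String s) {
    return s != null && DECIMAL.matcher(s).matches();
  }

  public static void main(String[] args) {
    String[] samples = {"42", "12abc", "", "-3.25", "1e5"};
    for (String s : samples) {
      System.out.println("[" + s + "] integer=" + isInteger(s) + " decimal=" + isDecimal(s));
    }
  }
}
```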

Then, you can add file metadata (AKA "implicit") columns to give you the 
information you want:

SELECT filename, foo FROM ...
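If you would rather check the files outside Drill, the same idea (report the file name next to each non-numeric value) can be sketched in plain Java, adding the line number so you also get the row. This is a simplified stand-in, not what Drill does: it assumes plain comma-separated lines with no quoted fields, no header row, and a known zero-based column index. NonNumericFinder and its parameters are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Simplified stand-in for "SELECT filename, foo ... WHERE NOT regexp_matches(...)".
// ASSUMPTION: plain comma-separated lines, no quoted fields, no header row.
final class NonNumericFinder {

  private static final Pattern INTEGER = Pattern.compile("\\d+");

  private NonNumericFinder() {}

  // Returns "file:line: value" for each row whose column is not all digits.
  // Line numbers are 1-based; rows too short to have the column are reported too.
  static List<String> scan(String fileName, List<String> lines, int column) {
    List<String> hits = new ArrayList<>();
    for (int i = 0; i < lines.size(); i++) {
      String[] fields = lines.get(i).split(",", -1); // -1 keeps trailing empty fields
      String value = column < fields.length ? fields[column] : "<missing>";
      if (!INTEGER.matcher(value).matches()) {
        hits.add(fileName + ":" + (i + 1) + ": " + value);
      }
    }
    return hits;
  }

  // Usage: java NonNumericFinder <columnIndex> <file.csv>...
  public static void main(String[] args) throws IOException {
    if (args.length < 2) {
      System.err.println("usage: NonNumericFinder <columnIndex> <file.csv>...");
      return;
    }
    int column = Integer.parseInt(args[0]);
    for (int a = 1; a < args.length; a++) {
      Path path = Paths.get(args[a]);
      List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
      for (String hit : scan(path.toString(), lines, column)) {
        System.out.println(hit);
      }
    }
  }
}
```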


If that finds the data, and it is something you must handle, you can add a 
conditional (such as a CASE expression) to handle those values.
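The idea is to convert a value only when it passes the check and fall back to NULL (or a default) otherwise. Here is that convert-or-fallback logic as a small Java sketch; the method name and the choice of null as the fallback are illustrative assumptions, not anything Drill defines.

```java
import java.util.regex.Pattern;

// Sketch of convert-if-valid logic, the Java analogue of a SQL conditional
// such as CASE WHEN <value is numeric> THEN CAST(...) ELSE NULL END.
final class SafeConvert {

  private static final Pattern INTEGER = Pattern.compile("\\d+");

  private SafeConvert() {}

  // Returns the parsed value, or null when the text is not all digits.
  // Also returns null for digit strings too large for a long.
  static Long toLongOrNull(String value) {
    if (value == null || !INTEGER.matcher(value).matches()) {
      return null;
    }
    try {
      return Long.parseLong(value);
    } catch (NumberFormatException e) {
      return null; // e.g. "99999999999999999999" overflows a long
    }
  }

  public static void main(String[] args) {
    System.out.println(toLongOrNull("42"));  // 42
    System.out.println(toLongOrNull("N/A")); // null
  }
}
```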

Thanks,
- Paul

 

    On Friday, February 14, 2020, 7:44:59 AM PST, Vishal Jadhav (BLOOMBERG/ 731 LEX) <[email protected]> wrote:  
 
While running a SELECT statement that converts CSV files to Parquet, I get a 
NumberFormatException. I am running Drill in embedded mode. Is there a way to 
find out which CSV file, or which row in that file, is causing the issue?
I checked the logs with trace verbosity, but was not able to find the data that 
has the issue.

Error: SYSTEM ERROR: NumberFormatException

Fragment 1:5

Please, refer to logs for more information.

Thanks!
- Vishal
