Hi Vishal,

I think you can add the following to $DRILL_HOME/conf/logback.xml to enable the needed logging:

    <logger name="org.apache.drill.exec.store.easy.text.reader.CompliantTextBatchReader" additivity="false">
      <level value="trace" />
      <appender-ref ref="FILE" />
    </logger>

Note that if you use a config directory separate from your install (using the --site flag to launch Drill), then modify the file in your custom location.

To file a JIRA ticket, go to Drill's home page [1], click on Community, then Community Resources, then the first entry under Developer Resources: JIRA, which is [2]. Make sure the Drill project is selected. Then fill in the type (Improvement), title, your version number, and a description. There are many other fields, but we mostly don't use them.

It would be super-helpful if you could include a few lines of a CSV file that exhibits the problem (once you track down the problem using logging).

Thanks,
- Paul

[1] http://drill.apache.org/
[2] https://issues.apache.org/jira/browse/DRILL/

On Tuesday, February 18, 2020, 5:21:26 AM PST, Vishal Jadhav (BLOOMBERG/ 731 LEX) <[email protected]> wrote:

Hello Paul,

Yes, I agree that a better error message would be a better solution. I am on Drill 1.17.

Regarding the logs: do I need to add or modify anything specific in logback.xml to produce the trace?

I can file a Jira with the instructions. What is the process for it?

- Vishal

From: [email protected] At: 02/14/20 17:47:26
To: Vishal Jadhav (BLOOMBERG/ 731 LEX ) , [email protected]
Subject: Re: data issue

Hi Vishal,

Yes, it is a known issue that Drill error reporting needs some TLC. Obviously, a better solution would be for the error to say something like "NumberFormatException: Column foo, value "this is not a number"". Feel free to file a JIRA ticket to remind us to fix this particular case. Please explain the context so we have a good shot at reproducing the issue.

You said that the logs, at trace level, provided no information. Which version of Drill are you using?
If the latest (and, I think, 1.16), there is a log message each time the reader opens a file:

    package org.apache.drill.exec.store.easy.text.reader;

    public class CompliantTextBatchReader ...

      private void openReader(TextOutput output) throws IOException {
        logger.trace("Opening file {}", split.getPath());

Given this, you should see a series of "Opening file" messages when you enable trace-level logging for the above class.

As Charles noted, CSV reads columns as text, so let's assume that you do have a CAST or other conversion. Then, the number format exception says that you are trying to convert a column from text to a number, and that value does not actually contain a number. Again, it would be better if the error message told us the column that has the problem.

Otherwise, if the number of columns in question is small, you can run a query to find non-numeric values. Now, it would be nice if Drill had an isNumber() function. (Another Jira feature request you can file.) Since I can't find one, we can roll our own with a regex. Something like:

    SELECT foo FROM yourTable WHERE NOT regexp_matches(foo, '\d+')

If the number is a float or decimal, add the proper pattern. Caveat: I didn't try the above regex, so there may be some fiddly bits with back-slashes.

Then, you can add file metadata (AKA "implicit") columns to give you the information you want:

    SELECT filename, foo FROM ...

If that finds the data, and it is something you must handle, you can add an IF function to handle the data.

Thanks,
- Paul

On Friday, February 14, 2020, 7:44:59 AM PST, Vishal Jadhav (BLOOMBERG/ 731 LEX) <[email protected]> wrote:

During my select statement that converts a CSV file to a Parquet file, I get a NumberFormatException. I am running Drill in embedded mode. Is there a way to find out which CSV file, or which row in that file, is causing the issue? I checked the logs with trace verbosity, but was not able to find the 'data' which has the issue.

Error: SYSTEM ERROR: NumberFormatException

Fragment 1:5

Please, refer to logs for more information.

Thanks!
- Vishal
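
As a complement to the query Paul suggests, the same check can be run directly on the CSV files outside Drill. The following is a minimal Java sketch, not Drill code: the class name NonNumericFinder and the method findBadValues are hypothetical, and it assumes simple comma-separated files with no quoted fields and no header row. It reports the file name, line number, and value for each row whose chosen column is not made up entirely of digits (an empty value is reported too).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Drill: reports non-numeric values in one CSV column.
final class NonNumericFinder {
    // Same pattern as in the query above; a Java string literal needs the back-slash doubled.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns one "file:line: value" entry per row whose column is not all digits.
    // Assumption: plain comma-separated text, no quoted fields, no header row.
    static List<String> findBadValues(Path csvFile, int columnIndex) throws IOException {
        List<String> bad = new ArrayList<>();
        List<String> lines = Files.readAllLines(csvFile);
        for (int i = 0; i < lines.size(); i++) {
            // The -1 limit keeps trailing empty fields instead of dropping them.
            String[] fields = lines.get(i).split(",", -1);
            String value = columnIndex < fields.length ? fields[columnIndex].trim() : "";
            // matches() must consume the whole value, so "12a" is rejected.
            if (!DIGITS.matcher(value).matches()) {
                bad.add(csvFile.getFileName() + ":" + (i + 1) + ": " + value);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: NonNumericFinder <columnIndex> <file.csv> [more.csv ...]");
            return;
        }
        int columnIndex = Integer.parseInt(args[0]); // zero-based column index
        for (int i = 1; i < args.length; i++) {
            for (String entry : findBadValues(Paths.get(args[i]), columnIndex)) {
                System.out.println(entry);
            }
        }
    }
}
```

Saved as NonNumericFinder.java, it could be run on Java 11 or later with something like `java NonNumericFinder.java 2 data/*.csv`, where 2 is the zero-based column index and the paths are placeholders. For floats or decimals, the pattern would need to be extended, as noted above for the query.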
