Sometimes text / CSV data comes in with formatting errors, and Drill seems to handle this poorly: it throws a Java error instead of what I would describe as a DB engine error that describes the problem.

I logged https://issues.apache.org/jira/browse/DRILL-4845 for this, but also wanted to check whether this is indeed the expected behavior at present.

My concern is that with a "Big Data" engine, data comes in from a variety of sources, many of which don't perform proper validation. CSV validation can of course be part of the process where we stage data into HDFS for Drill, but the error detail we currently get when something is wrong makes it difficult to locate the data issue and to develop what is essentially an ETL transform for it.
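As a rough illustration of the kind of staging-time check I mean, here is a minimal sketch in Python (not anything Drill provides). It assumes the file has a known, fixed number of columns, and `validate_csv` is a hypothetical helper name. It reports the line number and the nature of each problem, which is the kind of detail that would make the data issue easy to locate.

```python
import csv
import io


def validate_csv(stream, expected_fields, delimiter=","):
    """Hypothetical pre-staging check: return (line_number, message) for each bad row.

    Assumes every row should have exactly `expected_fields` columns.
    """
    problems = []
    # strict=True makes the csv module raise csv.Error on malformed quoting
    # (e.g. an unterminated quoted field) instead of silently accepting it.
    reader = csv.reader(stream, delimiter=delimiter, strict=True)
    while True:
        try:
            row = next(reader)
        except StopIteration:
            break
        except csv.Error as exc:
            # reader.line_num is the number of physical lines read so far.
            # The reader's state is unreliable after a parse error, so stop here.
            problems.append((reader.line_num, f"parse error: {exc}"))
            break
        if len(row) != expected_fields:
            problems.append(
                (reader.line_num, f"expected {expected_fields} fields, found {len(row)}")
            )
    return problems


if __name__ == "__main__":
    sample = io.StringIO("a,b,c\n1,2,3\n4,5\n")
    for line_no, message in validate_csv(sample, expected_fields=3):
        print(f"line {line_no}: {message}")
```

Running it on the sample prints `line 3: expected 3 fields, found 2`, pointing directly at the short row.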
