If you would like to treat "" as NULL, set the option `drill.exec.functions.cast_empty_string_to_null` to true at the system level.
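As a minimal sketch of how that option combines with the COALESCE transform discussed below (the `dfs.caspr` workspace and file path here are hypothetical, pieced together from the plan output later in the thread; the column name comes from the sample data):

~~~
-- Treat empty CSV fields as NULL, system-wide:
ALTER SYSTEM SET `drill.exec.functions.cast_empty_string_to_null` = true;

-- With the option on, an empty t_total field reads as NULL, so COALESCE
-- can supply a numeric default before the cast:
SELECT CAST(COALESCE(t_total, 0.0) AS DOUBLE) AS t_total
FROM dfs.caspr.`smalltest.csv`;
~~~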
Meaning:

~~~
alter system set `drill.exec.functions.cast_empty_string_to_null` = true;
~~~

On Sun, Mar 13, 2016 at 3:53 PM, Matt <bsg...@gmail.com> wrote:

That does indeed remove the NumberFormatException, replacing it with the same error on "empty String".

As indicated by Jason in this thread, I may have mistakenly believed that empty columns would be treated as NULLs.

Is there a setting to set the NULL value in the data, or should all of those COALESCEs be replaced with CASE statements? It would be a lot of CASEs in this case (39 of them).

On 11 Mar 2016, at 0:31, Abdel Hakim Deneche wrote:

Looks like the COALESCE function is the source of the problem. Passing a double (0.0) instead of an int (0) as the second expression solved the problem for me:

~~~
CAST(COALESCE(t_total, 0.0) AS double)
~~~

On Fri, Mar 11, 2016 at 12:45 AM, Matt <bsg...@gmail.com> wrote:

~~~
00-01 Project(date_tm=[CAST($23):TIMESTAMP(0)],
    id_1=[CAST($11):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"],
    id_2=[CAST($15):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"],
    id_3=[CAST($33):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"],
    b_total=[CAST(CASE(IS NOT NULL($4), $4, 0)):BIGINT], t_total=[CAST(CASE(IS NOT NULL($31), $31, 0)):DOUBLE],
    h_total=[CAST(CASE(IS NOT NULL($40), $40, 0)):BIGINT], b_small=[CAST(CASE(IS NOT NULL($36), $36, 0)):BIGINT],
    t_small=[CAST(CASE(IS NOT NULL($14), $14, 0)):DOUBLE], h_small=[CAST(CASE(IS NOT NULL($38), $38, 0)):BIGINT],
    b_18000=[CAST(CASE(IS NOT NULL($32), $32, 0)):BIGINT], t_18000=[CAST(CASE(IS NOT NULL($24), $24, 0)):DOUBLE],
    h_18000=[CAST(CASE(IS NOT NULL($27), $27, 0)):BIGINT], b_12000=[CAST(CASE(IS NOT NULL($30), $30, 0)):BIGINT],
    t_12000=[CAST(CASE(IS NOT NULL($28), $28, 0)):DOUBLE], h_12000=[CAST(CASE(IS NOT NULL($20), $20, 0)):BIGINT],
    b_6000=[CAST(CASE(IS NOT NULL($41), $41, 0)):BIGINT], t_6000=[CAST(CASE(IS NOT NULL($37), $37, 0)):DOUBLE],
    h_6000=[CAST(CASE(IS NOT NULL($29), $29, 0)):BIGINT], b_3000=[CAST(CASE(IS NOT NULL($17), $17, 0)):BIGINT],
    t_3000=[CAST(CASE(IS NOT NULL($7), $7, 0)):DOUBLE], h_3000=[CAST(CASE(IS NOT NULL($1), $1, 0)):BIGINT],
    b_2000=[CAST(CASE(IS NOT NULL($26), $26, 0)):BIGINT], t_2000=[CAST(CASE(IS NOT NULL($34), $34, 0)):DOUBLE],
    h_2000=[CAST(CASE(IS NOT NULL($10), $10, 0)):BIGINT], b_1500=[CAST(CASE(IS NOT NULL($42), $42, 0)):BIGINT],
    t_1500=[CAST(CASE(IS NOT NULL($13), $13, 0)):DOUBLE], h_1500=[CAST(CASE(IS NOT NULL($3), $3, 0)):BIGINT],
    b_1250=[CAST(CASE(IS NOT NULL($21), $21, 0)):BIGINT], t_1250=[CAST(CASE(IS NOT NULL($25), $25, 0)):DOUBLE],
    h_1250=[CAST(CASE(IS NOT NULL($16), $16, 0)):BIGINT], b_1000=[CAST(CASE(IS NOT NULL($12), $12, 0)):BIGINT],
    t_1000=[CAST(CASE(IS NOT NULL($19), $19, 0)):DOUBLE], h_1000=[CAST(CASE(IS NOT NULL($6), $6, 0)):BIGINT],
    b_750=[CAST(CASE(IS NOT NULL($9), $9, 0)):BIGINT], t_750=[CAST(CASE(IS NOT NULL($0), $0, 0)):DOUBLE],
    h_750=[CAST(CASE(IS NOT NULL($5), $5, 0)):BIGINT], b_500=[CAST(CASE(IS NOT NULL($2), $2, 0)):BIGINT],
    t_500=[CAST(CASE(IS NOT NULL($8), $8, 0)):DOUBLE], h_500=[CAST(CASE(IS NOT NULL($39), $39, 0)):BIGINT],
    b_0=[CAST(CASE(IS NOT NULL($18), $18, 0)):BIGINT], t_0=[CAST(CASE(IS NOT NULL($35), $35, 0)):DOUBLE],
    EXPR$42=[CAST(CASE(IS NOT NULL($22), $22, 0)):BIGINT])
00-02 Scan(groupscan=[EasyGroupScan [selectionRoot=hdfs://es05:54310/caspr/csv/smalltest.csv, numFiles=1,
    columns=[`date_tm`, `id_1`, `id_2`, `id_3`, `b_total`, `t_total`, `h_total`, `b_small`, `t_small`, `h_small`,
    `b_18000`, `t_18000`, `h_18000`, `b_12000`, `t_12000`, `h_12000`, `b_6000`, `t_6000`, `h_6000`, `b_3000`,
    `t_3000`, `h_3000`, `b_2000`, `t_2000`, `h_2000`, `b_1500`, `t_1500`, `h_1500`, `b_1250`, `t_1250`, `h_1250`,
    `b_1000`, `t_1000`, `h_1000`, `b_750`, `t_750`, `h_750`, `b_500`, `t_500`, `h_500`, `b_0`, `t_0`, `h_0`],
    files=[hdfs://es05:54310/caspr/csv/smalltest.csv]]])
~~~

~~~
{
  "head" : {
    "version" : 1,
    "generator" : {
      "type" : "ExplainHandler",
      "info" : ""
    },
    "type" : "APACHE_DRILL_PHYSICAL",
    "options" : [ {
      "kind" : "STRING",
      "type" : "SESSION",
      "name" : "store.format",
      "string_val" : "parquet"
    }, {
      "kind" : "BOOLEAN",
      "type" : "SESSION",
      "name" : "exec.errors.verbose",
      "bool_val" : true
    } ],
    "queue" : 0,
    "resultMode" : "EXEC"
  },
  "graph" : [ {
    "pop" : "fs-scan",
    "@id" : 2,
    "userName" : "hduser",
    "files" : [ "hdfs://es05:54310/caspr/csv/smalltest.csv" ],
    "storage" : {
      "type" : "file",
      "enabled" : true,
      "connection" : "hdfs://es05:54310",
      "workspaces" : {
        "root" : {
          "location" : "/",
          "writable" : true,
          "defaultInputFormat" : null
        },
        "tmp" : {
          "location" : "/tmp/",
          "writable" : true,
          "defaultInputFormat" : null
        },
        "caspr" : {
          "location" : "/caspr",
          "writable" : true,
          "defaultInputFormat" : "csv"
        }
    "ref" : "`t_3000`",
    "expr" : "cast( ( ( if (isnotnull(`t_3000`) ) then (`t_3000` ) else (0 ) end ) ) as FLOAT8 )"
  }, {
    "ref" : "`h_3000`",
    "expr" : "cast( ( ( if (isnotnull(`h_3000`) ) then (`h_3000` ) else (0 ) end ) ) as BIGINT )"
  }, {
    "ref" : "`b_2000`",
    "expr" : "cast( ( ( if (isnotnull(`b_2000`) ) then (`b_2000` ) else (0 ) end ) ) as BIGINT )"
  }, {
    "ref" : "`t_2000`",
    "expr" : "cast( ( ( if (isnotnull(`t_2000`) ) then (`t_2000` ) else (0 ) end ) ) as FLOAT8 )"
  }, {
    "ref" : "`h_2000`",
    "expr" : "cast( ( ( if (isnotnull(`h_2000`) ) then (`h_2000` ) else (0 ) end ) ) as BIGINT )"
  }, {
    "ref" : "`b_1500`",
    "expr" : "cast( ( ( if (isnotnull(`b_1500`) ) then (`b_1500` ) else (0 ) end ) ) as BIGINT )"
  }, {
    "ref" : "`t_1500`",
    "expr" : "cast( ( ( if (isnotnull(`t_1500`) ) then (`t_1500` ) else (0 ) end ) ) as FLOAT8 )"
  }, {
    "ref" : "`h_1500`",
    "expr" : "cast( ( ( if (isnotnull(`h_1500`) ) then (`h_1500` ) else (0 ) end ) ) as BIGINT )"
  }, {
    "ref" : "`b_1250`",
    "expr" : "cast( ( ( if (isnotnull(`b_1250`) ) then (`b_1250` ) else (0 ) end ) ) as BIGINT )"
  }, {
    "ref" : "`t_1250`",
    "expr" : "cast( ( ( if (isnotnull(`t_1250`) ) then (`t_1250` ) else (0 ) end ) ) as FLOAT8 )"
  }, {
    "ref" : "`h_1250`",
    "expr" : "cast( ( ( if (isnotnull(`h_1250`) ) then (`h_1250` ) else (0 ) end ) ) as BIGINT )"
  }, {
    "ref" : "`b_1000`",
    "expr" : "cast( ( ( if (isnotnull(`b_1000`) ) then (`b_1000` ) else (0 ) end ) ) as BIGINT )"
  }, {
    "ref" : "`t_1000`",
    "expr" : "cast( ( ( if (isnotnull(`t_1000`) ) then (`t_1000` ) else (0 ) end ) ) as FLOAT8 )"
  }
~~~

On 10 Mar 2016, at 18:40, Hsuan Yi Chu wrote:

From the log, I see this one:

~~~
StringFunctionHelpers.varCharToInt():
~~~

It seems like there might be an implicit cast (to integer) inserted into your expression.

Can you do EXPLAIN PLAN FOR "your query" (no double quotes needed) and share what you see?
On Thu, Mar 10, 2016 at 3:11 PM, Matt <bsg...@gmail.com> wrote:

TRIM() did not help, and I visually verified there is no whitespace around the number. The entire test data file:

~~~
date_tm,id_1,id_2,id_3,b_total,t_total,h_total,b_small,t_small,h_small,b_18000,t_18000,h_18000,b_12000,t_12000,h_12000,b_6000,t_6000,h_6000,b_3000,t_3000,h_3000,b_2000,t_2000,h_2000,b_1500,t_1500,h_1500,b_1250,t_1250,h_1250,b_1000,t_1000,h_1000,b_750,t_750,h_750,b_500,t_500,h_500,b_0,t_0,h_0
2015-10-17 00:00:00,f5e9v8u2,err,mi1,268918254,140.755,23519,268918254,140.755,23519,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
~~~

Including a trim resulted in the same:

~~~
Error: SYSTEM ERROR: NumberFormatException: 140.755

Fragment 0:0

[Error Id: 01d368ee-6c06-476c-a553-5eb5baea7c3f on es07:31010]

(java.lang.NumberFormatException) 140.755
    org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI():95
    org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varCharToInt():120
    org.apache.drill.exec.test.generated.ProjectorGen122.doEval():467
    org.apache.drill.exec.test.generated.ProjectorGen122.projectRecords():62
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():175
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.physical.impl.BaseRootExec.next():104
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
    org.apache.drill.exec.physical.impl.BaseRootExec.next():94
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():415
    org.apache.hadoop.security.UserGroupInformation.doAs():1657
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1145
    java.util.concurrent.ThreadPoolExecutor$Worker.run():615
    java.lang.Thread.run():745 (state=,code=0)
~~~

On 10 Mar 2016, at 17:49, Jason Altekruse wrote:

Could you try throwing a trim() in before the cast? Might just be whitespace.

There shouldn't be a need to add the COALESCE, as we only ever read defined values (which may be empty strings) out of a CSV file. You should instead write out a full CASE statement that checks for the empty string and provides your default value of 0 in that case.

- Jason

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Thu, Mar 10, 2016 at 2:32 PM, Matt <bsg...@gmail.com> wrote:

I have some CSV data that Drill 1.5 selects as-is without any problems, until I attempt to CAST columns in a CTAS or plain SELECT:

~~~
Error: SYSTEM ERROR: NumberFormatException: 140.755
~~~

The data is unquoted CSV, and the column in question does have the value 140.755 (unquoted). As the column can be empty, I am using the following transform:

~~~
CAST(COALESCE(t_total, 0) AS double) AS t_total
~~~

And on the first data row:

~~~
Fragment 1:0

[Error Id: 4b4d83cf-a87d-451e-9cfa-4280e5adf64f on es08:31010]

(java.lang.NumberFormatException) 140.755
~~~

What could be causing this error?
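A sketch of the full CASE form Jason describes, handling the empty string explicitly before the cast (the `dfs.caspr` FROM clause is hypothetical; the column name comes from the sample file, and one such expression would be needed per numeric column):

~~~
SELECT CASE WHEN t_total = '' THEN 0.0
            ELSE CAST(t_total AS DOUBLE)
       END AS t_total
FROM dfs.caspr.`smalltest.csv`;
~~~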
--
Abdelhakim Deneche
Software Engineer