If you'd like to treat "" as NULL, set the option
"drill.exec.functions.cast_empty_string_to_null" to true at the system
level.

That is:
alter system set `drill.exec.functions.cast_empty_string_to_null` = true
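
The same option can also be set for just the current session:

alter session set `drill.exec.functions.cast_empty_string_to_null` = true

With it enabled, an empty t_total should come back as NULL, so a transform
like CAST(COALESCE(t_total, 0.0) AS double) should work without rewriting
everything as CASE statements.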

On Sun, Mar 13, 2016 at 3:53 PM, Matt <bsg...@gmail.com> wrote:

> That does indeed remove the NumberFormatException, replacing it with the
> same error on "empty String".
>
> As indicated by Jason in this thread, I may have mistakenly believed that
> empty columns would be treated as NULLs.
>
> Is there a setting that defines the NULL value in the data, or should all
> of those COALESCEs be replaced with CASE statements? It would be a lot of
> CASEs in this case (39 of them).
>
>
> On 11 Mar 2016, at 0:31, Abdel Hakim Deneche wrote:
>
>> Looks like the COALESCE function is the source of the problem. Passing a
>> double (0.0) instead of an int (0) as the second argument solved the
>> problem for me:
>>
>> CAST(COALESCE(t_total, 0.0) AS double)
>>
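>> My reading of the stack trace further down in the thread: with an int 0,
>> the planner types the branches as int and inserts an implicit
>> varchar-to-int cast on the column, which is the varCharToInt() that fails
>> on "140.755". A double literal keeps the branches typed as double:
>>
>> -- fails: implicit cast of the varchar column to int
>> CAST(COALESCE(t_total, 0) AS double) AS t_total
>> -- works: the 0.0 literal types the branches as double
>> CAST(COALESCE(t_total, 0.0) AS double) AS t_total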
>>
>> On Fri, Mar 11, 2016 at 12:45 AM, Matt <bsg...@gmail.com> wrote:
>>
>> ~~~
>>> 00-01      Project(date_tm=[CAST($23):TIMESTAMP(0)],
>>>   id_1=[CAST($11):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"],
>>>   id_2=[CAST($15):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"],
>>>   id_3=[CAST($33):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"],
>>>   b_total=[CAST(CASE(IS NOT NULL($4), $4, 0)):BIGINT],
>>>   t_total=[CAST(CASE(IS NOT NULL($31), $31, 0)):DOUBLE],
>>>   h_total=[CAST(CASE(IS NOT NULL($40), $40, 0)):BIGINT],
>>>   b_small=[CAST(CASE(IS NOT NULL($36), $36, 0)):BIGINT],
>>>   t_small=[CAST(CASE(IS NOT NULL($14), $14, 0)):DOUBLE],
>>>   h_small=[CAST(CASE(IS NOT NULL($38), $38, 0)):BIGINT],
>>>   b_18000=[CAST(CASE(IS NOT NULL($32), $32, 0)):BIGINT],
>>>   t_18000=[CAST(CASE(IS NOT NULL($24), $24, 0)):DOUBLE],
>>>   h_18000=[CAST(CASE(IS NOT NULL($27), $27, 0)):BIGINT],
>>>   b_12000=[CAST(CASE(IS NOT NULL($30), $30, 0)):BIGINT],
>>>   t_12000=[CAST(CASE(IS NOT NULL($28), $28, 0)):DOUBLE],
>>>   h_12000=[CAST(CASE(IS NOT NULL($20), $20, 0)):BIGINT],
>>>   b_6000=[CAST(CASE(IS NOT NULL($41), $41, 0)):BIGINT],
>>>   t_6000=[CAST(CASE(IS NOT NULL($37), $37, 0)):DOUBLE],
>>>   h_6000=[CAST(CASE(IS NOT NULL($29), $29, 0)):BIGINT],
>>>   b_3000=[CAST(CASE(IS NOT NULL($17), $17, 0)):BIGINT],
>>>   t_3000=[CAST(CASE(IS NOT NULL($7), $7, 0)):DOUBLE],
>>>   h_3000=[CAST(CASE(IS NOT NULL($1), $1, 0)):BIGINT],
>>>   b_2000=[CAST(CASE(IS NOT NULL($26), $26, 0)):BIGINT],
>>>   t_2000=[CAST(CASE(IS NOT NULL($34), $34, 0)):DOUBLE],
>>>   h_2000=[CAST(CASE(IS NOT NULL($10), $10, 0)):BIGINT],
>>>   b_1500=[CAST(CASE(IS NOT NULL($42), $42, 0)):BIGINT],
>>>   t_1500=[CAST(CASE(IS NOT NULL($13), $13, 0)):DOUBLE],
>>>   h_1500=[CAST(CASE(IS NOT NULL($3), $3, 0)):BIGINT],
>>>   b_1250=[CAST(CASE(IS NOT NULL($21), $21, 0)):BIGINT],
>>>   t_1250=[CAST(CASE(IS NOT NULL($25), $25, 0)):DOUBLE],
>>>   h_1250=[CAST(CASE(IS NOT NULL($16), $16, 0)):BIGINT],
>>>   b_1000=[CAST(CASE(IS NOT NULL($12), $12, 0)):BIGINT],
>>>   t_1000=[CAST(CASE(IS NOT NULL($19), $19, 0)):DOUBLE],
>>>   h_1000=[CAST(CASE(IS NOT NULL($6), $6, 0)):BIGINT],
>>>   b_750=[CAST(CASE(IS NOT NULL($9), $9, 0)):BIGINT],
>>>   t_750=[CAST(CASE(IS NOT NULL($0), $0, 0)):DOUBLE],
>>>   h_750=[CAST(CASE(IS NOT NULL($5), $5, 0)):BIGINT],
>>>   b_500=[CAST(CASE(IS NOT NULL($2), $2, 0)):BIGINT],
>>>   t_500=[CAST(CASE(IS NOT NULL($8), $8, 0)):DOUBLE],
>>>   h_500=[CAST(CASE(IS NOT NULL($39), $39, 0)):BIGINT],
>>>   b_0=[CAST(CASE(IS NOT NULL($18), $18, 0)):BIGINT],
>>>   t_0=[CAST(CASE(IS NOT NULL($35), $35, 0)):DOUBLE],
>>>   EXPR$42=[CAST(CASE(IS NOT NULL($22), $22, 0)):BIGINT])
>>> 00-02        Scan(groupscan=[EasyGroupScan
>>> [selectionRoot=hdfs://es05:54310/caspr/csv/smalltest.csv, numFiles=1,
>>> columns=[`date_tm`, `id_1`, `id_2`, `id_3`, `b_total`, `t_total`,
>>> `h_total`, `b_small`, `t_small`, `h_small`, `b_18000`, `t_18000`,
>>> `h_18000`, `b_12000`, `t_12000`, `h_12000`, `b_6000`, `t_6000`, `h_6000`,
>>> `b_3000`, `t_3000`, `h_3000`, `b_2000`, `t_2000`, `h_2000`, `b_1500`,
>>> `t_1500`, `h_1500`, `b_1250`, `t_1250`, `h_1250`, `b_1000`, `t_1000`,
>>> `h_1000`, `b_750`, `t_750`, `h_750`, `b_500`, `t_500`, `h_500`, `b_0`,
>>> `t_0`, `h_0`], files=[hdfs://es05:54310/caspr/csv/smalltest.csv]]])
>>> ~~~
>>>
>>> ~~~
>>> {
>>>   "head" : {
>>>     "version" : 1,
>>>     "generator" : {
>>>       "type" : "ExplainHandler",
>>>       "info" : ""
>>>     },
>>>     "type" : "APACHE_DRILL_PHYSICAL",
>>>     "options" : [ {
>>>       "kind" : "STRING",
>>>       "type" : "SESSION",
>>>       "name" : "store.format",
>>>       "string_val" : "parquet"
>>>     }, {
>>>       "kind" : "BOOLEAN",
>>>       "type" : "SESSION",
>>>       "name" : "exec.errors.verbose",
>>>       "bool_val" : true
>>>     } ],
>>>     "queue" : 0,
>>>     "resultMode" : "EXEC"
>>>   },
>>>   "graph" : [ {
>>>     "pop" : "fs-scan",
>>>     "@id" : 2,
>>>     "userName" : "hduser",
>>>     "files" : [ "hdfs://es05:54310/caspr/csv/smalltest.csv" ],
>>>     "storage" : {
>>>       "type" : "file",
>>>       "enabled" : true,
>>>       "connection" : "hdfs://es05:54310",
>>>       "workspaces" : {
>>>         "root" : {
>>>           "location" : "/",
>>>           "writable" : true,
>>>           "defaultInputFormat" : null
>>>         },
>>>         "tmp" : {
>>>           "location" : "/tmp/",
>>>           "writable" : true,
>>>           "defaultInputFormat" : null
>>>         },
>>>         "caspr" : {
>>>           "location" : "/caspr",
>>>           "writable" : true,
>>>           "defaultInputFormat" : "csv"
>>>         }
>>>       ...
>>>     }, {
>>>       "ref" : "`t_3000`",
>>>       "expr" : "cast( ( ( if (isnotnull(`t_3000`)  ) then (`t_3000` ) else (0 )  end  )  ) as FLOAT8 )"
>>>     }, {
>>>       "ref" : "`h_3000`",
>>>       "expr" : "cast( ( ( if (isnotnull(`h_3000`)  ) then (`h_3000` ) else (0 )  end  )  ) as BIGINT )"
>>>     }, {
>>>       "ref" : "`b_2000`",
>>>       "expr" : "cast( ( ( if (isnotnull(`b_2000`)  ) then (`b_2000` ) else (0 )  end  )  ) as BIGINT )"
>>>     }, {
>>>       "ref" : "`t_2000`",
>>>       "expr" : "cast( ( ( if (isnotnull(`t_2000`)  ) then (`t_2000` ) else (0 )  end  )  ) as FLOAT8 )"
>>>     }, {
>>>       "ref" : "`h_2000`",
>>>       "expr" : "cast( ( ( if (isnotnull(`h_2000`)  ) then (`h_2000` ) else (0 )  end  )  ) as BIGINT )"
>>>     }, {
>>>       "ref" : "`b_1500`",
>>>       "expr" : "cast( ( ( if (isnotnull(`b_1500`)  ) then (`b_1500` ) else (0 )  end  )  ) as BIGINT )"
>>>     }, {
>>>       "ref" : "`t_1500`",
>>>       "expr" : "cast( ( ( if (isnotnull(`t_1500`)  ) then (`t_1500` ) else (0 )  end  )  ) as FLOAT8 )"
>>>     }, {
>>>       "ref" : "`h_1500`",
>>>       "expr" : "cast( ( ( if (isnotnull(`h_1500`)  ) then (`h_1500` ) else (0 )  end  )  ) as BIGINT )"
>>>     }, {
>>>       "ref" : "`b_1250`",
>>>       "expr" : "cast( ( ( if (isnotnull(`b_1250`)  ) then (`b_1250` ) else (0 )  end  )  ) as BIGINT )"
>>>     }, {
>>>       "ref" : "`t_1250`",
>>>       "expr" : "cast( ( ( if (isnotnull(`t_1250`)  ) then (`t_1250` ) else (0 )  end  )  ) as FLOAT8 )"
>>>     }, {
>>>       "ref" : "`h_1250`",
>>>       "expr" : "cast( ( ( if (isnotnull(`h_1250`)  ) then (`h_1250` ) else (0 )  end  )  ) as BIGINT )"
>>>     }, {
>>>       "ref" : "`b_1000`",
>>>       "expr" : "cast( ( ( if (isnotnull(`b_1000`)  ) then (`b_1000` ) else (0 )  end  )  ) as BIGINT )"
>>>     }, {
>>>       "ref" : "`t_1000`",
>>>       "expr" : "cast( ( ( if (isnotnull(`t_1000`)  ) then (`t_1000` ) else (0 )  end  )  ) as FLOAT8 )"
>>>     }
>>> ~~~
>>>
>>>
>>> On 10 Mar 2016, at 18:40, Hsuan Yi Chu wrote:
>>>
>>>> From the log, I see this one:
>>>>
>>>> StringFunctionHelpers.varCharToInt():
>>>>
>>>> It seems like an implicit cast (to integer) might be inserted into
>>>> your expression.
>>>>
>>>> Can you do EXPLAIN PLAN FOR "your query" (no double quotes needed)
>>>> and share what you see?
>>>>
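>>>> For example (shape only; the `dfs`/workspace naming here is a
>>>> placeholder for your actual storage plugin setup):
>>>>
>>>> EXPLAIN PLAN FOR
>>>> SELECT CAST(COALESCE(t_total, 0) AS double) AS t_total
>>>> FROM dfs.caspr.`csv/smalltest.csv`;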
>>>> On Thu, Mar 10, 2016 at 3:11 PM, Matt <bsg...@gmail.com> wrote:
>>>>
>>>>> TRIM() did not help, and I visually verified there is no whitespace
>>>>> around the number. The entire test data file:
>>>>>
>>>>> ~~~
>>>>> date_tm,id_1,id_2,id_3,b_total,t_total,h_total,b_small,t_small,h_small,b_18000,t_18000,h_18000,b_12000,t_12000,h_12000,b_6000,t_6000,h_6000,b_3000,t_3000,h_3000,b_2000,t_2000,h_2000,b_1500,t_1500,h_1500,b_1250,t_1250,h_1250,b_1000,t_1000,h_1000,b_750,t_750,h_750,b_500,t_500,h_500,b_0,t_0,h_0
>>>>> 2015-10-17 00:00:00,f5e9v8u2,err,mi1,268918254,140.755,23519,268918254,140.755,23519,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
>>>>> ~~~
>>>>>
>>>>> Including a TRIM() resulted in the same error:
>>>>>
>>>>> ~~~
>>>>> Error: SYSTEM ERROR: NumberFormatException: 140.755
>>>>>
>>>>> Fragment 0:0
>>>>>
>>>>> [Error Id: 01d368ee-6c06-476c-a553-5eb5baea7c3f on es07:31010]
>>>>>
>>>>>   (java.lang.NumberFormatException) 140.755
>>>>>     org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI():95
>>>>>     org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varCharToInt():120
>>>>>     org.apache.drill.exec.test.generated.ProjectorGen122.doEval():467
>>>>>     org.apache.drill.exec.test.generated.ProjectorGen122.projectRecords():62
>>>>>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():175
>>>>>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93
>>>>>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
>>>>>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>>>>>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>>>>>     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
>>>>>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>>>>>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
>>>>>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
>>>>>     java.security.AccessController.doPrivileged():-2
>>>>>     javax.security.auth.Subject.doAs():415
>>>>>     org.apache.hadoop.security.UserGroupInformation.doAs():1657
>>>>>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
>>>>>     org.apache.drill.common.SelfCleaningRunnable.run():38
>>>>>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>>>>>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>>>>>     java.lang.Thread.run():745 (state=,code=0)
>>>>> ~~~
>>>>>
>>>>> On 10 Mar 2016, at 17:49, Jason Altekruse wrote:
>>>>>
>>>>>> Could you try throwing a trim() in before the cast? Might just be
>>>>>> whitespace.
>>>>>>
>>>>>> There shouldn't be a need to add the COALESCE statement, as we only
>>>>>> ever read defined values (which may be an empty string) out of a CSV
>>>>>> file. You should instead write out a full CASE statement that checks
>>>>>> for an empty string and provides your default value of 0 in that case.
>>>>>>
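>>>>>> For example, something like this per column (a sketch; the '0' keeps
>>>>>> both branches varchar, so the outer CAST does the single conversion):
>>>>>>
>>>>>> CAST(CASE WHEN t_total = '' THEN '0' ELSE t_total END AS double) AS t_total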
>>>>>> - Jason
>>>>>>
>>>>>> Jason Altekruse
>>>>>> Software Engineer at Dremio
>>>>>> Apache Drill Committer
>>>>>>
>>>>>> On Thu, Mar 10, 2016 at 2:32 PM, Matt <bsg...@gmail.com> wrote:
>>>>>>
>>>>>>> Have some CSV data that Drill 1.5 selects as-is without any
>>>>>>> problems, until I attempt to CAST columns in a CTAS or plain SELECT:
>>>>>>>
>>>>>>> Error: SYSTEM ERROR: NumberFormatException: 140.755
>>>>>>>
>>>>>>> The data is unquoted CSV, and the column in question does have the
>>>>>>> value "140.755" (unquoted). As the column can be empty, I am using
>>>>>>> the following transform:
>>>>>>>
>>>>>>>   CAST(COALESCE(t_total, 0) AS double) AS t_total
>>>>>>>
>>>>>>> And on the first data row:
>>>>>>>
>>>>>>> Fragment 1:0
>>>>>>>
>>>>>>> [Error Id: 4b4d83cf-a87d-451e-9cfa-4280e5adf64f on es08:31010]
>>>>>>>
>>>>>>>   (java.lang.NumberFormatException) 140.755
>>>>>>>
>>>>>>> What could be causing this error?
>>>>>>>
>>
>> --
>>
>> Abdelhakim Deneche
>>
>> Software Engineer
>>
>>
>
