[ https://issues.apache.org/jira/browse/DRILL-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756425#comment-17756425 ]
ASF GitHub Bot commented on DRILL-8450:
---------------------------------------

cgivre commented on code in PR #2819:
URL: https://github.com/apache/drill/pull/2819#discussion_r1299285670


##########
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpXmlOptions.java:
##########
@@ -111,7 +111,7 @@ public String toString() {
   public static class HttpXmlOptionsBuilder {
     private int dataLevel;
-    private boolean allTextMode;
+    private Boolean allTextMode;

Review Comment:
   @mbeckerle In the JSON reader there are two parameters, `allTextMode` and `readAllNumbersAsDouble`, and both are boolean. For the XML reader, I chose not to implement the `readAllNumbersAsDouble` parameter because in practice it requires very clean data. From a lot of personal experience using Drill with clients, I can tell you that this was one of the biggest data challenges. For instance, you'd get data with a DOUBLE field, and then there would be a row where zero was written as `0` rather than as a decimal. That would then cause schema change exceptions.
   
   We have actually made significant improvements to Drill's implicit casting rules, which prevent a lot of schema change exceptions. As a result, IMHO, distinguishing between INTs and DOUBLEs matters a lot less. So, out of laziness, I decided it wasn't worth it. I can be convinced otherwise.


> Add Data Type Inference to XML Format Plugin
> --------------------------------------------
>
>                 Key: DRILL-8450
>                 URL: https://issues.apache.org/jira/browse/DRILL-8450
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Format - XML
>    Affects Versions: 1.21.1
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 1.22.0
>
>
> This PR adds data type inference to the XML format plugin. As in other
> plugins, it adds a new configuration parameter, allTextMode, which, when
> set to true, reads all data as strings. The default is true.
>
> Note that the inference is limited to doubles, dates, timestamps, booleans,
> and strings.
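To make the trade-off in the review comment concrete, here is a minimal per-value sketch, assuming (as the issue text suggests by listing no integer type) that every numeric literal is inferred as DOUBLE, and assuming ISO-8601 formats for dates and timestamps. The class `XmlTypeInferenceSketch`, the `InferredType` enum and `inferType` are hypothetical names, not Drill's code. The nullable `Boolean allTextMode` parameter illustrates one plausible reason for the `boolean` to `Boolean` change in the diff (a null can mean "not configured" and fall back to a default, here the "default is true" from the issue text); the PR itself does not state that reason.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of per-value type inference for XML text content.
 * This is NOT Drill's implementation; all names here are illustrative.
 */
public class XmlTypeInferenceSketch {

  /** The types the issue description says inference is limited to. */
  public enum InferredType { DOUBLE, DATE, TIMESTAMP, BOOLEAN, VARCHAR }

  /** Assumed default, taken from the issue text ("The default is true"). */
  static final boolean DEFAULT_ALL_TEXT_MODE = true;

  /** Plain decimal or scientific notation, e.g. "0", "-1.5", ".5", "1e10". */
  private static final Pattern NUMBER =
      Pattern.compile("[+-]?(\\d+\\.?\\d*|\\.\\d+)([eE][+-]?\\d+)?");

  /**
   * Infers a type for one text value. A null allTextMode stands for
   * "not configured" and falls back to the default, which a primitive
   * boolean field could not express.
   */
  public static InferredType inferType(String value, Boolean allTextMode) {
    boolean textOnly = (allTextMode == null) ? DEFAULT_ALL_TEXT_MODE : allTextMode;
    if (textOnly || value == null) {
      return InferredType.VARCHAR;
    }
    String v = value.trim();
    if (v.equalsIgnoreCase("true") || v.equalsIgnoreCase("false")) {
      return InferredType.BOOLEAN;
    }
    // No INT type: "0" and "1.5" both become DOUBLE, so a column mixing them
    // does not switch types from one row to the next.
    if (NUMBER.matcher(v).matches()) {
      return InferredType.DOUBLE;
    }
    if (isIsoDate(v)) {
      return InferredType.DATE;        // e.g. 2023-08-21
    }
    if (isIsoTimestamp(v)) {
      return InferredType.TIMESTAMP;   // e.g. 2023-08-21T10:15:30
    }
    return InferredType.VARCHAR;
  }

  private static boolean isIsoDate(String v) {
    try {
      LocalDate.parse(v);              // ISO_LOCAL_DATE, whole string must match
      return true;
    } catch (DateTimeParseException e) {
      return false;
    }
  }

  private static boolean isIsoTimestamp(String v) {
    try {
      LocalDateTime.parse(v);          // ISO_LOCAL_DATE_TIME
      return true;
    } catch (DateTimeParseException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String[] samples = {"0", "1.5", "true", "2023-08-21", "2023-08-21T10:15:30", "abc"};
    for (String s : samples) {
      System.out.println(s + " -> " + inferType(s, false));
    }
    System.out.println("0 (allTextMode unset) -> " + inferType("0", null));
  }
}
```

Running `main` prints each sample with its inferred type; the last line shows an unset option falling back to text mode (VARCHAR). A real reader would also have to reconcile types across all rows of a column, which this per-value sketch does not attempt.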