Hi!

I just got back from travelling. I ran a test and Drill did do what I believe you're after.

Here's the test JSON file

   |➜  ~ cat /tmp/foo/bar.json|||
   |||{"id":"T06125309","cust_id":"A20","num":"VAB6169028"}|||
   ||


And here's my Drill session.

   |apache drill> use dfs.tmp;|||
   |||ok       true|||
   |||summary  Default schema changed to [dfs.tmp]|||
   ||||
   |||apache drill (dfs.tmp)> select * from foo;|||
   |||id       T06125309|||
   |||cust_id  A20|||
   |||num      VAB6169028|||
   ||||
   |||apache drill (dfs.tmp)> create schema (id varchar, cust_id
   varchar, num varchar, tranamt double not null default '1.11'|||
   |||) for table foo;|||
   |||ok       true|||
   |||summary  Created schema for [foo]|||
   ||||
   |||1 row selected (0.257 seconds)|||
   |||apache drill (dfs.tmp)> select * from foo;|||
   |||id       T06125309|||
   |||cust_id  A20|||
   |||num      VAB6169028|||
   |||tranamt  1.11|||
   ||


On 2023/07/27 07:38, Prabhakar Bhosale wrote:
Hi James,
Any advice on the problem on schema for json file as mentioned in
my previous mail?

Regards
Prabhakar

On Mon, Jul 24, 2023 at 3:41 PM Prabhakar Bhosale<[email protected]>
wrote:

Hi James,

I tried the same on the JSON file and it is still not working.

Below is JSON file content

*{"id":"T06125309","cust_id":"A20","num":"VAB6169028"}*

The contents of the ".query.schema" file are as follows































*{  "table" : "mystore.`TEST_MOD`",  "schema" : {    "type" :
"tuple_schema",    "columns" : [      {        "name" : "ID",        "type"
: "VARCHAR",        "mode" : "OPTIONAL"      },      {        "name" :
"CUST_ID",        "type" : "VARCHAR",        "mode" : "OPTIONAL"      },
   {        "name" : "NUM",        "type" : "VARCHAR",        "mode" :
"OPTIONAL"      },      {        "name" : "TRANAMT",        "type" :
"DOUBLE",        "mode" : "REQUIRED",        "properties" : {
"drill.default" : "1.11"        }      }    ]  },  "version" : 1}*

When I fire the query below, I expect the value of TRANAMT column to be
*1.11* but it gives out value as *NULL*

*select  A.ID<http://A.ID>, A.CUST_ID, A.NUM, A.TRANAMT from
table(mystore.`TEST_MOD`(schema =>
'path=`/archived_files_nw/TEST_MOD/.drill.schema`')) A*

So essentially it is not considering the schema given at the query
execution time. Please let me know if I am doing anything incorrectly.


Thanks and Regards
Prabhakar


On Wed, Jul 12, 2023 at 4:33 PM James Turton<[email protected]>  wrote:

Hi Prabhakar

  From what I recall, Drill won't consider a provided schema when
querying Parquet because Parquet files bundle their own schema. You
might need to use a SQL function like COALESCE(TRAN_AMOUNT, 1.11) and
possibly put that in a SQL view for reuse.

Regards
James

On 2023/07/11 18:40, Prabhakar Bhosale wrote:
Hi Team,
I am using drill 1.20.1 with parquet files.

I have two parquet files in a directory with one column missing in one
file. When I query the directory it gives me NULL values for all those
rows
which are from the file where that column is missing.

But I want a specific value for that column instead of NULL. So I
have created the schema as given below. But even after creating it is
still
returning the NULL value. Please let me know what is going wrong.

I have also ensured that storage.table.user_schema_file=true at system
level.

The files are stored on linux mount point.
The name of the missing column is "TRAN_AMOUNT".



The schema is as below

{
    "table" : "archive.default.`executions`",
    "schema" : {
      "type" : "tuple_schema",
      "columns" : [
        {
          "name" : "EXEC_ID",
          "type" : "VARCHAR",
          "mode" : "OPTIONAL"
        },
        {
          "name" : "CUST_ID",
          "type" : "VARCHAR",
          "mode" : "OPTIONAL"
        },
        {
          "name" : "CELL_ID",
          "type" : "VARCHAR",
          "mode" : "OPTIONAL"
        },
        {
          "name" : "TRAN_AMOUNT",
          "type" : "FLOAT",
          "mode" : "REQUIRED",
          "properties" : {
            "drill.default" : "1.11"
      }
        }
      ]
    },
    "version" : 1
}


Reply via email to