THanks James for confirming. This helps a lot. I can move to 1.21.1. But before that I need to try the same with the parquet also. Will check that too and make a decision.
Regards Prabhakar On Tue, Aug 8, 2023 at 12:51 PM James Turton <[email protected]> wrote: > Okay I get the same result as you when I try with Drill 1.20.3 so I > guess that there's a JSON reader bug that got fixed somewhere in between > 1.20.3 and 1.21.1. Do you need to stay on Drill 1.20 or can you upgrade > to 1.21? > > On 2023/08/07 08:32, Prabhakar Bhosale wrote: > > hi James, > > I re-tried the steps once again after looking at the output you shared > with > > me. But still I am not able to get the expected output on drill 1.20.1. > > Then I downloaded drill 1.21.1 and tried the same steps. This time I got > > the expected output. > > So can you please try the same on drill 1.20.1 once if possible? thx > > > > REgards > > Prabhakar > > > > On Wed, Aug 2, 2023 at 3:11 PM James Turton <[email protected]> wrote: > > > >> Hi! > >> > >> I just got back from travelling. I ran a test and Drill did do what I > >> believe you're after. > >> > >> Here's the test JSON file > >> > >> ➜ ~ cat /tmp/foo/bar.json > >> {"id":"T06125309","cust_id":"A20","num":"VAB6169028"} > >> > >> > >> And here's my Drill session. > >> > >> apache drill> use dfs.tmp; > >> ok true > >> summary Default schema changed to [dfs.tmp] > >> > >> apache drill (dfs.tmp)> select * from foo; > >> id T06125309 > >> cust_id A20 > >> num VAB6169028 > >> > >> apache drill (dfs.tmp)> create schema (id varchar, cust_id varchar, num > >> varchar, tranamt double not null default '1.11' > >> ) for table foo; > >> ok true > >> summary Created schema for [foo] > >> > >> 1 row selected (0.257 seconds) > >> apache drill (dfs.tmp)> select * from foo; > >> id T06125309 > >> cust_id A20 > >> num VAB6169028 > >> tranamt 1.11 > >> > >> > >> On 2023/07/27 07:38, Prabhakar Bhosale wrote: > >> > >> Hi James, > >> Any advice on the problem on schema for json file as mentioned in > >> my previous mail? > >> > >> Regards > >> Prabhakar > >> > >> On Mon, Jul 24, 2023 at 3:41 PM Prabhakar Bhosale < > [email protected]> <[email protected]> > >> wrote: > >> > >> > >> Hi James, > >> > >> I tried the same on the JSON file and it is still not working. > >> > >> Below is JSON file content > >> > >> *{"id":"T06125309","cust_id":"A20","num":"VAB6169028"}* > >> > >> The contents of the ".query.schema" file are as follows > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> *{ "table" : "mystore.`TEST_MOD`", "schema" : { "type" : > >> "tuple_schema", "columns" : [ { "name" : "ID", > "type" > >> : "VARCHAR", "mode" : "OPTIONAL" }, { "name" : > >> "CUST_ID", "type" : "VARCHAR", "mode" : "OPTIONAL" }, > >> { "name" : "NUM", "type" : "VARCHAR", "mode" : > >> "OPTIONAL" }, { "name" : "TRANAMT", "type" : > >> "DOUBLE", "mode" : "REQUIRED", "properties" : { > >> "drill.default" : "1.11" } } ] }, "version" : 1}* > >> > >> When I fire the query below, I expect the value of TRANAMT column to be > >> *1.11* but it gives out value as *NULL* > >> > >> *select A.ID <http://A.ID> <http://A.ID>, A.CUST_ID, A.NUM, A.TRANAMT > from > >> table(mystore.`TEST_MOD`(schema => > >> 'path=`/archived_files_nw/TEST_MOD/.drill.schema`')) A* > >> > >> So essentially it is not considering the schema given at the query > >> execution time. Please let me know if I am doing anything incorrectly. > >> > >> > >> Thanks and Regards > >> Prabhakar > >> > >> > >> On Wed, Jul 12, 2023 at 4:33 PM James Turton <[email protected]> < > [email protected]> wrote: > >> > >> > >> Hi Prabhakar > >> > >> From what I recall, Drill won't consider a provided schema when > >> querying Parquet because Parquet files bundle their own schema. You > >> might need to use a SQL function like COALESCE(TRAN_AMOUNT, 1.11) and > >> possibly put that in a SQL view for reuse. > >> > >> Regards > >> James > >> > >> On 2023/07/11 18:40, Prabhakar Bhosale wrote: > >> > >> Hi Team, > >> I am using drill 1.20.1 with parquet files. > >> > >> I have two parquet files in a directory with one column missing in one > >> file. When I query the directory it gives me NULL values for all those > >> > >> rows > >> > >> which are from the file where that column is missing. > >> > >> But I want a specific value for that column instead of NULL. So I > >> have created the schema as given below. But even after creating it is > >> > >> still > >> > >> returning the NULL value. Please let me know what is going wrong. > >> > >> I have also ensured that storage.table.user_schema_file=true at system > >> level. > >> > >> The files are stored on linux mount point. > >> The name of the missing column is "TRAN_AMOUNT". > >> > >> > >> > >> The schema is as below > >> > >> { > >> "table" : "archive.default.`executions`", > >> "schema" : { > >> "type" : "tuple_schema", > >> "columns" : [ > >> { > >> "name" : "EXEC_ID", > >> "type" : "VARCHAR", > >> "mode" : "OPTIONAL" > >> }, > >> { > >> "name" : "CUST_ID", > >> "type" : "VARCHAR", > >> "mode" : "OPTIONAL" > >> }, > >> { > >> "name" : "CELL_ID", > >> "type" : "VARCHAR", > >> "mode" : "OPTIONAL" > >> }, > >> { > >> "name" : "TRAN_AMOUNT", > >> "type" : "FLOAT", > >> "mode" : "REQUIRED", > >> "properties" : { > >> "drill.default" : "1.11" > >> } > >> } > >> ] > >> }, > >> "version" : 1 > >> } > >> > >> > >> > >> > >
