Re: About a "DRILL-5033" fix

Charles Givre Thu, 29 Dec 2022 05:18:32 -0800

Hi Marc, 
That's great!
What you have to do is:

1.  Fork Drill into your own GitHub account.
2.  Create a branch with your changes.
3.  Then you should be able to create a pull request with your changes.  Here 
are the contribution guidelines: 
(https://drill.apache.org/docs/apache-drill-contribution-guidelines/).  Please 
name your pull request DRILL-XXXX to reflect the JIRA ticket you are 
addressing.  
4.  Once the CI passes, a committer will review and merge the PR.
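Step 3's naming rule is easy to check mechanically. A minimal sketch, assuming a title of the form `DRILL-<number>: <summary>`; the exact format, the class `PrTitleCheck`, and the regex are illustrative assumptions, and the contribution guidelines linked in step 3 remain authoritative:

```java
import java.util.regex.Pattern;

public class PrTitleCheck {
    // Assumed convention (hypothetical): "DRILL-<ticket number>: <short summary>".
    private static final Pattern TITLE = Pattern.compile("^DRILL-\\d+: \\S.*$");

    // Returns true when the title is a JIRA key followed by ": " and a summary.
    static boolean isValidTitle(String title) {
        return TITLE.matcher(title).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidTitle("DRILL-1234: Example summary")); // true
        System.out.println(isValidTitle("Fix null handling"));           // false
    }
}
```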


Thanks!
-- C 


> On Dec 29, 2022, at 7:39 AM, marc nicole wrote:
> 
> Hi,
> 
> Thanks for your reply,
> I actually want to submit my changes, but I am denied permission to push 
> changes to the Drill repo. How do I make a pull request in Git? Are any 
> permissions required before pushing to the repo?
> 
> 
> On Wed, Dec 28, 2022 at 15:46, Charles Givre wrote:
>> Hi Marc, 
>> Thanks for this.  Here's the thing... Let's say you have JSON that looks 
>> like this:
>> 
>> {
>>         "foo":null
>> },{
>>         "foo": 3.5
>> }
>> 
>> If you take the approach that `null` is treated like a string, you will get 
>> a schema change exception when you read the next row.  Our current approach 
>> is basically to ignore fields whose data type Drill cannot yet determine. 
>> Once Drill encounters a value with a type, it assigns that data type to the 
>> column.  See the example below, which is from DRILL-5033.  I added a second 
>> row to demonstrate what happens once Drill is able to determine a data type. 
>> Note that for the columns with a defined value in the second row, Drill 
>> returns 'null' as the value in the first row. 
>> 
>> 
>> [{
>> "intKey" : null,
>> "bgintKey": null,
>> "strKey": null,
>> "boolKey": null,
>> "fltKey": null,
>> "dblKey": null,
>> "timKey": null,
>> "dtKey": null,
>> "tmstmpKey": null,
>> "intrvldyKey": null,
>> "intrvlyrKey": null
>> },
>> {
>> "intKey" : 1,
>> "bgintKey": 3666565464,
>> "strKey": "hithere",
>> "boolKey": true,
>> "fltKey": 3.5,
>> "dblKey": 4.2,
>> "timKey": null,
>> "dtKey": null,
>> "tmstmpKey": null,
>> "intrvldyKey": null,
>> "intrvlyrKey": null
>> }]
>> 
>> 
>> select * from dfs.test.`nulls.json`;
>> +--------+---------------+---------+---------+--------+--------+--------+-------+-----------+-------------+-------------+
>> | intKey |   bgintKey    | strKey  | boolKey | fltKey | dblKey | timKey | dtKey | tmstmpKey | intrvldyKey | intrvlyrKey |
>> +--------+---------------+---------+---------+--------+--------+--------+-------+-----------+-------------+-------------+
>> | null   | null          | null    | null    | null   | null   | []     | []    | []        | []          | []          |
>> | 1.0    | 3.666565464E9 | hithere | true    | 3.5    | 4.2    | []     | []    | []        | []          | []          |
>> +--------+---------------+---------+---------+--------+--------+--------+-------+-----------+-------------+-------------+
>> 2 rows selected (0.232 seconds)
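The schema-change argument can be reduced to a toy model. The stdlib-only sketch below is hypothetical (the class, its methods, and the simplified VARCHAR/BIT/FLOAT8 mapping are illustrative, not Drill's JsonReader). It compares deferring a column's type until a non-null value appears with fixing the type to VARCHAR as soon as a null is seen, using the two-row `foo` example from this thread:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NullTypingModel {
    // Simplified, hypothetical type mapping (not Drill's actual rules).
    static String typeOf(Object value) {
        if (value instanceof String) return "VARCHAR";
        if (value instanceof Boolean) return "BIT";
        if (value instanceof Number) return "FLOAT8";
        throw new IllegalArgumentException("unsupported value: " + value);
    }

    // Builds a one-field row; LinkedHashMap (unlike Map.of) accepts null values.
    static Map<String, Object> row(String key, Object value) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put(key, value);
        return r;
    }

    // Returns every column whose type changes between rows.
    // nullAsString == false: a null leaves the column untyped until a real value appears.
    // nullAsString == true:  a null immediately fixes the column type to VARCHAR.
    static List<String> schemaChanges(List<Map<String, Object>> rows, boolean nullAsString) {
        Map<String, String> types = new LinkedHashMap<>();
        List<String> changes = new ArrayList<>();
        for (Map<String, Object> r : rows) {
            for (Map.Entry<String, Object> field : r.entrySet()) {
                Object value = field.getValue();
                if (value == null && !nullAsString) {
                    continue; // type still unknown: defer the decision
                }
                String type = (value == null) ? "VARCHAR" : typeOf(value);
                String previous = types.put(field.getKey(), type);
                if (previous != null && !previous.equals(type)) {
                    changes.add(field.getKey() + ": " + previous + " -> " + type);
                }
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // {"foo": null} followed by {"foo": 3.5}
        List<Map<String, Object>> rows = Arrays.asList(row("foo", null), row("foo", 3.5));
        System.out.println(schemaChanges(rows, false)); // []
        System.out.println(schemaChanges(rows, true));  // [foo: VARCHAR -> FLOAT8]
    }
}
```

Deferring produces no conflict, while the null-as-string policy locks `foo` to VARCHAR and then collides with the numeric value in the next row, which is the situation that surfaces as a schema change.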
>> 
>> You are definitely welcome to submit a pull request; however, this area is 
>> extremely complex, and I suspect that what you propose will break other 
>> unit tests.  Another option, which you might not be aware of, is providing a 
>> schema.  If you provide one from the beginning, Drill will know what data 
>> types to expect. 
>> 
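A provided schema removes the guesswork because each column's type is fixed before the first row is read. The stdlib-only sketch below models only that idea; the class, its methods, and the simplified FLOAT8 type name are hypothetical, and it does not show Drill's actual schema syntax (see the Drill documentation on provided schemas for that):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProvidedSchemaModel {
    // Builds a one-field row; LinkedHashMap accepts null values.
    static Map<String, Object> row(String key, Object value) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put(key, value);
        return r;
    }

    // Renders each row as "column TYPE=value", taking every type from the
    // declared schema, so an all-null first row is already typed.
    static List<String> readWithSchema(Map<String, String> schema,
                                       List<Map<String, Object>> rows) {
        List<String> out = new ArrayList<>();
        for (Map<String, Object> r : rows) {
            List<String> cells = new ArrayList<>();
            for (Map.Entry<String, String> column : schema.entrySet()) {
                // A JSON null and a missing key both become a null of the declared type.
                Object value = r.get(column.getKey());
                cells.add(column.getKey() + " " + column.getValue() + "=" + value);
            }
            out.add(String.join(", ", cells));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("foo", "FLOAT8"); // declared before any data is read
        List<Map<String, Object>> rows = Arrays.asList(row("foo", null), row("foo", 3.5));
        System.out.println(readWithSchema(schema, rows)); // [foo FLOAT8=null, foo FLOAT8=3.5]
    }
}
```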
>> Best,
>> -- C
>> 
>> 
>> > On Dec 28, 2022, at 8:57 AM, marc nicole wrote:
>> > 
>> > Hello Drillers :)
>> > 
>> > I came across the aforementioned bug (DRILL-5033) and wanted to contribute.
>> > My attempt is to treat a *null* token as a *string* and print "null" 
>> > as the column value instead of omitting the key from the output result 
>> > set. Details of the attempted fix are below:
>> > 
>> > 
>> > *1)* In JsonReader.java (java-exec/drill-exec/vector/complex/fn/), at 
>> > line 283, I added the following:
>> > 
>> >> ...
>> >> case VALUE_NULL:
>> >>          // handle null as string
>> >>          handleString(parser, map, fieldName);
>> >>          break;
>> >> ...
>> > 
>> > 
>> > *2)* Then, at line 415, handleString() becomes:
>> > 
>> >> private void handleString(JsonParser parser, MapWriter writer,
>> >>     String fieldName) throws IOException {
>> >>   try {
>> >>     // added: write the literal "null" when the current token is VALUE_NULL;
>> >>     // getCurrentToken() inspects the token without advancing the parser
>> >>     if (parser.getCurrentToken() == JsonToken.VALUE_NULL) {
>> >>       writer.varChar(fieldName)
>> >>           .writeVarChar(0, workingBuffer.prepareVarCharHolder("null"),
>> >>               workingBuffer.getBuf());
>> >>     } else {
>> >>       writer.varChar(fieldName)
>> >>           .writeVarChar(0, workingBuffer.prepareVarCharHolder(parser.getText()),
>> >>               workingBuffer.getBuf());
>> >>     }
>> >>   } catch (IllegalArgumentException e) {
>> >>     if (parser.getText() == null || parser.getText().isEmpty()) {
>> >>       // return;
>> >>     }
>> >>     throw e;
>> >>   }
>> >> }
>> > 
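Whether handleString() inspects the token with getCurrentToken() or nextToken() matters. In Jackson, nextToken() advances the parser and returns the new current token, whereas getCurrentToken() only reports the token the parser is already on. Assuming handleString() is entered with the parser positioned on the field's value (as the VALUE_NULL branch in step 1 implies), advancing inside it skips that value. The stdlib-only model below illustrates the difference; `Cursor` and its methods are hypothetical stand-ins, not Jackson classes:

```java
import java.util.Arrays;
import java.util.List;

public class TokenCursorModel {
    // Hypothetical stand-in for a streaming parser: a token list plus a position.
    static final class Cursor {
        private final List<String> tokens;
        private int pos;

        Cursor(List<String> tokens, int start) {
            this.tokens = tokens;
            this.pos = start;
        }

        // Like getCurrentToken(): report the current token without moving.
        String current() { return tokens.get(pos); }

        // Like nextToken(): advance one token and return the new current token.
        String next() { return tokens.get(++pos); }
    }

    public static void main(String[] args) {
        // Tokens for {"a": null, "b": "x"}; index 2 is the value of "a".
        List<String> tokens = Arrays.asList("{", "a", "null", "b", "\"x\"", "}");

        Cursor peek = new Cursor(tokens, 2);
        System.out.println(peek.current()); // null (the value of "a")

        Cursor advance = new Cursor(tokens, 2);
        System.out.println(advance.next()); // b (the next field name; the value was skipped)
    }
}
```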
>> > 
>> > 
>> > Is this a possible fix for the mentioned bug?
>> > If so, should I open a pull request?
>> > 
>> > Thanks.
>>
