Re: About a "DRILL-5033" fix

marc nicole Thu, 29 Dec 2022 09:25:08 -0800

That is done.
Thanks!

On Thu, Dec 29, 2022 at 14:18, Charles Givre <[email protected]> wrote:


> Hi Marc,
> That's great!
> What you have to do is:
>
> 1.  Fork Drill into your own github account
> 2.  Create a branch with your changes.
> 3.  Then you should be able to create a pull request with your changes.
> Here are the contribution guidelines:
> https://drill.apache.org/docs/apache-drill-contribution-guidelines/
> Please name your pull request DRILL-XXXX to reflect the JIRA ticket you
> are addressing.
> 4.  Once the CI passes, a committer will review and merge the PR.
>
> Thanks!
> -- C
>
>
> On Dec 29, 2022, at 7:39 AM, marc nicole <[email protected]> wrote:
>
> Hi,
>
> Thanks for your reply,
> I actually want to submit my changes, but I am being denied permission to
> push to the Drill repo. How do I create a pull request in Git? Are any
> permissions required before pushing to the repo?
>
>
> On Wed, Dec 28, 2022 at 15:46, Charles Givre <[email protected]> wrote:
>
>> Hi Marc,
>> Thanks for this.  Here's the thing... Let's say you have json that looks
>> like this:
>>
>> {
>>         "foo":null
>> },{
>>         "foo": 3.5
>> }
>>
>> If you take the approach that `null` is treated like a string, you will
>> get a schema change exception when you read the next row.  Our current
>> approach is to basically ignore fields whose data type Drill cannot
>> figure out.  Once Drill encounters a typed value, it will then assign a
>> data type to that column.  See the example below, which is from
>> DRILL-5033.  I added a second row to demonstrate what happens once Drill
>> is able to determine a data type.  Note that for the columns that have a
>> defined value in the second row, Drill returns 'null' for the first row,
>> while the columns that are null in both rows are returned as [].
>>
>>
>> [{
>> "intKey" : null,
>> "bgintKey": null,
>> "strKey": null,
>> "boolKey": null,
>> "fltKey": null,
>> "dblKey": null,
>> "timKey": null,
>> "dtKey": null,
>> "tmstmpKey": null,
>> "intrvldyKey": null,
>> "intrvlyrKey": null
>> },
>> {
>> "intKey" : 1,
>> "bgintKey": 3666565464,
>> "strKey": "hithere",
>> "boolKey": true,
>> "fltKey": 3.5,
>> "dblKey": 4.2,
>> "timKey": null,
>> "dtKey": null,
>> "tmstmpKey": null,
>> "intrvldyKey": null,
>> "intrvlyrKey": null
>> }]
>>
>>
>> select * from dfs.test.`nulls.json`;
>>
>> +--------+---------------+---------+---------+--------+--------+--------+-------+-----------+-------------+-------------+
>> | intKey |   bgintKey    | strKey  | boolKey | fltKey | dblKey | timKey | dtKey | tmstmpKey | intrvldyKey | intrvlyrKey |
>> +--------+---------------+---------+---------+--------+--------+--------+-------+-----------+-------------+-------------+
>> | null   | null          | null    | null    | null   | null   | []     | []    | []        | []          | []          |
>> | 1.0    | 3.666565464E9 | hithere | true    | 3.5    | 4.2    | []     | []    | []        | []          | []          |
>> +--------+---------------+---------+---------+--------+--------+--------+-------+-----------+-------------+-------------+
>> 2 rows selected (0.232 seconds)
>>
>> You are definitely welcome to submit a pull request; however, this area is
>> extremely complex, and I suspect that what you propose will break other
>> unit tests.  Another option, which you might not be aware of, is providing
>> a schema.  If you do that from the beginning, then Drill will know what
>> data types to expect.
>>
>> Best,
>> -- C
>>
>>
>> > On Dec 28, 2022, at 8:57 AM, marc nicole <[email protected]> wrote:
>> >
>> > Hello Drillers :)
>> >
>> > I came across the aforementioned bug (DRILL-5033) and wanted to
>> > contribute. My attempt is to treat a *null* token as a *string* and
>> > print "null" as the column value instead of omitting the key from the
>> > output result set. Details of the attempted fix are below:
>> >
>> >
>> > *1)* In JsonReader.java (java-exec/drill-exec/vector/complex/fn/), at
>> > line 283, I added the following:
>> >
>> >> ...
>> >> case VALUE_NULL:
>> >>          // handle null as string
>> >>          handleString(parser, map, fieldName);
>> >>          break;
>> >> ...
>> >
>> >
>> > *2)* Then, at line 415, handleString() becomes:
>> >
>> >> private void handleString(JsonParser parser, MapWriter writer,
>> >>     String fieldName) throws IOException {
>> >>   try {
>> >>     // Added: write the literal "null" when the current token is a
>> >>     // JSON null. getCurrentToken() is used because nextToken() would
>> >>     // advance the parser past the value being handled.
>> >>     if (parser.getCurrentToken() == JsonToken.VALUE_NULL) {
>> >>       writer.varChar(fieldName).writeVarChar(0,
>> >>           workingBuffer.prepareVarCharHolder("null"),
>> >>           workingBuffer.getBuf());
>> >>     } else {
>> >>       writer.varChar(fieldName).writeVarChar(0,
>> >>           workingBuffer.prepareVarCharHolder(parser.getText()),
>> >>           workingBuffer.getBuf());
>> >>     }
>> >>   } catch (IllegalArgumentException e) {
>> >>     if (parser.getText() == null || parser.getText().isEmpty()) {
>> >>       // return;
>> >>     }
>> >>     throw e;
>> >>   }
>> >> }
>> >
>> >
>> >
>> > Is this a possible fix for the mentioned bug?
>> > If so, should I open a pull request?
>> >
>> > Thanks.
>>
>>
>
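
For anyone following the thread, here is a rough, self-contained sketch of the trade-off Charles describes: committing a null to a string type right away versus deferring the decision until a typed value shows up. This is not Drill's actual code. The class NullTypeInference, its methods, and the type labels (VARCHAR, BOOLEAN, DOUBLE) are hypothetical names made up for illustration, and all numbers are mapped to one label only because the query output above shows 1 coming back as 1.0.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from Drill.
public class NullTypeInference {

    /** Raised when a column that already has a type sees a different one. */
    static class SchemaChangeException extends RuntimeException {
        SchemaChangeException(String message) {
            super(message);
        }
    }

    /** Maps one JSON-like value to a simplified type label; null has no type. */
    static String typeOf(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof String) {
            return "VARCHAR";
        }
        if (value instanceof Boolean) {
            return "BOOLEAN";
        }
        if (value instanceof Number) {
            return "DOUBLE";
        }
        throw new IllegalArgumentException("unsupported value: " + value);
    }

    /**
     * Infers the type of one column from its values, in row order.
     * With nullAsString set, a null is committed as VARCHAR immediately.
     * Otherwise nulls are skipped until a typed value appears.
     * Returns null when every value was null and the decision was deferred.
     */
    static String inferColumnType(List<Object> values, boolean nullAsString) {
        String columnType = null;
        for (Object value : values) {
            String seen = typeOf(value);
            if (seen == null) {
                if (!nullAsString) {
                    continue; // defer: leave the column untyped for now
                }
                seen = "VARCHAR";
            }
            if (columnType == null) {
                columnType = seen;
            } else if (!columnType.equals(seen)) {
                throw new SchemaChangeException(columnType + " -> " + seen);
            }
        }
        return columnType;
    }

    public static void main(String[] args) {
        // The "foo" column from the two-row example: null, then 3.5.
        List<Object> foo = Arrays.<Object>asList(null, 3.5);

        System.out.println("deferred: " + inferColumnType(foo, false));
        try {
            inferColumnType(foo, true);
        } catch (SchemaChangeException e) {
            System.out.println("null as string: schema change " + e.getMessage());
        }
    }
}
```

With the deferred approach the foo column ends up with a numeric type once 3.5 is read. With null treated as a string, the same two rows trigger the schema change. A column that is null in every row stays untyped in the deferred case, which is the situation the [] columns in the query output reflect.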
