@John,

I agree with what you suggested.  The behavior on Postgres makes more sense.

The issue is in Apache Calcite, the sql planner Drill that uses. I
post this question on Calcite dev list. The fix would be in Calcite
code.



On Tue, May 24, 2016 at 8:01 AM, Zelaine Fong <zf...@maprtech.com> wrote:
> I agree that this is a bug, as you've already noted via DRILL-4692.
> Putting backticks around USER should treat it as an identifier, not a
> reserved word or special function.  Based on Jinfeng's findings, it looks
> like Drill puts special function names in the same namespace as
> identifiers, resulting in the current behavior.
>
> -- Zelaine
>
> On Tue, May 24, 2016 at 5:14 AM, John Omernik <j...@omernik.com> wrote:
>
>> I think if there is any change that one of the interpretations could be to
>> allow `user` (backtick user backtick) that we should do that... Drill uses
>> backticks as the quoted identifier (I gather Postgres uses double quotes as
>> it's identifier) having user be the column name will same many
>> organizations from errorless "wrong" results... I just worry about the down
>> stream there... I'd probably feel less strongly about this if A. the word
>> was less likely to be a column name (like current_drill_user vs user) and
>> B. I wasn't a user with a background in security and see all the data
>> sources that use "user" as a column.   I just have this sinking feeling it
>> will lead to bad things in data analysis for many users.
>>
>> I guess it comes down to the principle of least surprise, I believe that in
>> this case, `user` providing the the column name user is less surprising
>> then `user` providing the current drill user, and if we can make it align
>> with out SQL systems (like how Postgres behaves with ITS quoted identifier)
>> than I think we have a compelling case for changing how drill reacts here.
>>
>> Thoughts?
>>
>> (Once again, thanks for continued follow-up here, I love talking about
>> issues like these)
>>
>> John
>>
>>
>>
>> On Mon, May 23, 2016 at 10:39 PM, Jinfeng Ni <jinfengn...@gmail.com>
>> wrote:
>>
>> > mydb=# select  "user" from t1;
>> >  user
>> > ------
>> >  ABC
>> >
>> > I should take back what I said. With quoted identifier, Postgres
>> > behaved different from Drill. Both of the interpretations seem to be
>> > reasonable, since the identifier could represent two different things.
>> >
>> >
>> > On Mon, May 23, 2016 at 7:41 PM, Zelaine Fong <zf...@maprtech.com>
>> wrote:
>> > > Jinfeng,
>> > >
>> > > What does postgres return for the following query in your example?
>> > >
>> > > select "user" from t1;
>> > >
>> > > -- Zelaine
>> > >
>> > > On Mon, May 23, 2016 at 7:39 PM, John Omernik <j...@omernik.com>
>> wrote:
>> > >
>> > >> Hmm, you are correct, I don't have to like it :) but there is both
>> logic
>> > >> and precedence here.  Thanks for following up
>> > >>
>> > >> John
>> > >>
>> > >> On Monday, May 23, 2016, Jinfeng Ni <jinfengn...@gmail.com> wrote:
>> > >>
>> > >> > An quoted identifier is still an identifier (Drill uses back tick as
>> > >> > quote). Per SQL standard,  identifier CURRENT_USER / USER/
>> > >> > CURRENT_SESSION/etc are implicit function calls; no () is required.
>> > >> >
>> > >> > I checked Postgre, and seems it has the same behavior.
>> > >> >
>> > >> > mydb=# create table t1 (id int, "user" varchar(10));
>> > >> >
>> > >> > mydb=# insert into t1 values(100, 'ABC');
>> > >> > INSERT 0 1
>> > >> >
>> > >> > mydb=# select * from t1;
>> > >> >  id  | user
>> > >> > -----+------
>> > >> >  100 | ABC
>> > >> > (1 row)
>> > >> >
>> > >> > mydb=# select user from t1;
>> > >> >  current_user
>> > >> > --------------
>> > >> >  postgres
>> > >> > (1 row)
>> > >> >
>> > >> > mydb=# select t1.user from t1;
>> > >> >  user
>> > >> > ------
>> > >> >  ABC
>> > >> > (1 row)
>> > >> >
>> > >> >
>> > >> >
>> > >> > On Mon, May 23, 2016 at 5:12 PM, John Omernik <j...@omernik.com
>> > >> > <javascript:;>> wrote:
>> > >> > > Can (should) things inside back ticks be callable? I guess this
>> > makes a
>> > >> > > very difficult situation from a usability standpoint because user
>> > is a
>> > >> > not
>> > >> > > uncommon column name (think security logs, web logs, etc) yet in
>> the
>> > >> > > current setup there is lots of possibility for assumptions on
>> > calling
>> > >> > back
>> > >> > > tick user back tick and without an error users may have wrong, but
>> > >> > "error"
>> > >> > > free results.
>> > >> > > On May 23, 2016 4:54 PM, "Jinfeng Ni" <jinfengn...@gmail.com
>> > >> > <javascript:;>> wrote:
>> > >> > >
>> > >> > >> The problem here is that identifier 'user' is not only a reserved
>> > >> > >> word, but also represents a special function ==  current_user()
>> > call.
>> > >> > >> The identifier 'user', whether it's quoted or not, could mean
>> > either
>> > >> > >> column name or the function call.  Without the table alias, it
>> > could
>> > >> > >> be ambiguous to sql parser. The table alias informs the parser
>> that
>> > >> > >> this identifier is not a function call, but a regular identifier,
>> > thus
>> > >> > >> removes the ambiguity.
>> > >> > >>
>> > >> > >> This is different from other cases you use quoted reserved word
>> to
>> > >> > >> represent a column name, since those reserved words do not
>> > represent a
>> > >> > >> special function, thus no ambiguity.
>> > >> > >>
>> > >> > >> select `update`, `insert` from dfs.tmp.`1.json`;
>> > >> > >> +---------+---------+
>> > >> > >> | update  | insert  |
>> > >> > >> +---------+---------+
>> > >> > >> | abc     | 100     |
>> > >> > >> +---------+---------+
>> > >> > >>
>> > >> > >>
>> > >> > >>
>> > >> > >> On Mon, May 23, 2016 at 10:44 AM, John Omernik <j...@omernik.com
>> > >> > <javascript:;>> wrote:
>> > >> > >> > Ya, as I am testing, this works, however, the users of the
>> system
>> > >> > expect
>> > >> > >> to
>> > >> > >> > be able to use `user` and while I can provide them instructions
>> > to
>> > >> > use a
>> > >> > >> > table alias, I am very worried that they will forget and since
>> it
>> > >> > doesn't
>> > >> > >> > error, but instead puts in a different string, this could lead
>> to
>> > >> bad
>> > >> > >> > downstream results...
>> > >> > >> >
>> > >> > >> >
>> > >> > >> >
>> > >> > >> >
>> > >> > >> > On Mon, May 23, 2016 at 12:41 PM, John Omernik <
>> j...@omernik.com
>> > >> > <javascript:;>> wrote:
>> > >> > >> >
>> > >> > >> >> I filed https://issues.apache.org/jira/browse/DRILL-4692
>> > >> > >> >>
>> > >> > >> >> I see an alias would work as a tmp fix, but this should be
>> > address
>> > >> (I
>> > >> > >> >> wonder if other words may have a problem too?)
>> > >> > >> >>
>> > >> > >> >>
>> > >> > >> >>
>> > >> > >> >> On Mon, May 23, 2016 at 12:38 PM, Andries Engelbrecht <
>> > >> > >> >> aengelbre...@maprtech.com <javascript:;>> wrote:
>> > >> > >> >>
>> > >> > >> >>> Hmm interesting.
>> > >> > >> >>>
>> > >> > >> >>> As a workaround just use a table alias when referencing the
>> > >> column.
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>> Might be good to se if there is a JIRA for this, or file one
>> if
>> > >> not.
>> > >> > >> >>>
>> > >> > >> >>> --Andries
>> > >> > >> >>>
>> > >> > >> >>> > On May 23, 2016, at 10:28 AM, John Omernik <
>> j...@omernik.com
>> > >> > <javascript:;>> wrote:
>> > >> > >> >>> >
>> > >> > >> >>> > I have data with a field name user.
>> > >> > >> >>> >
>> > >> > >> >>> > When I select, with backticks, it doesn't show the field,
>> but
>> > >> > >> instead my
>> > >> > >> >>> > current logged in user...
>> > >> > >> >>> >
>> > >> > >> >>> >
>> > >> > >> >>> > select CONVERT_FROM(`user`, 'UTF8') as `user` from table
>> > limit
>> > >> 10;
>> > >> > >> >>> >
>> > >> > >> >>> >
>> > >> > >> >>> > Shouldn't the backticks allow me to reference the field
>> > >> properly?
>> > >> > >> >>> >
>> > >> > >> >>> > John
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>
>> > >> > >>
>> > >> >
>> > >>
>> > >>
>> > >> --
>> > >> Sent from my iThing
>> > >>
>> >
>>

Reply via email to