[ 
https://issues.apache.org/jira/browse/CALCITE-6176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804451#comment-17804451
 ] 

Egor Malko commented on CALCITE-6176:
-------------------------------------

I have done small investigation. It seems like 
{{org.apache.calcite.rel.rules.SubQueryRemoveRule.rewriteExists()}} 
method is a problem. It was added in CALCITE-2329 with intention to optimize 
cases like 
{code:sql}
select sal from emp where EXISTS (select deptno from dept where deptno=2)
{code}
Only optimization in {{rewriteExists()}} expects right table not to be used 
after the join and changes it freely, but it's not true in the current issue 
cases.

Also 
[CALCITE-2329|https://github.com/apache/calcite/commit/1ae6a52603006741a4cb6bf558c58d9ac08e866a#diff-a57af2470b3215be0ce7d94a226f1997d9417c613fd7f32097fd07543b30633a]
 [ 
PR|https://github.com/apache/calcite/commit/1ae6a52603006741a4cb6bf558c58d9ac08e866a#diff-a57af2470b3215be0ce7d94a226f1997d9417c613fd7f32097fd07543b30633a]
 contains suspicious changes. Before PR, EXISTS cases were processed with IN 
ones, and after they started to be processed in the own {{rewriteExists()}} 
method with only one optimization from the issue. So are we possibly got rid 
from old optimizations?

Please, validate my thoughts, because I'm new to the project

> 'exists' in 'join on' has a wrong result with 
> CoreRules.JOIN_SUB_QUERY_TO_CORRELATE
> -----------------------------------------------------------------------------------
>
>                 Key: CALCITE-6176
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6176
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.36.0
>            Reporter: SimonAlexs
>            Priority: Critical
>
> In calcite 1.36.0, using flowing rules:
>         CoreRules.JOIN_SUB_QUERY_TO_CORRELATE
>         CoreRules.JOIN_CONDITION_PUSH
> I tested some wrong cases(the expected result and actual result are in 
> comment):
>  * expected result: the result in mysql;
>  * actual result: the result in calcite1.36.0
> {code:java}
> -- not exists, expected: 1, 2 
> --                        actual:   1, null
> select *
> from (select 1 id) t1
> left join (select 2 id) t2
>       on not exists(select *
>                     from (select 3 id) p
>                     where p.id=t2.id)
> -- or exists, expected: 1, null
> --                       actual:   1, 2
> select *
> from (select 1 id) t1
> left join (select 2 id) t2
>       on t1.id=t2.id or exists(select *
>                                from (select 3 id) p
>                                where p.id=t2.id)
> -- or not exists, expected: 1, 2 
> --                           actual:   1, null
> select *
> from (select 1 id) t1
> left join (select 2 id) t2
>       on t1.id=t2.id or not exists(select *
>                                    from (select 3 id) p
>                                    where p.id=t2.id)
> -- and not exists, expected: 1, null
> --                           2, 2
> --                   actual: 1, null
> --                           2, null
> select *
> from (select 1 id
>       union all
>       select 2) t1
> left join (select 2 id) t2
>       on t1.id=t2.id and not exists(select *
>                                 from (select 3 id) p
>                                 where p.id=t1.id){code}
> This seems a bug.
> The rel of case 2 is as below. The reason I guess is that, The '$1' in 'IS 
> NOT NULL($1)' of line 3 may be wrong.Because In the rel, the $1 represents 
> the first column of the right table, which is usually not null depends on 
> right input data. However, it should represent the column name 'i' in right 
> table which means the 'exists' result.
> This is only my guessing.The real reason for this is needed to find by users 
> who really understand Calcite.
> LogicalProject(id=[$0], id0=[$1])
>   LogicalProject(id=[$0], id0=[$1])
>     LogicalJoin(condition=[OR(=($0, $1), IS NOT NULL($1))], joinType=[left])
>       LogicalValues(tuples=[[\{1}]])
>       LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[\{1}])
>         LogicalValues(tuples=[[\{2}]])
>         LogicalProject(i=[true])
>           LogicalFilter(condition=[=($0, $cor0.id0)])
>             LogicalValues(tuples=[[\{3}]])
> My whole code is:
> {code:java}
>  
> Properties properties = new Properties();
> properties.put(CalciteConnectionProperty.MODEL.camelName(), "inline:");
> Connection calciteConnection = DriverManager.getConnection("jdbc:calcite:", 
> properties);
> Statement calciteStatement = calciteConnection.createStatement();
> CalcitePrepare.Context prepareContext = 
> calciteStatement.unwrap(CalciteServerStatement.class).createPrepareContext();
> CalciteSchema pSpaceSchema = 
> prepareContext.getRootSchema().getSubSchema("ps", false);
> final FrameworkConfig config = Frameworks.newConfigBuilder()
>                                .parserConfig(SqlParser.config()
>                                .withLex(Lex.MYSQL)
>                                .withConformance(SqlConformanceEnum.MYSQL_5))
>                                
> .sqlValidatorConfig(SqlValidator.Config.DEFAULT)
>                                .defaultSchema(pSpaceSchema.plus())
>                                .build();
> Planner planner = Frameworks.getPlanner(config);
> SqlNode parsedSql = planner.parse(sql);
> SqlNode validatedSql = planner.validate(parsedSql);
> RelRoot relRoot = planner.rel(validatedSql);
> RelNode originRel = relRoot.rel;
> HepProgram program = HepProgram.builder()
>                     .addRuleCollection(Arrays.asList(
>                        CoreRules.JOIN_SUB_QUERY_TO_CORRELATE
>                        , CoreRules.JOIN_CONDITION_PUSH
>                      ))
>                      .build();
> HepPlanner hepPlanner = new HepPlanner(program);
> hepPlanner.setRoot(originRel);
> RelNode optimizedRel = hepPlanner.findBestExp(); {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to