[email protected] <[email protected]>
I tried reproducing this but was not successful, the output schema was as
expected. I added the following to BeamSqlMultipleSchemasTest.java at head.
(I did discover that PAssert.that(result).containsInAnyOrder(output)
doesn't validate column names however.)
@Test
public void testSelectAs() {
PCollection<Row> input = pipeline.apply(create(row(1, "strstr")));
PCollection<Row> result =
input.apply(SqlTransform.query("WITH tempTable (id, v) AS (SELECT
f_int as id, f_string as v FROM PCOLLECTION) SELECT id AS fout_int, v AS
fout_string FROM tempTable"));
Schema output_schema =
Schema.builder().addInt32Field("fout_int").addStringField("fout_string").build();
assertThat(result.getSchema(), equalTo(output_schema));
Row output = Row.withSchema(output_schema).addValues(1,
"strstr").build();
PAssert.that(result).containsInAnyOrder(output);
pipeline.run();
}
On Tue, Jan 24, 2023 at 8:13 AM Talat Uyarer <[email protected]>
wrote:
> Hi Kenn,
>
> Thank you for replying back to my email.
>
> I was under the same impression about Calcite. But I wrote a test on
> Calcite 1.28 too. It is working without issue that I see on BEAM
>
> Here is my test case. If you want you can also run on Calcite. Please put
> under core/src/test/resources/sql as text file. and Run CoreQuidemTest
> class.
>
> !use post
> !set outputformat mysql
>
> #Test aliases with with clause
> WITH tempTable(id, v) AS (select "hr"."emps"."empid" as id,
> "hr"."emps"."name" as v from "hr"."emps")
> SELECT tempTable.id as id, tempTable.v as "value" FROM tempTable WHERE
> tempTable.v <> '11' ;
> +-----+-----------+
> | ID | value |
> +-----+-----------+
> | 100 | Bill |
> | 110 | Theodore |
> | 150 | Sebastian |
> | 200 | Eric |
> +-----+-----------+
> (4 rows)
>
> !ok
>
>
> On Mon, Jan 23, 2023 at 10:16 AM Kenneth Knowles <[email protected]> wrote:
>
>> Looking at the code that turns a logical CalcRel into a BeamCalcRel I do
>> not see any obvious cause for this:
>> https://github.com/apache/beam/blob/b3aa2e89489898f8c760294ba4dba2310ac53e70/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCalcRule.java#L69
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_beam_blob_b3aa2e89489898f8c760294ba4dba2310ac53e70_sdks_java_extensions_sql_src_main_java_org_apache_beam_sdk_extensions_sql_impl_rule_BeamCalcRule.java-23L69&d=DwMFaQ&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=BkW1L6EF7ergAVYDXCo-3Vwkpy6qjsWAz7_GD7pAR8g&m=KXc2qSceL6qFbFnQ_2qUOHr9mKuc6zYY8rJTNZC8p_wTcNs4M6mHQoCuoc4JfeaA&s=KjzplEf29oFB6uivvdjixpQiArWtfV-1SXpALL-ugEM&e=>
>>
>> I don't like to guess that upstream libraries have the bug, but in this
>> case I wonder if the alias is lost in the Calcite optimizer rule for
>> merging the projects and filters into a Calc.
>>
>> Kenn
>>
>> On Mon, Jan 23, 2023 at 10:13 AM Kenneth Knowles <[email protected]> wrote:
>>
>>> I am not sure I understand the question, but I do see an issue.
>>>
>>> Context: "CalcRel" is an optimized relational operation that is somewhat
>>> like ParDo, with a small snippet of a single-assignment DSL embedded in it.
>>> Calcite will choose to merge all the projects and filters into the node,
>>> and then generates Java bytecode to directly execute the DSL.
>>>
>>> Problem: it looks like the CalcRel has output columns with aliases "id"
>>> and "v" where it should have output columns with aliases "id" and "value".
>>>
>>> Kenn
>>>
>>> On Thu, Jan 19, 2023 at 6:01 PM Ahmet Altay <[email protected]> wrote:
>>>
>>>> Adding: @Andrew Pilloud <[email protected]> @Kenneth Knowles
>>>> <[email protected]>
>>>>
>>>> On Thu, Jan 12, 2023 at 12:31 PM Talat Uyarer via user <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am using Beam 2.43 with Calcite SQL with Java.
>>>>>
>>>>> I have a query with a WITH clause and some aliasing. Looks like Beam
>>>>> Query optimizer after optimizing my query, it drops Select statement's
>>>>> aliases. Can you help me to identify where the problem is ?
>>>>>
>>>>> This is my query
>>>>> INFO: SQL:
>>>>> WITH `tempTable` (`id`, `v`) AS (SELECT
>>>>> `PCOLLECTION`.`f_nestedRow`.`f_nestedInt` AS `id`,
>>>>> `PCOLLECTION`.`f_nestedRow`.`f_nestedString` AS `v`
>>>>> FROM `beam`.`PCOLLECTION` AS `PCOLLECTION`) (SELECT `tempTable`.`id`
>>>>> AS `id`, `tempTable`.`v` AS `value`
>>>>> FROM `tempTable` AS `tempTable`
>>>>> WHERE `tempTable`.`v` <> '11')
>>>>>
>>>>> This is Calcite Plan look at LogicalProject(id=[$0], value=[$1]) in
>>>>> SQL plan.
>>>>>
>>>>> Jan 12, 2023 12:19:08 PM
>>>>> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner
>>>>> convertToBeamRel
>>>>> INFO: SQLPlan>
>>>>> LogicalProject(id=[$0], value=[$1])
>>>>> LogicalFilter(condition=[<>($1, '11')])
>>>>> LogicalProject(id=[$1.f_nestedInt], v=[$1.f_nestedString])
>>>>> BeamIOSourceRel(table=[[beam, PCOLLECTION]])
>>>>>
>>>>> But Beam Plan does not have a LogicalProject(id=[$0], value=[$1]) or
>>>>> similar.
>>>>>
>>>>> Jan 12, 2023 12:19:08 PM
>>>>> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner
>>>>> convertToBeamRel
>>>>> INFO: BEAMPlan>
>>>>> BeamCalcRel(expr#0..1=[{inputs}], expr#2=[$t1.f_nestedInt],
>>>>> expr#3=[$t1.f_nestedString], expr#4=['11':VARCHAR], expr#5=[<>($t3, $t4)],
>>>>> id=[$t2], v=[$t3], $condition=[$t5])
>>>>> BeamIOSourceRel(table=[[beam, PCOLLECTION]])
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>