Thank You for the reply and a hint.

1. Yes did try with Calcite `ROW` too - `java.lang.NoSuchFieldException: head (state=,code=0)` but on the transformation side `SELECT * FROM etl_raw LIMIT 1`. Maybe I need to directly refer to a field that I need instead of using `*`? Do You know from top of your head whats the syntax to get `repo_state.head.commit` value? Did try also to add more fields to the table def but no luck.

2.

> doesn't actually do anything on the SQL shell at query parse time

This is also my observation. What is the proper way to initialize Zeta? Did try `./bin/shell --plannerName=org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner` and few other variations but seems like sqlline does not allow for passing parameters to pipeline options.

I do not have any requirement specific for Zeta but it seems it has a bit more support for JSON functions than Calcite.

3. Any other idea to workaround this issue?

Thanks for support!

For context I do try to setup some POC with this SQLs as it could really speed up development and maintenance of our pipelines + SQL has low requirements regarding knowledge to actually understand the logic (read the code). If this works it might be a great (killer) feature for Beam.

Best

Wisniowski Piotr



On 20.04.2023 20:52, Andrew Pilloud via user wrote:

set plannerName doesn't actually do anything on the SQL shell at query parse time, it will still use the calcite parser. Have you tried calcite SQL?

Support for struts is somewhat limited. I know there are bugs around nested structs and structs with single values.

Andrew

On Thu, Apr 20, 2023 at 9:26 AM Wiśniowski Piotr <contact.wisniowskipi...@gmail.com> wrote:

    Hi,

    I have a question regarding usage of Zeta with SQL extensions in SQL
    shell. I try to:

    ```

    SET runner = DirectRunner;
    SET tempLocation = `/tmp/test/`;
    SET streaming=`True`;
    SET plannerName =
    `org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner`;

    CREATE EXTERNAL TABLE etl_raw(
         event_timestamp TIMESTAMP,
         event_type VARCHAR,
         message_id VARCHAR,
         tracking_source VARCHAR,
         tracking_version VARCHAR,
         `repo_state` STRUCT<`head` STRUCT<`commit` VARCHAR ,`name`
    VARCHAR>>
    )
    TYPE pubsub
    LOCATION 'projects/xxx/topics/xxx'
    TBLPROPERTIES '{"format":"json"}';

    ```

    But get error `parse failed: Encountered "STRUCT" `.

    If i change the `STRUCT` to `ROW` (as in Calcite) the DDL passes, but
    still I do fail to receive data on

    `SELECT * FROM etl_raw LIMIT 1;` with exception of
    `java.lang.NoSuchFieldException: head (state=,code=0)` when I am sure
    that the field is there in json payload.

    With commented out `repo_state` filed I am able to retrieve the data.
    Unfortunately I do not have control over the payload structure as its
    3rd party hook to make it flat.

    In general I am unable to parse json msg from pubsub having
    structured
    field.

    Is anyone familiar with this part of Beam functionalities?

    Best regards

    Wisniowski Piotr


Reply via email to