Re: Beam shell sql with zeta

Wiśniowski Piotr Thu, 20 Apr 2023 12:40:28 -0700

Thank You for the reply and a hint.

1. Yes did try with Calcite `ROW` too - `java.lang.NoSuchFieldException:head (state=,code=0)` but on the transformation side `SELECT * FROMetl_raw LIMIT 1`. Maybe I need to directly refer to a field that I needinstead of using `*`? Do You know from top of your head whats the syntaxto get `repo_state.head.commit` value? Did try also to add more fieldsto the table def but no luck.


2.

> doesn't actually do anything on the SQL shell at query parse time

This is also my observation. What is the proper way to initialize Zeta?Did try `./bin/shell--plannerName=org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner`and few other variations but seems like sqlline does not allow forpassing parameters to pipeline options.

I do not have any requirement specific for Zeta but it seems it has abit more support for JSON functions than Calcite.


3. Any other idea to workaround this issue?

Thanks for support!

For context I do try to setup some POC with this SQLs as it could reallyspeed up development and maintenance of our pipelines + SQL has lowrequirements regarding knowledge to actually understand the logic (readthe code). If this works it might be a great (killer) feature for Beam.


Best

Wisniowski Piotr



On 20.04.2023 20:52, Andrew Pilloud via user wrote:

set plannerName doesn't actually do anything on the SQL shell at queryparse time, it will still use the calcite parser. Have you triedcalcite SQL?

Support for struts is somewhat limited. I know there are bugs aroundnested structs and structs with single values.


Andrew

On Thu, Apr 20, 2023 at 9:26 AM Wiśniowski Piotr<contact.wisniowskipi...@gmail.com> wrote:


    Hi,

    I have a question regarding usage of Zeta with SQL extensions in SQL
    shell. I try to:

    ```

    SET runner = DirectRunner;
    SET tempLocation = `/tmp/test/`;
    SET streaming=`True`;
    SET plannerName =
    `org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner`;

    CREATE EXTERNAL TABLE etl_raw(
         event_timestamp TIMESTAMP,
         event_type VARCHAR,
         message_id VARCHAR,
         tracking_source VARCHAR,
         tracking_version VARCHAR,
         `repo_state` STRUCT<`head` STRUCT<`commit` VARCHAR ,`name`
    VARCHAR>>
    )
    TYPE pubsub
    LOCATION 'projects/xxx/topics/xxx'
    TBLPROPERTIES '{"format":"json"}';

    ```

    But get error `parse failed: Encountered "STRUCT" `.

    If i change the `STRUCT` to `ROW` (as in Calcite) the DDL passes, but
    still I do fail to receive data on

    `SELECT * FROM etl_raw LIMIT 1;` with exception of
    `java.lang.NoSuchFieldException: head (state=,code=0)` when I am sure
    that the field is there in json payload.

    With commented out `repo_state` filed I am able to retrieve the data.
    Unfortunately I do not have control over the payload structure as its
    3rd party hook to make it flat.

    In general I am unable to parse json msg from pubsub having
    structured
    field.

    Is anyone familiar with this part of Beam functionalities?

    Best regards

    Wisniowski Piotr

Re: Beam shell sql with zeta

Reply via email to