Hi Andreas,

If dataset.getType() returns a RowTypeInfo, you can ignore this log message. The type extractor runs before the ".returns()" call, but with that method you override the extracted type.
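
For example, a quick sanity check could look like this (only a sketch; "rows" stands for the DataSet<Row> produced by the map/".returns()" chain discussed further down):

     // rows: the DataSet<Row> produced by the ".returns(...)" call (see below)
     if (rows.getType() instanceof org.apache.flink.api.java.typeutils.RowTypeInfo) {
         // The type override took effect; the TypeExtractor INFO lines about
         // org.apache.flink.types.Row being handled as a GenericType can be ignored.
     }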

Regards,
Timo


On 15.01.20 15:27, Hailu, Andreas wrote:
Dawid, this approach looks promising. I’m able to flatten out my Avro records into Rows and run simple queries atop of them. I’ve got a question – when I register my Rows as a table, I see the following messages in the logs:

2020-01-14 17:16:43,083 [main] INFO  TypeExtractor - class org.apache.flink.types.Row does not contain a getter for field fields

2020-01-14 17:16:43,083 [main] INFO  TypeExtractor - class org.apache.flink.types.Row does not contain a setter for field fields

2020-01-14 17:16:43,084 [main] INFO  TypeExtractor - Class class org.apache.flink.types.Row cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance.

Will this be problematic even now that we’ve provided TypeInfos for the Rows? Performance is something that I’m concerned about as I’ve already introduced a new operation to transform our records to Rows.

// ah

*From:* Hailu, Andreas [Engineering]
*Sent:* Wednesday, January 8, 2020 12:08 PM
*To:* 'Dawid Wysakowicz' <dwysakow...@apache.org>; user@flink.apache.org
*Cc:* Richards, Adam S [Engineering] <adam.richa...@ny.email.gs.com>
*Subject:* RE: Table API: Joining on Tables of Complex Types

Very well – I’ll give this a try. Thanks, Dawid.

// ah

*From:* Dawid Wysakowicz <dwysakow...@apache.org>
*Sent:* Wednesday, January 8, 2020 7:21 AM
*To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
*Cc:* Richards, Adam S [Engineering] <adam.richa...@ny.email.gs.com>
*Subject:* Re: Table API: Joining on Tables of Complex Types

Hi Andreas,

Converting your GenericRecords to Rows would definitely be the safest option. You can check how it's done in the org.apache.flink.formats.avro.AvroRowDeserializationSchema. You can reuse the logic from there to write something like:

     DataSet<GenericRecord> dataset = ...

     dataset.map( /* convert GenericRecord to Row */ )
            .returns(AvroSchemaConverter.convertToTypeInfo(avroSchemaString));
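
Spelled out a bit more, the map could look roughly like this (only a sketch, assuming the Avro schema is available as "avroSchemaString" and a plain positional copy of the top-level fields is enough; nested records and logical types would need the full conversion logic from AvroRowDeserializationSchema):

     org.apache.avro.Schema schema = new org.apache.avro.Schema.Parser().parse(avroSchemaString);
     int arity = schema.getFields().size();

     DataSet<Row> rows = dataset
         .map((GenericRecord record) -> {
             Row row = new Row(arity);
             for (int i = 0; i < arity; i++) {
                 // note: Avro strings arrive as Utf8, so convert values as needed
                 row.setField(i, record.get(i));
             }
             return row;
         })
         .returns(AvroSchemaConverter.convertToTypeInfo(avroSchemaString));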

Another thing you could try is to make sure that GenericRecord is seen as an Avro type by Flink (Flink should understand that the Avro type is a complex type):

     dataset.returns(new GenericRecordAvroTypeInfo(/* org.apache.avro.Schema */));

Then the TableEnvironment should pick it up as a structured type and flatten it automatically when registering the Table. Bear in mind that the returns method is part of SingleInputUdfOperator, so you can apply it right after some transformation, e.g. map/flatMap etc.
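
Applied right after a transformation, that could look roughly like this (a sketch; "avroSchemaString" is assumed to hold the Avro schema as JSON):

     org.apache.avro.Schema schema = new org.apache.avro.Schema.Parser().parse(avroSchemaString);

     DataSet<GenericRecord> typed = dataset
         .map(record -> record)  // identity map, just to get a SingleInputUdfOperator
         .returns(new GenericRecordAvroTypeInfo(schema));

     Table table = batchTableEnvironment.fromDataSet(typed);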

Best,

Dawid

On 06/01/2020 18:03, Hailu, Andreas wrote:

    Hi David, thanks for getting back.

     From what you’ve said, I think we’ll need to convert our
    GenericRecord into structured types – do you have any references or
    examples I can have a look at? If not, perhaps you could just show
    me a basic example of flattening a complex object with accessors
    into a Table of structured types. Or by structured types, did you
    mean Row?

    // ah

    *From:* Dawid Wysakowicz <dwysakow...@apache.org>
    *Sent:* Monday, January 6, 2020 9:32 AM
    *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
    *Cc:* Richards, Adam S [Engineering] <adam.richa...@ny.email.gs.com>
    *Subject:* Re: Table API: Joining on Tables of Complex Types

    Hi Andreas,

    First of all, I would highly recommend converting non-structured types to structured types as soon as
    possible, as it opens up more possibilities to optimize the plan.

    Have you tried:

    Table users = batchTableEnvironment.fromDataSet(usersDataset)
        .select("getField(f0, userName) as userName, f0");

    Table other = batchTableEnvironment.fromDataSet(otherDataset)
        .select("getField(f0, userName) as user, f1");

    Table result = other.join(users, "user = userName");
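
    For reference, a getField function of this shape could be sketched as below. This is only an assumption about
    what such a UDF might look like (the actual GRFieldExtractor is not shown in this thread); it also assumes the
    extracted fields are string-valued:

        // extends org.apache.flink.table.functions.ScalarFunction
        public static class GetField extends ScalarFunction {
            public String eval(GenericRecord record, String fieldName) {
                Object value = record.get(fieldName);
                return value == null ? null : value.toString(); // Avro strings are Utf8
            }
        }

        // batchTableEnvironment.registerFunction("getField", new GetField());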

    You could also check how the
    org.apache.flink.formats.avro.AvroRowDeserializationSchema class is
    implemented which internally converts an avro record to a structured
    Row.

    Hope this helps.

    Best,

    Dawid

    On 03/01/2020 23:16, Hailu, Andreas wrote:

        Hi folks,

        I’m trying to join two Tables which are composed of complex types, Avro’s GenericRecord to be exact.
        I have to use a custom UDF to extract fields out of the records, and I’m having some trouble figuring
        out how to join on them, since I need to call this UDF to read the fields I need. Example below:

        batchTableEnvironment.registerFunction("getField", new GRFieldExtractor()); // GenericRecord field extractor

        Table users = batchTableEnvironment.fromDataSet(usersDataset); // converting from some pre-existing DataSet

        Table otherDataset = batchTableEnvironment.fromDataSet(someOtherDataset);

        Table userNames = users.select("getField(f0, userName)"); // This is how the UDF is used, as GenericRecord
        // is a complex type requiring you to invoke a get() method on the field you’re interested in. Here we read
        // the field ‘userName’.

        I’d like to do something using the Table API similar to the
        query “SELECT * from otherDataset WHERE otherDataset.userName =
        users.userName”. How is this done?

        Best,

        Andreas
