Interesting. All my code does is the following:

public static void main(String[] args) {
    PCollection<TableRow> bqResult =
        p.apply(BigQueryIO.readTableRows().fromQuery(query).usingStandardSql());
    PCollection<SomeClass> result = runJob(bqResult, /* boolean and string params */);
    // store results
}

and

private static PCollection<SomeClass> runJob(PCollection<TableRow> bqResult, ...) {
    return bqResult
            // In this step I convert TableRow into my custom class object
            .apply("Create metrics based on sessions",
                    ParDo.of(new CreateSessionMetrics(/* boolean and string params */)))
            // few more transformations
}

This is basically similar to the examples you can find here:
https://beam.apache.org/documentation/io/built-in/google-bigquery/

On Wed, 8 Jul 2020 at 23:31, Jeff Klukas <jklu...@mozilla.com> wrote:

> On Wed, Jul 8, 2020 at 3:54 PM Kirill Zhdanovich <kzhdanov...@gmail.com>
> wrote:
>
>> So from what I understand, it works like this by design and it's not
>> possible to test my code with the current coder implementation. Is that
>> correct?
>>
>
> I would argue that this test failure is indicating an area of potential
> failure in your code that should be addressed. It may be that your current
> production pipeline relies on fusion, which is not guaranteed by the Beam
> model, and so the pipeline could fail if the runner makes an internal
> change that affects fusion (in practice this is unlikely).
>
> Is it possible to update your code such that it does not need to make
> assumptions about the concrete Map type returned by TableRow objects?
>
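For reference, the approach Jeff suggests (reading row fields only through the
Map interface, never casting to a concrete Map implementation) might look
roughly like this. This is only a sketch: the class name, the getField helper,
and the "session_id" field are hypothetical, and a plain java.util Map stands
in for TableRow (which implements Map<String, Object>) so the snippet is
self-contained:

```java
import java.util.HashMap;
import java.util.Map;

public class GenericMapAccess {
    // Hypothetical helper: read a field via the Map interface only,
    // without assuming any concrete Map implementation for the row.
    static Object getField(Map<String, Object> row, String field) {
        return row.get(field);
    }

    public static void main(String[] args) {
        // Stand-in for a TableRow; any Map implementation behaves the same
        // here, which is exactly the property the pipeline should rely on.
        Map<String, Object> row = new HashMap<>();
        row.put("session_id", "abc");
        System.out.println(getField(row, "session_id"));
    }
}
```

Code written this way keeps working even if the coder round-trips the row and
hands back a different Map implementation than the one produced upstream.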


-- 
Best Regards,
Kirill
