Hi,
No problem, and a warm welcome to your Beam journey :-)
Yes, in this case when an issue was found, the BigQuery API was used to make
the changes.
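For example, something along these lines with the google-cloud-bigquery
client (the project, dataset, table, and column names are just placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# Fetch the current table, append the new column, then patch the schema.
# BigQuery only allows additive changes this way (new NULLABLE/REPEATED
# columns); you cannot drop or retype existing fields with a schema patch.
table = client.get_table("my-project.my_dataset.my_table")
schema = list(table.schema)
schema.append(bigquery.SchemaField("new_column", "STRING", mode="NULLABLE"))
table.schema = schema
client.update_table(table, ["schema"])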
Cheers
Reza
On Sat, 1 Dec 2018 at 03:38, Joe Cullen wrote:
That's great Reza, thanks! I'm still getting to grips with Beam and
Dataflow so apologies for all the questions. I have a few more if that's ok:
1. When the article says "the schema would be mutated", does this mean the
BigQuery schema?
2. Also, when the known good BigQuery schema is retrieved, an
Hi Joe,
That part of the blog should have been written a bit more cleanly... I blame
the writer ;-) So while that solution worked, it was inefficient; this is
discussed in the next paragraph. But essentially, checking the validity of
the schema every time is not efficient, especially as they are normally
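A rough sketch of the cheaper pattern, for illustration only: cache the
known-good field names once per worker, and only call the BigQuery API when
an element actually carries an unseen key (the table reference and the choice
to default new fields to STRING are assumptions, not the blog's exact code):

import apache_beam as beam

class EnsureSchema(beam.DoFn):
    """Caches the known-good field names per worker and only calls the
    BigQuery API when an element carries a previously unseen key."""

    def __init__(self, table_ref):
        # e.g. "my-project.my_dataset.my_table" -- placeholder value.
        self.table_ref = table_ref

    def setup(self):
        # Import here so the client is created per worker, not pickled.
        from google.cloud import bigquery
        self.client = bigquery.Client()
        table = self.client.get_table(self.table_ref)
        self.known_fields = {field.name for field in table.schema}

    def process(self, element):
        # element is assumed to be a dict parsed from the incoming JSON.
        new_keys = set(element) - self.known_fields
        if new_keys:
            # Only now pay for the round trip to mutate the schema.
            from google.cloud import bigquery
            table = self.client.get_table(self.table_ref)
            schema = list(table.schema)
            for key in sorted(new_keys):
                schema.append(bigquery.SchemaField(key, "STRING"))
            table.schema = schema
            self.client.update_table(table, ["schema"])
            self.known_fields |= new_keys
        yield element

Note this only covers additive changes (new NULLABLE columns); renames or
type changes would still need manual handling, and each worker keeps its own
cache, so an occasional redundant update is possible.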
Thanks Reza, that's really helpful!
I have a few questions:
"He used a GroupByKey function on the JSON type and then a manual check on
the JSON schema against the known good BigQuery schema. If there was a
difference, the schema would mutate and the updates would be pushed
through."
If the diffe
Hi Joe,
You may find some of the info in this blog of interest; it's based on
streaming pipelines, but the ideas are useful.
https://cloud.google.com/blog/products/gcp/how-to-handle-mutating-json-schemas-in-a-streaming-pipeline-with-square-enix
Cheers
Reza
On Thu, 29 Nov 2018 at 06:53, Joe Cullen wrote:
Hi all,
I have a pipeline reading CSV files, performing some transforms, and
writing to BigQuery. At the moment I'm reading the BigQuery schema from a
separate JSON file. If the CSV files had a new column added (and I wanted
to include this column in the resultant BigQuery table), I'd have to chan
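For context, a minimal sketch of that kind of pipeline (the paths, table
name, and schema-file layout are placeholder assumptions, not the actual
code from the thread):

import json
import apache_beam as beam

# "schema.json" is assumed to hold a BigQuery fields list, e.g.
# [{"name": "id", "type": "STRING"}, {"name": "amount", "type": "FLOAT"}]
with open("schema.json") as f:
    table_schema = {"fields": json.load(f)}

field_names = [field["name"] for field in table_schema["fields"]]

def parse_csv_line(line):
    # Naive split; a real pipeline would use the csv module for quoting.
    return dict(zip(field_names, line.split(",")))

with beam.Pipeline() as p:
    (p
     | "Read" >> beam.io.ReadFromText("gs://my-bucket/input*.csv",
                                      skip_header_lines=1)
     | "Parse" >> beam.Map(parse_csv_line)
     | "Write" >> beam.io.WriteToBigQuery("my-project:my_dataset.my_table",
                                          schema=table_schema))

With that shape, a new CSV column means editing schema.json and redeploying
the pipeline, which is the manual step the schema-mutation approach in the
blog tries to avoid.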