Re: Inferring Csv Schemas

2018-12-02 Thread Reza Rokni
Hi, No problem, and a warm welcome to your Beam journey :-) Yes, in this case, when an issue was found, the BigQuery API was used to make the changes. Cheers Reza On Sat, 1 Dec 2018 at 03:38, Joe Cullen wrote: > That's great Reza, thanks! I'm still getting to grips with Beam and > Dataflow so apolo

Re: Inferring Csv Schemas

2018-11-30 Thread Joe Cullen
That's great Reza, thanks! I'm still getting to grips with Beam and Dataflow so apologies for all the questions. I have a few more if that's ok: 1. When the article says "the schema would be mutated", does this mean the BigQuery schema? 2. Also, when the known good BigQuery schema is retrieved, an

Re: Inferring Csv Schemas

2018-11-30 Thread Reza Rokni
Hi Joe, That part of the blog should have been written a bit more clearly. I blame the writer ;-) While that solution worked, it was inefficient; this is discussed in the next paragraph. But essentially, checking the validity of the schema every time is not efficient, especially as they are normally
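One generic way to avoid re-checking the schema for every record is to cache the result of the check per distinct set of field names, so the comparison runs once per record "shape" rather than once per element. This is a minimal sketch under that assumption and is not necessarily the approach the blog describes; `SchemaChecker` and `missing_fields` are hypothetical names, and the cache would have to be cleared whenever the known-good schema is updated.

```python
from typing import Dict, FrozenSet, List


class SchemaChecker:
    """Hypothetical helper: caches the schema check per distinct set of field names."""

    def __init__(self, known_field_names):
        self.known = frozenset(known_field_names)
        # Maps a record's set of keys to the keys missing from the known schema.
        self.cache: Dict[FrozenSet[str], List[str]] = {}
        self.checks_run = 0  # counts how often the real comparison was performed

    def missing_fields(self, record: dict) -> List[str]:
        shape = frozenset(record)
        if shape not in self.cache:
            # Only records with a previously unseen shape pay for the comparison.
            self.checks_run += 1
            self.cache[shape] = sorted(shape - self.known)
        return list(self.cache[shape])


checker = SchemaChecker(["id", "name"])
records = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c", "country": "JP"},
]
results = [checker.missing_fields(r) for r in records]
print(results, checker.checks_run)
```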

Re: Inferring Csv Schemas

2018-11-30 Thread Joe Cullen
Thanks Reza, that's really helpful! I have a few questions: "He used a GroupByKey function on the JSON type and then a manual check on the JSON schema against the known good BigQuery schema. If there was a difference, the schema would mutate and the updates would be pushed through." If the diffe
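The check quoted from the blog (compare a record's JSON fields against the known good BigQuery schema and, if they differ, mutate the schema) can be sketched in plain Python, independent of Beam and of the BigQuery client. Assumptions: the schema is held as a list of `{"name", "type", "mode"}` dicts, as in a BigQuery JSON schema file; only top-level keys are compared; new columns default to `STRING`; `find_new_fields` and `mutate_schema` are hypothetical names. New fields are added as `NULLABLE` because BigQuery only permits adding `NULLABLE` or `REPEATED` columns to an existing table. Actually applying the updated schema would go through the BigQuery API, which is not shown here.

```python
import json


def find_new_fields(record: dict, known_schema: list) -> list:
    """Return the top-level keys of record that are absent from known_schema."""
    known = {field["name"] for field in known_schema}
    return sorted(key for key in record if key not in known)


def mutate_schema(known_schema: list, new_fields: list) -> list:
    """Return a copy of known_schema with each new field appended.

    Assumption: new columns are typed STRING; a real pipeline would need type inference.
    """
    merged = [dict(field) for field in known_schema]  # copy, leave the original untouched
    merged.extend({"name": name, "type": "STRING", "mode": "NULLABLE"} for name in new_fields)
    return merged


known_schema = [
    {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "name", "type": "STRING", "mode": "NULLABLE"},
]
record = json.loads('{"id": 1, "name": "a", "country": "JP"}')

new_fields = find_new_fields(record, known_schema)
updated_schema = mutate_schema(known_schema, new_fields) if new_fields else known_schema
print(json.dumps(updated_schema))
```

Grouping records by JSON type first, as the blog describes, means a check like this can run once per group rather than once per element.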

Re: Inferring Csv Schemas

2018-11-30 Thread Reza Ardeshir Rokni
Hi Joe, You may find some of the info in this blog of interest; it's based on streaming pipelines, but it has some useful ideas. https://cloud.google.com/blog/products/gcp/how-to-handle-mutating-json-schemas-in-a-streaming-pipeline-with-square-enix Cheers Reza On Thu, 29 Nov 2018 at 06:53, Joe Cullen wrote

Inferring Csv Schemas

2018-11-29 Thread Joe Cullen
Hi all, I have a pipeline reading CSV files, performing some transforms, and writing to BigQuery. At the moment I'm reading the BigQuery schema from a separate JSON file. If the CSV files had a new column added (and I wanted to include this column in the resultant BigQuery table), I'd have to chan
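As an alternative to maintaining a separate JSON schema file, the schema could be derived from the CSV header row itself, so a newly added column is picked up automatically. This is a minimal sketch assuming every file has a header row and that column types can be guessed by sampling a few rows; `infer_type` and `infer_schema` are hypothetical helpers, only `INTEGER`, `FLOAT` and `STRING` are inferred, and every column is marked `NULLABLE`. How the result is handed to the BigQuery sink depends on the SDK in use and is not shown.

```python
import csv
import io


def infer_type(values):
    """Guess a BigQuery type for a column from sample string values.

    Empty strings are ignored; a column with no non-empty samples defaults to STRING.
    """
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "STRING"

    def parses(cast):
        try:
            for v in non_empty:
                cast(v)
            return True
        except ValueError:
            return False

    if parses(int):
        return "INTEGER"
    if parses(float):
        return "FLOAT"
    return "STRING"


def infer_schema(csv_text, sample_rows=100):
    """Build a BigQuery-style schema (list of dicts) from the header and sample rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    samples = [row for _, row in zip(range(sample_rows), reader)]
    schema = []
    for i, name in enumerate(header):
        column = [row[i] for row in samples if i < len(row)]
        schema.append({"name": name, "type": infer_type(column), "mode": "NULLABLE"})
    return schema


csv_text = "id,price,country\n1,9.99,JP\n2,4.50,UK\n"
schema = infer_schema(csv_text)
print(schema)
```

Sampling keeps the cost bounded, but a value later in the file that doesn't match the guessed type would still fail at load time, so a conservative fallback to `STRING` may be preferable for messy inputs.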