Hi Preston,
sharing the Google Sheet is not enough (already tested), because the Dataflow
service account is authenticated only for GCP, not for Drive; moreover, I am
using the Python SDK, not the Scala wrapper, to develop Beam pipelines.
Leonardo

From: Preston Marshall <[email protected]>
Sent: Wednesday, June 6, 2018 21:21
To: [email protected]
Subject: Re: Read from a Google Sheet based BigQuery table - Python SDK

Not sure if this is helpful, but you can also share Google Sheets with service
accounts directly. I am solving a similar problem by using the Google SDK
directly to pull the data from the sheet, then feeding it into Beam via Scio's
parallelize functionality. My dataset is small, so this worked for me.
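For the Python SDK, a rough counterpart of this approach could look like the sketch below: pull the rows with the Sheets API client, then hand them to the pipeline via beam.Create (the Python analogue of Scio's parallelize). The spreadsheet ID and range are placeholders, and the service account the client authenticates as must have been shared on the sheet.

```python
def rows_to_dicts(values):
    # First row is the header; map each remaining row to a dict.
    header, *rows = values
    return [dict(zip(header, row)) for row in rows]

def read_sheet_rows(spreadsheet_id, cell_range):
    # Requires credentials carrying a Sheets/Drive scope, and a service
    # account that the sheet has been shared with.
    from googleapiclient.discovery import build
    service = build("sheets", "v4")
    result = service.spreadsheets().values().get(
        spreadsheetId=spreadsheet_id, range=cell_range).execute()
    return result.get("values", [])

def run(spreadsheet_id, cell_range="Sheet1!A1:C"):
    import apache_beam as beam
    # beam.Create embeds the rows in the pipeline graph, so as noted
    # above this only suits small datasets.
    rows = rows_to_dicts(read_sheet_rows(spreadsheet_id, cell_range))
    with beam.Pipeline() as p:
        p | "SheetRows" >> beam.Create(rows)
```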

On Wed, Jun 6, 2018 at 1:13 PM Chamikara Jayalath <[email protected]> wrote:

On Tue, Jun 5, 2018 at 9:56 PM Leonardo Biagioli <[email protected]> wrote:
Hi Cham,
thanks, but those pages relate to authentication inside Google Cloud Platform
services; I need to authenticate the job for Sheets. Since the required scope
is https://www.googleapis.com/auth/drive, is there a way to pass it when
deploying a Dataflow job?

Unfortunately I haven't tried this, so I'm not sure whether it will work. Are
you able to run queries against your federated table from the BigQuery
dashboard (without using Dataflow)? Also make sure that the Compute Engine
service account used by the Dataflow job is properly authenticated (as
mentioned in the document I provided). I recommend contacting Google Cloud
support for questions regarding the BQ and Dataflow services.

- Cham

Thank you,
Leonardo

From: Chamikara Jayalath <[email protected]>
Sent: Tuesday, June 5, 2018 19:26
To: [email protected]
Cc: [email protected]
Subject: Re: Read from a Google Sheet based BigQuery table - Python SDK

See the following regarding authenticating Dataflow jobs:
https://cloud.google.com/dataflow/security-and-permissions

I'm not sure about information specific to Sheets; there seems to be some in
the following:
https://cloud.google.com/bigquery/external-data-drive

On Tue, Jun 5, 2018 at 10:16 AM Leonardo Biagioli <[email protected]> wrote:
Hi Cham,
Thank you for taking time to answer!
Is there a way to properly authenticate a Beam job on the Dataflow runner? I
need to specify the required scope to read from Sheets, but where can I set
that parameter?
Regards,
Leonardo

On Jun 5, 2018, at 18:28, Chamikara Jayalath <[email protected]> wrote:
I don't think BQ federated tables support export jobs, so reading directly from
such tables will likely not work. But reading using a query should work if your
job is authenticated properly (I haven't tested this).
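A query-based read in the Python SDK would look roughly like the sketch below (the project, dataset, and table names are hypothetical); whether it succeeds still depends on the job's credentials carrying the Drive scope, which is the open question in this thread.

```python
def build_pipeline(pipeline_options):
    # Read the Sheets-backed federated table through a query rather than
    # a direct table read: direct reads go through BigQuery export jobs,
    # which federated tables do not support.
    import apache_beam as beam
    p = beam.Pipeline(options=pipeline_options)
    _ = p | "QuerySheetTable" >> beam.io.Read(
        beam.io.BigQuerySource(
            query="SELECT * FROM `my-project.my_dataset.sheet_table`",
            use_standard_sql=True))
    return p
```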

- Cham

On Tue, Jun 5, 2018, 5:56 AM Leonardo Biagioli <[email protected]> wrote:

Hi guys,

I just wanted to ask whether there is a way to read from a Sheets-based
BigQuery table in a Beam pipeline running on Dataflow.

I usually specify additional scopes during authentication when running plain
Python code to do the same, but I wasn't able to find a reference to anything
similar for Beam.
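For reference, this is roughly what the scope override looks like in plain Python outside Beam (a sketch using google.auth and the BigQuery client; the drive scope is the one Sheets-backed federated tables need):

```python
# The drive scope, on top of the BigQuery scope, lets a client read
# Sheets-backed federated tables.
EXTRA_SCOPES = [
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/drive",
]

def make_scoped_bq_client(project):
    # Application default credentials, widened with the extra scopes.
    import google.auth
    from google.cloud import bigquery
    credentials, _ = google.auth.default(scopes=EXTRA_SCOPES)
    return bigquery.Client(project=project, credentials=credentials)
```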

Could you please help?

Thank you very much!

Leonardo



--


Preston Marshall

Director, Data Engineering


www.cityblock.com


256-434-1050

[email protected]

55 Washington St, Unit 552, Brooklyn, NY 11201

