Connecting To MSSQL Server With Apache Beam, Python SDK

Dennis Fri, 17 Jul 2020 09:22:14 -0700

Hello,

I'm writing to ask about developing a pipeline (using the Python SDK) with multiple PTransforms that can read from, write to, and modify data in an MSSQL server.

I've been using beam-nuggets (https://pypi.org/project/beam-nuggets/), a community I/O connector that provides these kinds of PTransforms for a MySQL server, and I'd like to know whether there's an equivalent option for MSSQL.

So far, I've been able to run a pipeline on the DirectRunner that reads data from MSSQL using pyodbc. While this is a good starting point, the pipeline does not run on the DataflowRunner (even after configuring Private IP), and the read is not parallelized.
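One way to parallelize a read like this is to split the table into key ranges up front and have each worker run one range query. The stdlib-only sketch below only builds those queries; `partition_queries` is a hypothetical helper (not part of beam-nuggets or the Beam API), and it assumes an integer key column whose min and max are already known. The table and column names are placeholders.

```python
from typing import List, Tuple


def partition_queries(table: str, key_column: str, lo: int, hi: int,
                      num_partitions: int) -> List[Tuple[str, Tuple[int, int]]]:
    """Split the inclusive key range [lo, hi] into contiguous slices.

    Returns (sql, params) pairs that use pyodbc-style "?" placeholders.
    `table` and `key_column` are interpolated directly, so they must be
    trusted identifiers, never user input.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    if hi < lo:
        return []
    total = hi - lo + 1
    # Never create more slices than there are keys.
    num_partitions = min(num_partitions, total)
    size, extra = divmod(total, num_partitions)
    sql = f"SELECT * FROM {table} WHERE {key_column} BETWEEN ? AND ?"
    queries = []
    start = lo
    for i in range(num_partitions):
        # The first `extra` slices get one extra key, so sizes differ by at most one.
        end = start + size - 1 + (1 if i < extra else 0)
        queries.append((sql, (start, end)))
        start = end + 1
    return queries
```

In a pipeline, the resulting (sql, params) pairs could be fed to `beam.Create`, followed by `beam.Reshuffle()` so the ranges are spread across workers, and then a `DoFn` that opens a pyodbc connection in `setup()` and yields the rows from `cursor.execute(sql, params)`. Running this on Dataflow would still require pyodbc and a SQL Server ODBC driver to be available on the workers.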

I also looked into SQLAlchemy, but its MSSQL support seems less complete than its MySQL support, especially for the insertion method. The default insertion method is expected to be an upsert. For MySQL, this was implemented using:

from sqlalchemy.dialects.mysql import insert as mysql_insert

There doesn't seem to be an equivalent of this for MSSQL.
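For comparison, T-SQL has no `INSERT ... ON DUPLICATE KEY UPDATE`; the usual SQL Server counterpart for an upsert is a `MERGE` statement. The stdlib-only sketch below builds one for a single row. `build_merge_upsert` is a hypothetical helper, not part of SQLAlchemy or beam-nuggets, and it assumes the key columns uniquely identify a row.

```python
from typing import Sequence


def _quote(name: str) -> str:
    # T-SQL bracket quoting; a literal "]" is escaped by doubling it.
    return "[" + name.replace("]", "]]") + "]"


def build_merge_upsert(table: str, columns: Sequence[str],
                       key_columns: Sequence[str]) -> str:
    """Build a T-SQL MERGE that upserts one row per execution.

    Values are bound through pyodbc-style "?" placeholders in `columns` order.
    Column names are bracket-quoted; `table` is inserted verbatim, so it must
    be a trusted (optionally schema-qualified) name.
    """
    if not key_columns or not set(key_columns) <= set(columns):
        raise ValueError("key_columns must be a non-empty subset of columns")
    col_list = ", ".join(_quote(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    match_on = " AND ".join(
        f"target.{_quote(k)} = source.{_quote(k)}" for k in key_columns)
    lines = [
        f"MERGE INTO {table} AS target",
        f"USING (VALUES ({placeholders})) AS source ({col_list})",
        f"ON {match_on}",
    ]
    # Update only the non-key columns when the key already exists.
    non_keys = [c for c in columns if c not in key_columns]
    if non_keys:
        updates = ", ".join(f"{_quote(c)} = source.{_quote(c)}" for c in non_keys)
        lines.append(f"WHEN MATCHED THEN UPDATE SET {updates}")
    # Insert the full row when no existing row matches the key.
    source_values = ", ".join(f"source.{_quote(c)}" for c in columns)
    lines.append(
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({source_values});")
    return "\n".join(lines)
```

Each row would then be bound in column order, e.g. with `cursor.executemany(sql, rows)` on a pyodbc cursor inside a `DoFn`. If several workers upsert into the same table concurrently, a `WITH (HOLDLOCK)` hint on the target is often recommended to avoid races between the match check and the insert.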

How would one go about doing this? I've looked at several Stack Overflow answers, but none of them offered functionality similar to beam-nuggets. Perhaps I missed a solution?

I realize this is a loaded question, so thank you in advance for any help.

Thanks,

Dennis

P.S. I had trouble adding my work email address, dzvigel...@questrade.com, to the mailing list (even though I followed the same subscription steps as for this address). Could you please add it? Thanks.
