Re: [PROPOSAL] Snowflake Java Connector for Apache Beam

2020-04-10 Thread Dariusz Aniszewski
Hello, it's been a while since my last activity on the Beam dev list ;) Happy to be back! A few days ago Kasia created a JIRA issue for adding SnowflakeIO: https://issues.apache.org/jira/browse/BEAM-9722 Today, I'm happy to share with you the first PR, which adds SnowflakeIO.Read: https://github.com/apache/be

Re: [PROPOSAL] Snowflake Java Connector for Apache Beam

2020-03-26 Thread Katarzyna Kucharczyk
Hi, Thank you for your enthusiasm and for so many questions/comments :) I hope to address them all. Alexey, as far as I know, copy methods have better performance than inserts/selects. I think currently in Beam's JDBC loading and unloading is provided by selects and inserts as well. But I saw copy

Re: [PROPOSAL] Snowflake Java Connector for Apache Beam

2020-03-24 Thread Ismaël Mejía
Forgot to mention that one particularly pesky issue we found in the work on Redshift is being able to write unit tests for this. Is there an embedded version of Snowflake to run those? I would also like, if possible, to get some ideas on how to test this use case. Also we should probably ensure that

Re: [PROPOSAL] Snowflake Java Connector for Apache Beam

2020-03-24 Thread Ismaël Mejía
Great! It seems this pattern (COPY + parallel file read) is becoming a standard for 'data warehouses'; we are using something similar in the AWS Redshift PR (WIP). For details: https://github.com/apache/beam/pull/10206 Maybe it is worth all of us checking to see if we can converge the implementa

Re: [PROPOSAL] Snowflake Java Connector for Apache Beam

2020-03-24 Thread Elias Djurfeldt
Awesome job! I'm very interested in the cross-language support. Cheers, On Tue, 24 Mar 2020 at 01:20, Chamikara Jayalath wrote: > Sounds great. Looks like operation of the Snowflake source will be similar > to BigQuery source (export files to GCS and read files). This will allow > you to better

Re: [PROPOSAL] Snowflake Java Connector for Apache Beam

2020-03-23 Thread Chamikara Jayalath
Sounds great. Looks like operation of the Snowflake source will be similar to BigQuery source (export files to GCS and read files). This will allow you to better parallelize reading (current JDBC source is limited to one worker when reading). Seems like you already support initial splitting using

Re: [PROPOSAL] Snowflake Java Connector for Apache Beam

2020-03-23 Thread Alexey Romanenko
Great! It is always welcome to have more IOs in Beam. I’d be happy to take a look at your PR once it is created. Just a couple of questions for now. 1) Afaik, you can connect to Snowflake using the standard JDBC driver. Do you plan to compare the performance of this SnowflakeIO and Beam Jd

Re: [PROPOSAL] Snowflake Java Connector for Apache Beam

2020-03-23 Thread Jean-Baptiste Onofre
Hi, It’s very interesting. +1 to create a Jira and prepare a PR for review. Thanks! Regards JB > On 23 March 2020 at 15:23, Katarzyna Kucharczyk > wrote: > > Hi all, > > Me and my colleagues have developed a new Java connector for Snowflake that > we would like to add to Beam. > > Snowfl

[PROPOSAL] Snowflake Java Connector for Apache Beam

2020-03-23 Thread Katarzyna Kucharczyk
Hi all, My colleagues and I have developed a new Java connector for Snowflake that we would like to add to Beam. Snowflake is an analytic data warehouse provided as Software-as-a-Service (SaaS). It uses a new SQL database engine with a unique architecture designed for the cloud. To read more det