Re: AIP-8 Split Hooks/Operators into Separate Packages

2019-01-15 Thread Tim Swast
I like this proposal to create an "airflow-backward-compatibility-operators" package, which can be released independently of "airflow", and then migrate to "micro-packages" after that. I've updated AIP-8 to describe detail

Re: AIP-8 Split Hooks/Operators into Separate Packages

2019-01-10 Thread Maxime Beauchemin
A related thought is around the fact that all these libs depend on Airflow itself to get the base classes they derive from (BaseHook and BaseOperator mostly). It's a bit upside down when the small library depends on a big library. That may be ok as is, but pushing the micro-package logic would dictat

Re: AIP-8 Split Hooks/Operators into Separate Packages

2019-01-10 Thread Maxime Beauchemin
That's not what I meant. If I apply what I meant to your example we'd have a single package for each hook `airflow-hook-s3` and `airflow-hook-gcs`, and a package for `airflow-operator-s3-to-gcs`. The operator package would depend on both hook packages. There's no code or test duplication there. If
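
The layout described above can be sketched as a small dependency graph. This is only an illustration: the three package names come from the message, the `airflow` core entry reflects the point made elsewhere in the thread that hooks derive from Airflow's base classes, and the helper names (`install_order`, `transitive_deps`) are hypothetical. None of these packages is assumed to exist.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical packages from the example in this thread.
# Each key maps to the set of packages it depends on.
DEPENDS_ON = {
    "airflow-hook-s3": {"airflow"},
    "airflow-hook-gcs": {"airflow"},
    "airflow-operator-s3-to-gcs": {"airflow-hook-s3", "airflow-hook-gcs"},
}


def install_order(graph):
    """Return the packages ordered so dependencies come before dependents."""
    return list(TopologicalSorter(graph).static_order())


def transitive_deps(graph, package):
    """Return every package that `package` needs, directly or indirectly."""
    seen = set()
    stack = [package]
    while stack:
        for dep in graph.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


if __name__ == "__main__":
    print(install_order(DEPENDS_ON))
    print(sorted(transitive_deps(DEPENDS_ON, "airflow-operator-s3-to-gcs")))
```

In this sketch each hook lives in exactly one package, and the operator package only lists the two hook packages as dependencies, so no hook code is copied.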

Re: AIP-8 Split Hooks/Operators into Separate Packages

2019-01-09 Thread airflowuser
@Max I don't see how this is doable. Consider S3ToGoogleCloudStorageOperator. It uses both S3Hook and GoogleCloudStorageHook. With your suggestion we have to maintain S3Hook in each separate package per operator/sensor. Which means for example if a new parameter is added to any of the hooks you

Re: AIP-8 Split Hooks/Operators into Separate Packages

2019-01-09 Thread Maxime Beauchemin
If there's a strict policy of having a single hook and a single operator per package, then the hook package would be the only place where the external dependency is defined, and the operator packages would depend on hook package(s). That would follow the "micro package" philosophy and could work pr

Re: AIP-8 Split Hooks/Operators into Separate Packages

2019-01-09 Thread Felix Uellendall
Regardless of how complex this implementation would be, I am +1 on this. From the developer's point of view, the CI running so much faster is the biggest plus for me. I think it will only become worse the more dependencies we add. From the user's point of view that I am able to choose from

Re: AIP-8 Split Hooks/Operators into Separate Packages

2019-01-08 Thread Jarek Potiuk
While splitting the monolithic Airflow architecture into pieces sounds good, there is one problem that might be difficult to tackle (or rather impossible unless we change the architecture of Airflow significantly) - namely dependencies/requirements. The way Airflow uses operators is that its operator

Re: AIP-8 Split Hooks/Operators into Separate Packages

2019-01-08 Thread Tim Swast
> I don't see it solving any problem than test speed (which is a big one, yes) but doesn't reduce the amount of workload on the committers. It's about distributed ownership. For example, I'm not a committer in pandas, but I am the primary maintainer of pandas-gbq. You're right that if the set of c

Re: AIP-8 Split Hooks/Operators into Separate Packages

2019-01-08 Thread Ash Berlin-Taylor
Can someone explain to me how having multiple packages will work in practice? How will we ensure that core changes don't break any hooks/operators? How do we support the logging backends for s3/azure/gcp? What would the release process be for the "sub"-packages? There is nothing stopping some

Re: AIP-8 Split Hooks/Operators into Separate Packages

2019-01-08 Thread airflowuser
I think the operator should be placed according to its source. If it's MySQLToHiveOperator then it would be placed in the MySQL package. The BIG question here is whether this serves an actual improvement, like faster delivery of hook/operator bug fixes to Airflow users (faster than an actual Airflow release), or this is

Re: AIP-8 Split Hooks/Operators into Separate Packages

2019-01-08 Thread Tim Swast
> I’m not sure package structure based on whether major providers will fund development is the right approach. Regarding data transfer operators that cover 2 different systems, we have a few choices: - Place all data transfer operators in special data transfer repository. The same problems we

Re: AIP-8 Split Hooks/Operators into Separate Packages

2019-01-07 Thread Brian Greene
I’m not sure package structure based on whether major providers will fund development is the right approach. My $.02 > On Jan 7, 2019, at 3:44 PM, Tim Swast wrote: > > In general it’s easier for cloud providers to fund development of opera

Re: AIP-8 Split Hooks/Operators into Separate Packages

2019-01-07 Thread Tim Swast
In general it’s easier for cloud providers to fund development of operators that bring data in. I’d say if there is overlap, put the operator in the target system’s repo. On Mon, Jan 7, 2019 at 2:17 PM Maxime Beauchemin wrote: > Something to think about is how data transfer operators like the >
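
The placement rule proposed here (transfer operators live with the target system) and the alternative raised elsewhere in the thread (place them by source) can be compared with a minimal sketch. It assumes operator class names follow a `<Source>To<Target>Operator` pattern; the function `home_package` and the `airflow-<system>` package naming are hypothetical and only illustrate the two policies.

```python
import re

# Assumed naming convention: <Source>To<Target>Operator.
_TRANSFER_NAME = re.compile(r"^(?P<source>.+?)To(?P<target>.+)Operator$")


def home_package(operator_name, policy="target"):
    """Pick a hypothetical package for a transfer operator.

    policy="target" follows the suggestion to put the operator in the
    target system's repo; policy="source" places it by its source system.
    """
    if policy not in ("source", "target"):
        raise ValueError("policy must be 'source' or 'target'")
    match = _TRANSFER_NAME.match(operator_name)
    if match is None:
        raise ValueError(f"{operator_name!r} is not a <Source>To<Target>Operator name")
    # Hypothetical package naming scheme: airflow-<system>, lowercased.
    return "airflow-" + match.group(policy).lower()


if __name__ == "__main__":
    print(home_package("MySQLToHiveOperator"))                   # target policy
    print(home_package("MySQLToHiveOperator", policy="source"))  # source policy
```

Either policy gives every transfer operator exactly one home package; the choice only decides which system's maintainers own it.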

Re: AIP-8 Split Hooks/Operators into Separate Packages

2019-01-07 Thread Kamil Breguła
I would like to mention that there are 2 pull requests that introduce support for plugins installed by pip: https://github.com/apache/airflow/pull/4412 https://github.com/apache/airflow/pull/730 These PRs should be analyzed to meet all our requirements. It is worth mentioning that this change not only

Re: AIP-8 Split Hooks/Operators into Separate Packages

2019-01-07 Thread Maxime Beauchemin
Something to think about is how data transfer operators like the MysqlToHiveOperator usually rely on 2 hooks. With a package-specific approach that may mean something like `airflow-hive`, `airflow-mysql` and `airflow-mysql-hive` packages, where the `airflow-mysql-hive` package depends on the two

AIP-8 Split Hooks/Operators into Separate Packages

2019-01-07 Thread Tim Swast
I've created AIP-8: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303 To follow up on the discussion about splitting hooks/operators out of the core Airflow package at http://mail-archives.apache.org/mod_mbox/airflow-dev/201809.mbox/%3c308670db-bd2a-4738-81b1-3f6fb312c..