[PR] WIP: Add Airflow FS [airflow]

2023-10-03 Thread via GitHub
bolkedebruin opened a new pull request, #34729: URL: https://github.com/apache/airflow/pull/34729 This adds Airflow FS, which is a generic abstraction on top of a filesystem or object storage. The idea is that you can do the following ```python import airflow.io.fs as afs af

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-03 Thread via GitHub
eladkal commented on code in PR #34729: URL: https://github.com/apache/airflow/pull/34729#discussion_r1343924059 ## airflow/io/__init__.py: ## Review Comment: this seems to be mostly used by providers (for transfer operators). so maybe it's better to add it as `common.fs

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-03 Thread via GitHub
bolkedebruin commented on code in PR #34729: URL: https://github.com/apache/airflow/pull/34729#discussion_r1343934484 ## airflow/io/__init__.py: ## Review Comment: That would makes sense, but the intention is to make xcom and Dag parsing work on top of it -- This is an

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-03 Thread via GitHub
bolkedebruin commented on code in PR #34729: URL: https://github.com/apache/airflow/pull/34729#discussion_r1343934484 ## airflow/io/__init__.py: ## Review Comment: That would makes sense, but the intention is to make xcom and Dag parsing work on top of it (at least have th

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-03 Thread via GitHub
dstandish commented on code in PR #34729: URL: https://github.com/apache/airflow/pull/34729#discussion_r1344389497 ## airflow/io/__init__.py: ## Review Comment: but at least the provider-specific code should be moved to providers. like the implementations of the interface

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-03 Thread via GitHub
eladkal commented on code in PR #34729: URL: https://github.com/apache/airflow/pull/34729#discussion_r1344421695 ## airflow/io/__init__.py: ## Review Comment: Ideally all code in hooks/sensors/operatos should be in providers. That at least should be what we aim for --

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-03 Thread via GitHub
eladkal commented on code in PR #34729: URL: https://github.com/apache/airflow/pull/34729#discussion_r1344421695 ## airflow/io/__init__.py: ## Review Comment: Ideally all code in hooks/sensors/operatos should be in providers. That at least should be what we should aim for

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-03 Thread via GitHub
bolkedebruin commented on PR #34729: URL: https://github.com/apache/airflow/pull/34729#issuecomment-1745354692 > Design wise: I'm not sure about the `fs.mount` affecting global state -- could we instead do something like this: > > ```python > s3fs = afs.mount("s3://warehouse", "/mn

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-03 Thread via GitHub
bolkedebruin commented on code in PR #34729: URL: https://github.com/apache/airflow/pull/34729#discussion_r137639 ## airflow/io/__init__.py: ## Review Comment: I am happy to move the provider specific code to the providers. For now it was just convenient to deal with i

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-03 Thread via GitHub
eladkal commented on code in PR #34729: URL: https://github.com/apache/airflow/pull/34729#discussion_r1344421695 ## airflow/io/__init__.py: ## Review Comment: Ideally all code in hooks/sensors/operatos should be in providers. That at least should be what we aim for (my opi

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-04 Thread via GitHub
bolkedebruin commented on PR #34729: URL: https://github.com/apache/airflow/pull/34729#issuecomment-1746331259 > Looking forward to having this great feature! > > We had a good experience with pluggable components (task log handler, executor, secret manager, ...), I wonder if you can

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-05 Thread via GitHub
uranusjr commented on PR #34729: URL: https://github.com/apache/airflow/pull/34729#issuecomment-1748604799 Is the `/mnt` path necessary? I wonder if we should instead invisibly mount the fs to somewhere random by default and combine the concept with, say, Dataset, to help abstract away the

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-05 Thread via GitHub
bolkedebruin commented on PR #34729: URL: https://github.com/apache/airflow/pull/34729#issuecomment-1748640426 > Is the `/mnt` path necessary? I wonder if we should instead invisibly mount the fs to somewhere random by default and combine the concept with, say, Dataset, to help abstract awa

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-05 Thread via GitHub
uranusjr commented on PR #34729: URL: https://github.com/apache/airflow/pull/34729#issuecomment-1748680743 What I’m envisioning is something like ```python warehouse_mnt = afs.mount("s3://warehouse") # Can have conn_id too, it’s orthogontal. output_mnt = afs.mount("file:///tmp"

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-05 Thread via GitHub
bolkedebruin commented on PR #34729: URL: https://github.com/apache/airflow/pull/34729#issuecomment-1748698174 > What I’m envisioning is something like > > ```python > warehouse_mnt = afs.mount("s3://warehouse") # Can have conn_id too, it’s orthogontal. > > @dag > def my

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-05 Thread via GitHub
bolkedebruin commented on PR #34729: URL: https://github.com/apache/airflow/pull/34729#issuecomment-1748763653 @uranusjr PTL - both patterns are now supported -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-05 Thread via GitHub
uranusjr commented on code in PR #34729: URL: https://github.com/apache/airflow/pull/34729#discussion_r1347355406 ## airflow/io/fs/__init__.py: ## @@ -0,0 +1,966 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-05 Thread via GitHub
bolkedebruin commented on code in PR #34729: URL: https://github.com/apache/airflow/pull/34729#discussion_r1347370339 ## airflow/io/fs/__init__.py: ## @@ -0,0 +1,966 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-05 Thread via GitHub
uranusjr commented on PR #34729: URL: https://github.com/apache/airflow/pull/34729#issuecomment-1748840122 > Not entirely sure about this. To me, for now, both are quite different. A dataset points to data and a mount provides an interface that allows you to manipulate file like objects. So

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-05 Thread via GitHub
uranusjr commented on PR #34729: URL: https://github.com/apache/airflow/pull/34729#issuecomment-1748843027 Also, when do I call unmount? I kind of feel there’s really no good use case to have that function. Or maybe we should have some auto context management instead. -- This is an autom

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-05 Thread via GitHub
uranusjr commented on code in PR #34729: URL: https://github.com/apache/airflow/pull/34729#discussion_r1347416018 ## airflow/io/fs/__init__.py: ## @@ -0,0 +1,969 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-05 Thread via GitHub
uranusjr commented on code in PR #34729: URL: https://github.com/apache/airflow/pull/34729#discussion_r1347412962 ## airflow/io/fs/__init__.py: ## @@ -0,0 +1,969 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-05 Thread via GitHub
bolkedebruin commented on code in PR #34729: URL: https://github.com/apache/airflow/pull/34729#discussion_r1347480793 ## airflow/io/fs/__init__.py: ## @@ -0,0 +1,969 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-05 Thread via GitHub
bolkedebruin commented on code in PR #34729: URL: https://github.com/apache/airflow/pull/34729#discussion_r1347480793 ## airflow/io/fs/__init__.py: ## @@ -0,0 +1,969 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-05 Thread via GitHub
bolkedebruin commented on PR #34729: URL: https://github.com/apache/airflow/pull/34729#issuecomment-1749091768 @uranusjr > I see what you mean. The two do have some similarities though, say I want to trigger a(nother) DAG when a file on S3 is modified, I would write something like t

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-05 Thread via GitHub
bolkedebruin commented on code in PR #34729: URL: https://github.com/apache/airflow/pull/34729#discussion_r1347581388 ## airflow/io/fs/__init__.py: ## @@ -0,0 +1,969 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-05 Thread via GitHub
bolkedebruin commented on PR #34729: URL: https://github.com/apache/airflow/pull/34729#issuecomment-1749104606 > Also, when do I call unmount? I kind of feel there’s really no good use case to have that function. Or maybe we should have some auto context management instead. There is

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-06 Thread via GitHub
uranusjr commented on PR #34729: URL: https://github.com/apache/airflow/pull/34729#issuecomment-1750230494 I thought about the overall design some more. Aggregated comments on multiple things. re: `__truediv__` returning str. In `pathlib`, `Path.__truediv__` returns another Path inst

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-06 Thread via GitHub
bolkedebruin commented on PR #34729: URL: https://github.com/apache/airflow/pull/34729#issuecomment-1750772291 @uranusjr sorry I see now that I accidentally edited your comment :-P -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [PR] WIP: Add Airflow FS [airflow]

2023-10-06 Thread via GitHub
bolkedebruin commented on PR #34729: URL: https://github.com/apache/airflow/pull/34729#issuecomment-1751148942 @uranusjr context manager included now so you can do: ```python with afs.mount("s3://warehouse") as mnt: mnt.copy(x, y) ``` and (Pure)Path interface: