Re: [I] [Bug] Cannot use PyIceberg with multiple FS [iceberg-python]

2024-09-02 Thread via GitHub
kevinjqliu commented on issue #1041:
URL: https://github.com/apache/iceberg-python/issues/1041#issuecomment-2324865049

It's configurable via the write properties. See this comment: https://github.com/apache/iceberg-python/issues/1041#issuecomment-2323380629

2024-09-02 Thread via GitHub
TiansuYu commented on issue #1041:
URL: https://github.com/apache/iceberg-python/issues/1041#issuecomment-2324857080

Reading the table spec, I just realised that there is a field `location` in https://iceberg.apache.org/spec/#table-metadata-fields that specifies a base location of the table…

2024-09-02 Thread via GitHub
kevinjqliu commented on issue #1041:
URL: https://github.com/apache/iceberg-python/issues/1041#issuecomment-2324831712

Yep! There are definitely opportunities to consolidate the two. I opened #310 with some details.

2024-09-02 Thread via GitHub
TiansuYu commented on issue #1041:
URL: https://github.com/apache/iceberg-python/issues/1041#issuecomment-2324824803

I'm also reading https://arrow.apache.org/docs/python/filesystems.html#using-arrow-filesystems-with-fsspec. There might be an opportunity for us to simplify the…

2024-09-02 Thread via GitHub
TiansuYu commented on issue #1041:
URL: https://github.com/apache/iceberg-python/issues/1041#issuecomment-2324794360

@kevinjqliu I think resolving the fs at the file level should make the API cleaner. I would say one benefit of choosing the fs at the table level is that the fs instance can be reused for perfo…
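A minimal sketch of how file-level resolution and instance reuse could coexist, assuming a hypothetical `FileSystemResolver` class (not part of PyIceberg) whose factories are stand-ins for real filesystem constructors. Each path's scheme and netloc pick the filesystem at call time, and the created instance is cached per `(scheme, netloc)` so repeated reads and writes do not rebuild it:

```python
from typing import Callable, Dict, Tuple
from urllib.parse import urlparse

# Hypothetical factory signature: (scheme, netloc) -> filesystem object.
FsFactory = Callable[[str, str], object]


class FileSystemResolver:
    """Picks a filesystem per file path instead of once per table (illustrative only)."""

    def __init__(self, factories: Dict[str, FsFactory]) -> None:
        self._factories = factories
        # Reuse one instance per (scheme, netloc) rather than rebuilding it for every file.
        self._cache: Dict[Tuple[str, str], object] = {}

    def resolve(self, path: str) -> object:
        parsed = urlparse(path)
        scheme = parsed.scheme or "file"  # bare local paths have no scheme
        key = (scheme, parsed.netloc)
        if key not in self._cache:
            factory = self._factories.get(scheme)
            if factory is None:
                raise ValueError(f"No filesystem registered for scheme {scheme!r}")
            self._cache[key] = factory(scheme, parsed.netloc)
        return self._cache[key]


# Usage with stand-in factories; real code would construct actual filesystem clients here.
resolver = FileSystemResolver({
    "hdfs": lambda scheme, netloc: ("hdfs-client", netloc),
    "s3": lambda scheme, netloc: ("s3-client", netloc),
})
metadata_fs = resolver.resolve("hdfs://namenode:8020/warehouse/db/tbl/metadata/v1.metadata.json")
data_fs = resolver.resolve("s3://bucket/db/tbl/data/part-00000.parquet")
print(metadata_fs)  # ('hdfs-client', 'namenode:8020')
print(data_fs)  # ('s3-client', 'bucket')
print(data_fs is resolver.resolve("s3://bucket/db/tbl/data/part-00001.parquet"))  # True
```

Keying the cache on netloc as well as scheme keeps, for example, two different HDFS namenodes from sharing one client; a real implementation would likely also need to account for credentials and other catalog properties.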

2024-09-02 Thread via GitHub
kevinjqliu commented on issue #1041:
URL: https://github.com/apache/iceberg-python/issues/1041#issuecomment-2324762218

Generally, this problem should go away if we re-evaluate `fs` and `io` each time a file is read or written. In other words, we should stop passing the `io` parameter arou…

2024-09-02 Thread via GitHub
kevinjqliu commented on issue #1041:
URL: https://github.com/apache/iceberg-python/issues/1041#issuecomment-2324758344

> I don't think fixing SqlCatalog alone is the proper answer to this bug. The io layer seems to me ill-written and has to be fixed somewhere at the upper level (e.g. Fsspe…

2024-09-02 Thread via GitHub
kevinjqliu commented on issue #1041:
URL: https://github.com/apache/iceberg-python/issues/1041#issuecomment-2324750207

Thanks for taking a look at this, @TiansuYu!

> why we are implementing a custom

I think custom scheme parsing avoids picking one library over another (`fsspec`…
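As a rough illustration of library-neutral scheme parsing, here is a sketch with a hypothetical `SCHEME_TO_IMPLS` preference table and `pick_impl` helper; neither is PyIceberg's actual code, and the table entries are assumptions chosen for the example. The URI scheme selects an ordered list of candidate backends, and the first one that is installed wins, so neither `fsspec` nor `pyarrow` is mandatory:

```python
from typing import Dict, Iterable, List, Optional
from urllib.parse import urlparse

# Illustrative preference table (an assumption, not PyIceberg's real mapping):
# each scheme lists candidate IO backends, most preferred first.
SCHEME_TO_IMPLS: Dict[str, List[str]] = {
    "s3": ["pyarrow", "fsspec"],
    "s3a": ["pyarrow", "fsspec"],
    "gs": ["pyarrow", "fsspec"],
    "hdfs": ["pyarrow"],
    "file": ["pyarrow", "fsspec"],
}


def pick_impl(path: str, installed: Iterable[str]) -> Optional[str]:
    """Return the first preferred backend for the path's scheme that is installed, else None."""
    scheme = urlparse(path).scheme or "file"  # bare local paths have no scheme
    available = set(installed)
    for impl in SCHEME_TO_IMPLS.get(scheme, []):
        if impl in available:
            return impl
    return None


print(pick_impl("s3://bucket/key", ["fsspec"]))  # fsspec
print(pick_impl("s3://bucket/key", ["fsspec", "pyarrow"]))  # pyarrow
print(pick_impl("hdfs://namenode/path", ["fsspec"]))  # None
```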

2024-09-02 Thread via GitHub
TiansuYu commented on issue #1041:
URL: https://github.com/apache/iceberg-python/issues/1041#issuecomment-2324566016

Read my comment [here](https://gist.github.com/kevinjqliu/647808faba256855639e91dd58243082?permalink_comment_id=5175413#gistcomment-5175413) for the cause of the issue.

2024-09-02 Thread via GitHub
TiansuYu commented on issue #1041:
URL: https://github.com/apache/iceberg-python/issues/1041#issuecomment-2324181879

I will have a look at this issue.

2024-09-01 Thread via GitHub
kevinjqliu commented on issue #1041:
URL: https://github.com/apache/iceberg-python/issues/1041#issuecomment-2323380629

Oh interesting, thanks! Here's the config definition for `write.data.path` and `write.metadata.path`: https://iceberg.apache.org/docs/latest/configuration/#write-prope
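A minimal sketch, assuming a hypothetical helper `resolve_write_locations` (not a PyIceberg function), of how the table `location` and these two write properties combine. The fallbacks follow the documented defaults (table location + `/data` and table location + `/metadata`); whether a given PyIceberg version honors both properties is not assumed here:

```python
from typing import Dict


def resolve_write_locations(table_location: str, properties: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: where new data and metadata files would be written."""
    base = table_location.rstrip("/")
    return {
        # Defaults mirror the documented ones: <table location>/data and <table location>/metadata.
        "data": properties.get("write.data.path", f"{base}/data"),
        "metadata": properties.get("write.metadata.path", f"{base}/metadata"),
    }


# Metadata stays under an HDFS table location, data goes to S3.
locations = resolve_write_locations(
    "hdfs://namenode:8020/warehouse/db/tbl",
    {"write.data.path": "s3://bucket/db/tbl/data"},
)
print(locations["data"])  # s3://bucket/db/tbl/data
print(locations["metadata"])  # hdfs://namenode:8020/warehouse/db/tbl/metadata
```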

2024-08-20 Thread via GitHub
Fokko commented on issue #1041:
URL: https://github.com/apache/iceberg-python/issues/1041#issuecomment-2298248947

This is a good point; I've heard that folks store their metadata on HDFS and the data itself on S3. I don't think the add-files example is the best one; it would be…
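To make the HDFS-plus-S3 layout concrete, here is a small sketch with hypothetical helpers `schemes_in_use` and `needs_per_file_io` (illustrative names, not PyIceberg APIs). When a table's metadata and data files span more than one URI scheme, a single `io` instance chosen from one location cannot serve all of them:

```python
from typing import Iterable, Set
from urllib.parse import urlparse


def schemes_in_use(paths: Iterable[str]) -> Set[str]:
    """Return the set of URI schemes referenced by the given file paths."""
    # Bare local paths have no scheme; treat them as "file".
    return {urlparse(p).scheme or "file" for p in paths}


def needs_per_file_io(paths: Iterable[str]) -> bool:
    """True when the paths span multiple filesystems, so one table-level IO is not enough."""
    return len(schemes_in_use(paths)) > 1


paths = [
    "hdfs://namenode:8020/warehouse/db/tbl/metadata/00000-abc.metadata.json",
    "s3://bucket/db/tbl/data/00000-0-abc.parquet",
]
print(sorted(schemes_in_use(paths)))  # ['hdfs', 's3']
print(needs_per_file_io(paths))  # True
```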