Re: Parametrisable output metadata path

2023-04-18 Thread Wojciech Indyk
Thank you for your response! I misread "data lake" as "delta lake", my bad. Anyway, I need to write output to the file system. I see your point about data lakes; however, migrations take time, so at least from this perspective I wouldn't deprecate FileStreamSink. I hope FileStreamSink will still be maint

Re: Parametrisable output metadata path

2023-04-17 Thread Jungtaek Lim
Small correction: "I intentionally didn't enumerate." The meaning could be quite different, so I'm making a small correction. On Tue, Apr 18, 2023 at 5:38 AM Jungtaek Lim wrote: > There seems to be miscommunication - I didn't mean "Delta Lake". I meant > "any" Data Lake products. Since I'm biased I d

Re: Parametrisable output metadata path

2023-04-17 Thread Jungtaek Lim
There seems to be miscommunication - I didn't mean "Delta Lake". I meant "any" Data Lake products. Since I'm biased I didn't intentionally enumerate actual products, but there are "Apache Hudi", "Apache Iceberg", etc. as well. We have already made a non-trivial number of band-aid fixes for file stream si

Re: Parametrisable output metadata path

2023-04-17 Thread Wojciech Indyk
Hi Jungtaek, integration with Delta Lake is not an option for me. I raised a PR to improve FileStreamSink with a new parameter: https://github.com/apache/spark/pull/40821. Can you please take a look? -- Kind regards/ Pozdrawiam, Wojciech Indyk On Sun, 16 Apr 2023 at 04:45, Jungtaek Lim n

Re: Parametrisable output metadata path

2023-04-15 Thread Jungtaek Lim
Hi, Many issues with the current FileStream sink have been reported to us. The effort to fix these issues is quite significant, and it led to the emergence of "Data Lake" products. I'd recommend not fixing the issue but leaving it as a limitation, and integrating your workload with Data