Thank you for your response!
I misread "data lake" as "delta lake", my bad. Anyway, I need to write
output to the file system. I see your point about data lakes; however,
migrations take time, so at least from this perspective I wouldn't
deprecate FileStreamSink. I hope FileStreamSink will still be maint
small correction: "I intentionally didn't enumerate." The meaning could be
quite different so making a small correction.
On Tue, Apr 18, 2023 at 5:38 AM Jungtaek Lim
wrote:
> There seems to be miscommunication - I didn't mean "Delta Lake". I meant
> "any" Data Lake products. Since I'm biased I d
There seems to be miscommunication - I didn't mean "Delta Lake". I meant
"any" Data Lake products. Since I'm biased I didn't intentionally enumerate
actual products, but there are "Apache Hudi", "Apache Iceberg", etc. as well.
We have already made a non-trivial number of band-aid fixes for file stream si
Hi Jungtaek,
Integration with Delta Lake is not an option for me. I raised a PR for an
improvement of FileStreamSink with a new parameter:
https://github.com/apache/spark/pull/40821. Can you please take a look?
--
Kind regards/ Pozdrawiam,
Wojciech Indyk
On Sun, Apr 16, 2023 at 4:45 AM Jungtaek Lim
wrote:
Hi,
We have been made aware of lots of issues with the current FileStream
sink. The effort to fix these issues is quite significant, and it ended up
with the derivation of "Data Lake" products.
I'd recommend not fixing the issue but leaving it as a limitation, and
integrating your workload with Data