Thanks to both of you, this is helpful.
On Wed, May 26, 2021 at 6:07 PM, Weston Pace wrote:
> Elad's advice is very helpful. This is not a problem that Arrow solves
> today (to the best of my knowledge). It is a topic that comes up
> periodically[1][2][3]. If a crash happens while your parquet […]
I want to add a few notes from my experience with Kafka:
1. There's an ecosystem: having battle-tested consumers that write to
various external systems, with known reliability guarantees, is very
helpful. It also becomes possible to run multiple consumers on the same
stream, some batch and some real-time streaming (e.g. […]
Elad's advice is very helpful. This is not a problem that Arrow solves
today (to the best of my knowledge). It is a topic that comes up
periodically[1][2][3]. If a crash happens while your parquet stream writer
is open then the most likely outcome is that you will be missing the footer
(this gets written only when the writer is closed).
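One common way to avoid leaving a footer-less file at the final path is to
write to a temporary file in the same directory, close the writer, and then
atomically rename it into place. Here is a minimal stdlib sketch of that
pattern; the plain-text write loop stands in for a real Parquet writer
(e.g. closing a pyarrow.parquet.ParquetWriter before the rename), which is
an assumption, not something from the thread:

```python
import os
import tempfile

def write_file_atomically(rows, final_path):
    """Write rows to a temp file in the same directory, then atomically
    rename it into place. A crash mid-write leaves only the temp file
    behind; readers never see a truncated file at final_path.

    Writing plain text lines stands in here for the real Parquet write
    (assumed, not shown): the key point is that the writer is fully
    closed before os.replace runs.
    """
    dir_name = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            for row in rows:
                f.write(row + "\n")
        os.replace(tmp_path, final_path)  # atomic on POSIX and Windows
    except BaseException:
        os.remove(tmp_path)
        raise
```

A crash before os.replace leaves only a stray .tmp file, which can be
cleaned up on restart; the final path either has the complete file (footer
included) or nothing.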
Hi,
While I'm not using the C++ version of Arrow, the issue you're talking
about is a very common concern.
There are a few points to discuss here:
1. Generally, Parquet files cannot be appended to. You could of course load
the file into memory, add more information, and re-save it, but that's not
really appending.
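The read-extend-rewrite workaround looks something like the sketch below.
pickle is used as a stand-in for a real Parquet reader/writer (e.g.
pyarrow.parquet.read_table / write_table, an assumption for illustration);
the structure is the same either way, and so is the drawback: each "append"
costs time proportional to the whole file, not just the new rows.

```python
import os
import pickle

def append_by_rewriting(path, new_rows):
    """'Append' to a format with no append support by reading the file
    fully, extending the rows in memory, and writing everything back.

    pickle stands in for a real Parquet reader/writer; with large files
    this full rewrite on every append is exactly why it isn't practical.
    """
    rows = []
    if os.path.exists(path):
        with open(path, "rb") as f:
            rows = pickle.load(f)
    rows.extend(new_rows)
    with open(path, "wb") as f:
        pickle.dump(rows, f)
    return rows
```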
I have a very long-running (months) program that is streaming in data
continually, processing it, and saving it to file using Arrow. My current
solution is to buffer several million rows and write them to a new .parquet
file each time. This works, but produces 1000+ files every day.
If I could, I […]
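The buffer-and-roll approach described above can be sketched roughly as
follows. The write_batch callable and the counter-based file naming are
assumptions for illustration; in practice write_batch would be something
like pyarrow.parquet.write_table on a table built from the buffered rows:

```python
class RollingBatchWriter:
    """Buffer incoming rows and flush a new file every batch_size rows.

    write_batch is a stand-in (assumed) for whatever actually writes a
    Parquet file from the buffered rows; the part-NNNNNN naming scheme
    is likewise just illustrative.
    """
    def __init__(self, write_batch, batch_size):
        self.write_batch = write_batch
        self.batch_size = batch_size
        self.buffer = []
        self.file_index = 0

    def add(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write the current buffer as a new file and start a fresh one."""
        if self.buffer:
            name = "part-%06d.parquet" % self.file_index
            self.write_batch(name, self.buffer)
            self.file_index += 1
            self.buffer = []
```

With millions of rows per batch this is exactly the 1000+-files-per-day
behavior described: the file count is driven entirely by batch_size versus
the incoming row rate, so the only knobs are bigger batches or a later
compaction pass that merges small files.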