Hello,

First of all, thank you so much for your work on Arrow; it looks like a very
promising piece of technology.

I'm very new to Arrow, and I'm trying to understand whether Arrow is a good
fit for our use case (and if so, whether you could give us some pointers as
to which data structures might make sense). We happen to use Go, but I would
think that my questions are largely language-agnostic.

We have a workload whose data, in table form, looks pretty much like this:

+----------+----------+-----------+-------+
| SeriesID | EntityID | Timestamp | Value |
+----------+----------+-----------+-------+
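
For concreteness, here is roughly how I imagine the schema in Go (only a
sketch; we are guessing that the IDs fit in int64 and that millisecond
timestamp precision is enough):

    import "github.com/apache/arrow/go/v15/arrow"

    // Sketch of the schema; int64 IDs and millisecond timestamps
    // are our assumptions, not a given.
    var schema = arrow.NewSchema(
        []arrow.Field{
            {Name: "SeriesID", Type: arrow.PrimitiveTypes.Int64},
            {Name: "EntityID", Type: arrow.PrimitiveTypes.Int64},
            {Name: "Timestamp", Type: arrow.FixedWidthTypes.Timestamp_ms},
            {Name: "Value", Type: arrow.PrimitiveTypes.Float64},
        },
        nil, // no schema-level metadata
    )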

Participants of the system write data by SeriesID, with random,
unpredictable EntityIDs, and many values at the same time.
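
So a single write might translate into building a record batch roughly like
this (a sketch reusing the schema above; buildBatch and its parameters are
our own names, not an Arrow API):

    import (
        "github.com/apache/arrow/go/v15/arrow"
        "github.com/apache/arrow/go/v15/arrow/array"
        "github.com/apache/arrow/go/v15/arrow/memory"
    )

    // buildBatch turns one write (a single SeriesID, many EntityID/Value
    // pairs arriving at one timestamp) into an Arrow record batch.
    func buildBatch(seriesID int64, ts arrow.Timestamp, entities []int64, values []float64) arrow.Record {
        bldr := array.NewRecordBuilder(memory.DefaultAllocator, schema)
        defer bldr.Release()

        for i := range entities {
            bldr.Field(0).(*array.Int64Builder).Append(seriesID)
            bldr.Field(1).(*array.Int64Builder).Append(entities[i])
            bldr.Field(2).(*array.TimestampBuilder).Append(ts)
            bldr.Field(3).(*array.Float64Builder).Append(values[i])
        }
        return bldr.NewRecord() // caller must Release() the record
    }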

Queries against this data typically filter by a set of SeriesIDs, a set of
EntityIDs, and a certain time frame; the remaining rows are then summed up
and grouped by EntityID, so that the result is basically a map of EntityID
to Value.
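
In code, I picture the read path per record batch roughly like this (again
only a sketch; aggregate and its parameters are our names, not an Arrow
API, and the column order follows the schema above):

    import (
        "github.com/apache/arrow/go/v15/arrow"
        "github.com/apache/arrow/go/v15/arrow/array"
    )

    // aggregate sums Value per EntityID over one record batch, keeping
    // only rows whose SeriesID/EntityID are in the given sets and whose
    // timestamp falls within [from, to).
    func aggregate(rec arrow.Record, seriesIDs, entityIDs map[int64]struct{},
        from, to arrow.Timestamp, out map[int64]float64) {

        series := rec.Column(0).(*array.Int64)
        entity := rec.Column(1).(*array.Int64)
        ts := rec.Column(2).(*array.Timestamp)
        val := rec.Column(3).(*array.Float64)

        for i := 0; i < int(rec.NumRows()); i++ {
            if _, ok := seriesIDs[series.Value(i)]; !ok {
                continue
            }
            e := entity.Value(i)
            if _, ok := entityIDs[e]; !ok {
                continue
            }
            if t := ts.Value(i); t < from || t >= to {
                continue
            }
            out[e] += val.Value(i)
        }
    }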

Maybe this influences the answer: since we are dealing with a lot of data,
our hope is that we could store the data in object storage and essentially
memory-map it, with multiple layers of caches between object storage and
main memory.
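
To illustrate the memory-mapping part for a single local file (assuming the
data is stored in the Arrow IPC file format; the object-storage and caching
layers are omitted, and golang.org/x/exp/mmap is just one way to do the
mapping):

    import (
        "fmt"
        "io"
        "log"

        "github.com/apache/arrow/go/v15/arrow/ipc"
        "golang.org/x/exp/mmap"
    )

    func scanFile(path string) {
        // Memory-map the file; pages are faulted in lazily, so only the
        // parts of the file a query touches end up in main memory.
        m, err := mmap.Open(path)
        if err != nil {
            log.Fatal(err)
        }
        defer m.Close()

        // io.SectionReader adapts the mapped region to the
        // Read/Seek/ReadAt interface the IPC file reader expects.
        rdr, err := ipc.NewFileReader(io.NewSectionReader(m, 0, int64(m.Len())))
        if err != nil {
            log.Fatal(err)
        }
        defer rdr.Close()

        for {
            rec, err := rdr.Read()
            if err == io.EOF {
                break
            }
            if err != nil {
                log.Fatal(err)
            }
            fmt.Println(rec.NumRows()) // e.g. filter/aggregate here
        }
    }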

At first glance, Arrow looks like a great fit, but I'd love to hear your
thoughts, and whether a particular strategy or data structure comes to mind
for a workload like this.

Best regards,
Frederic
