If you use dictionary encoding for the timestamp field it will store the
column more efficiently. If there were many different time stamps you
wouldn’t want to do that but if they are all the same “string” value it’s
fine.

On Tue, Dec 14, 2021 at 1:02 PM Frederic Branczyk <[email protected]>
wrote:

> Hello,
>
> I've posted a couple of things asking about Arrow over the last few weeks
> and I've come across another thing that I'm hoping you can help me
> understand better.
>
> I have a workload that writes a lot of data at once with a single
> timestamp, eg. 10k values, that each has an ID attached.
>
> Since all of them have the same timestamp, should I be using metadata to
> say that they all share the same timestamp, or should the timestamp be part
> of the record? What I'm concerned about is the amount of space complexity
> this may incur unnecessarily. My understanding is that Arrow's requirement
> is that access is always O(1), so I was wondering if a run length encoding
> might be possible to be used in a situation like this?
>
> Intuitively it feels wrong to use metadata to store a timestamp, but that
> made me wonder, what are typical uses of Arrow metadata, or could you share
> some example in the wild?
>
> Thank you for your help!
>
> Best,
> Frederic
>

Reply via email to