>
> My understanding is that Arrow's requirement is that access is always
> O(1), so I was wondering if a run length encoding might be possible to be
> used in a situation like this?


There have been some discussions about introducing RLE into the specification
and relaxing O(1) to O(log(N)).  One is going on on the dev mailing list now.
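
To illustrate why RLE trades O(1) access for O(log(N)), here is a minimal
sketch in plain Python.  It is not the layout under discussion for the
specification; the function names and the "run ends" representation are
assumptions made only for this example.

```python
from bisect import bisect_right


def rle_encode(values):
    """Collapse consecutive equal values into runs.

    Returns (run_values, run_ends), where run_ends[i] is the exclusive
    end index of run i in the original sequence.
    """
    run_values, run_ends = [], []
    for i, value in enumerate(values):
        if run_values and run_values[-1] == value:
            run_ends[-1] = i + 1  # extend the current run
        else:
            run_values.append(value)  # start a new run
            run_ends.append(i + 1)
    return run_values, run_ends


def rle_get(run_values, run_ends, index):
    """Random access by binary search over run ends: O(log(number of runs))."""
    length = run_ends[-1] if run_ends else 0
    if not 0 <= index < length:
        raise IndexError(index)
    return run_values[bisect_right(run_ends, index)]


# 10k rows sharing one timestamp collapse into a single run.
timestamps = [1639505640000] * 10_000
run_values, run_ends = rle_encode(timestamps)
```

With a single run the lookup is effectively constant time, but in general
each access needs a binary search over the runs, which is where the
O(log(N)) bound comes from.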

> Since all of them have the same timestamp, should I be using metadata to
> say that they all share the same timestamp, or should the timestamp be part
> of the record?


This is really an application-specific question.  As Nicholas pointed out,
dictionary encoding is another way to lower the overhead.  At least one user
has used metadata to keep track of min/max statistics for data in a
RecordBatch, and using it for a single timestamp would be similar.  You just
need to be careful at application boundaries to make sure other systems can
understand the metadata, or denormalize it appropriately.



On Tue, Dec 14, 2021 at 10:14 AM Nicholas Poorman <[email protected]>
wrote:

> If you use dictionary encoding for the timestamp field it will store the
> column more efficiently. If there were many different time stamps you
> wouldn’t want to do that but if they are all the same “string” value it’s
> fine.
>
> On Tue, Dec 14, 2021 at 1:02 PM Frederic Branczyk <[email protected]>
> wrote:
>
>> Hello,
>>
>> I've posted a couple of things asking about Arrow over the last few weeks
>> and I've come across another thing that I'm hoping you can help me
>> understand better.
>>
>> I have a workload that writes a lot of data at once with a single
>> timestamp, eg. 10k values, that each has an ID attached.
>>
>> Since all of them have the same timestamp, should I be using metadata to
>> say that they all share the same timestamp, or should the timestamp be part
>> of the record? What I'm concerned about is the amount of space complexity
>> this may incur unnecessarily. My understanding is that Arrow's requirement
>> is that access is always O(1), so I was wondering if a run length encoding
>> might be possible to be used in a situation like this?
>>
>> Intuitively it feels wrong to use metadata to store a timestamp, but that
>> made me wonder, what are typical uses of Arrow metadata, or could you share
>> some example in the wild?
>>
>> Thank you for your help!
>>
>> Best,
>> Frederic
>>
>
