I tried converting the string array to dictionary encoding and didn't see
any difference. There are too many filters to compute, roughly 50-60K.
I'm using Acero's Filter node. I'll try your suggestion of using a
list of indices.
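
To make the "list of indices" idea concrete: instead of materializing a
filtered copy of the data on every pass (as repeated calls to a Filter
node effectively do), you keep a list of surviving row indices and shrink
it with each filter. The following is a minimal pure-Python sketch of
that pattern, not Arrow's actual API; the column names and predicates are
made up for illustration.

```python
def filter_with_indices(columns, predicates):
    """Apply many per-row predicates, tracking surviving row indices.

    columns    -- dict of column name -> list of values
    predicates -- list of (column_name, test_fn) pairs

    Rather than allocating a new filtered table after each predicate,
    only the (shrinking) index list is rebuilt per pass.
    """
    n = len(next(iter(columns.values())))
    indices = list(range(n))  # start with every row selected
    for col_name, test in predicates:
        col = columns[col_name]
        # Shrink the surviving-index list instead of copying column data.
        indices = [i for i in indices if test(col[i])]
        if not indices:
            break  # no rows survive; remaining filters are moot
    return indices

# Example: two filters over a small hypothetical table.
table = {"city": ["NYC", "LA", "NYC", "SF"], "pop": [8, 4, 8, 1]}
surviving = filter_with_indices(
    table,
    [("city", lambda v: v == "NYC"), ("pop", lambda v: v > 5)],
)
# surviving -> [0, 2]
```

With highly selective filters, the index list shrinks quickly, so later
predicates touch only a handful of rows; the actual rows are gathered
once at the end (the equivalent of a single Take) rather than after
every filter.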

Thanks,
Surya

On Fri, Apr 14, 2023 at 10:17 AM Weston Pace <[email protected]> wrote:

> It's unlikely there will be much benefit in using dictionary encoding.
> How are you doing the filtering?  It sounds like you might have many
> filters.  `arrow::compute::Filter` will not get great performance if you
> want to apply many filters.  No one has added support for selection vectors
> but that would probably be the fastest way to apply many filters against
> the same array.  Ideally you could avoid any kind of allocation between
> each filter pass.  Although, if the filters were highly selective you might
> want to use a list of indices instead of a selection vector.  However, this
> has also not been implemented.
>
> On Wed, Apr 12, 2023 at 7:57 PM Surya Kiran Gullapalli <
> [email protected]> wrote:
>
>> Hi,
>> I’m trying to run a filter based on multiple columns (decided at run
>> time) in a table. There are more than a million filters I have to run,
>> and even though I’m getting results, it was taking a humongous amount of
>> time to complete the operation. I’m using the Acero filter node to get
>> the filtered table.
>>
>> I’d like to know if there’d be a performance improvement if I used
>> dictionary-encoded arrays instead of string arrays.
>>
>> Also, any other pointers to speed up the operation would be appreciated.
>>
>> Thanks,
>> Surya
>>
>>
>>
