Hi,

I'm not sure when the filtering system was last worked on in this depth, but 
I suspect it has been a while. Finding someone completely up to speed on 
this may be a challenge.

Thanks,
Jaap


> On 15 Jun 2020, at 05:38, Sidhant Bansal <sidhban...@gmail.com> wrote:
> 
> Hi all,
> 
> I want to propose an improvement to speed up display filters by avoiding 
> re-dissecting all the packets again and again when it isn't required, and 
> instead maintaining a cache of the fields that have been queried recently.
> 
> Motivation: Benchmarking filtering on capture files larger than 100 MB shows 
> that the re-dissection step, i.e. the time spent inside the dissectors, 
> dominates: roughly 40-50% of the total filtering time is spent re-dissecting. 
> I believe we can make large savings here.
> 
> Example:
> 1st Filter applied: tcp.srcport >= 1200 && tcp.dstport <= 1500
> This filter runs as it does now, AND additionally stores tcp.srcport and 
> tcp.dstport for every packet in memory inside Wireshark.
> 2nd Filter applied: tcp.srcport == 80
> We don't need to re-dissect all the packets again and can simply refer to the 
> information stored to apply the filter.
> 3rd Filter applied: tcp.srcport == 120 || udp.srcport == 80
> Since "udp.srcport" is not in our cache, we need to re-dissect again, AND we 
> also store udp.srcport for all the packets (to speed up future filter 
> queries).
> 4th Filter applied: tcp.srcport == 40 || udp.srcport >= 1000 || tcp.dstport 
> <= 500
> Since all of these fields are in the cache, we can read them directly from 
> the information stored in memory and don't need to re-dissect any of the 
> packets.
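The cache-hit check walked through in the example above could be sketched as follows. This is purely illustrative C under my own assumptions — the names (field_cache, filter_fully_cached, etc.) are hypothetical, not Wireshark's actual API: a filter may skip re-dissection only when every field it references is already cached.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: a capture-wide cache keyed by field name. */
#define MAX_CACHED_FIELDS 32

typedef struct {
    const char *names[MAX_CACHED_FIELDS];
    int count;
} field_cache;

static bool field_cache_contains(const field_cache *fc, const char *name) {
    for (int i = 0; i < fc->count; i++)
        if (strcmp(fc->names[i], name) == 0)
            return true;
    return false;
}

/* A filter can skip re-dissection only if EVERY field it uses is cached;
 * a single miss (e.g. udp.srcport in the 3rd filter) forces re-dissection. */
static bool filter_fully_cached(const field_cache *fc,
                                const char *const fields[], int n) {
    for (int i = 0; i < n; i++)
        if (!field_cache_contains(fc, fields[i]))
            return false;
    return true;
}
```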
> 
> We can limit the number of fields we keep in memory at any given time, 
> depending on how many packets we have and how much memory we can afford to 
> allocate. Evicting fields from the cache can be done according to a specific 
> cache replacement policy (I haven't thought about which one would be the 
> most apt; input is welcome).
> 
> Most fields are fixed-length and small, i.e. <= 8 bytes. For fields such as 
> strings, which are variable-length and can be arbitrarily large, we can skip 
> the caching procedure and instead re-dissect all the packets whenever the 
> filter expression contains such a field.
> 
> From an implementation point of view: the cached field information can be 
> stored inside frame_data, since that remains persistent throughout 
> Wireshark's execution for a single opened capture file. Whenever we 
> encounter a new filter query, we check whether all of its fields are in the 
> cache. If yes, then once we have converted the filter's abstract syntax 
> tree to DFVM code, the query looks up the cache instead of re-dissecting. 
> If no, then we do what we do currently, i.e. re-dissect, but we also store 
> the new field(s) in the cache (according to the chosen replacement policy).
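A rough sketch of what the per-frame storage and lookup might look like. The struct layout and function names here are my own assumptions for illustration — the real frame_data in epan/frame_data.h looks different, and only the fixed-length (<= 8 byte) values discussed above are covered:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-frame cache of extracted field values; in the proposal
 * something like this would hang off frame_data. */
typedef struct {
    uint32_t field_id;   /* index into a capture-wide table of cached fields */
    uint64_t value;      /* fixed-length values only, <= 8 bytes */
    bool     present;    /* the field may simply be absent from this frame */
} cached_field_value;

typedef struct {
    const cached_field_value *values;  /* one slot per cached field */
    uint32_t n_values;
} frame_field_cache;

/* On a cache hit the DFVM step would read the stored value instead of
 * re-dissecting the frame. Returns false if the field is absent here. */
static bool frame_cache_lookup(const frame_field_cache *fc,
                               uint32_t field_id, uint64_t *out) {
    for (uint32_t i = 0; i < fc->n_values; i++) {
        if (fc->values[i].field_id == field_id && fc->values[i].present) {
            *out = fc->values[i].value;
            return true;
        }
    }
    return false;
}
```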
> 
> I'd welcome any feedback on, or objections to, this optimization.
> 
> ___________________________________________________________________________
> Sent via:    Wireshark-dev mailing list <wireshark-dev@wireshark.org>
> Archives:    https://www.wireshark.org/lists/wireshark-dev
> Unsubscribe: https://www.wireshark.org/mailman/options/wireshark-dev
>             mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe
