Re: Improving ZipFile.getEntryPos double loop queries

2020-04-16 Thread Claes Redestad
Hi Eirik, On 2020-04-16 13:47, Eirik Bjørsnøs wrote: On Thu, Apr 16, 2020 at 12:59 PM Claes Redestad <claes.redes...@oracle.com> wrote: I think this is an interesting idea and a good optimization. Looks like we get most of the performance win of Bloom filters while not introduci

Re: Improving ZipFile.getEntryPos double loop queries

2020-04-16 Thread Eirik Bjørsnøs
On Thu, Apr 16, 2020 at 12:59 PM Claes Redestad wrote:
> I think this is an interesting idea and a good optimization. Looks like we get most of the performance win of Bloom filters while not introducing regressions for hits, footprint or complexity. We'll deliberately cause a few more hash

Re: Improving ZipFile.getEntryPos double loop queries

2020-04-16 Thread Claes Redestad
Hi, I think this is an interesting idea and a good optimization. We'll deliberately cause a few more hash collisions, but since we do half as many hash table lookups in the normal case (and the streaming/iterators shouldn't care) then that should be fine. Some issues: - Need to be made to work

Improving ZipFile.getEntryPos double loop queries

2020-04-16 Thread Eirik Bjørsnøs
Hi, ZipFile.getEntryPos currently has a double loop which retries the search after a failed lookup by appending a '/' to the name/hash. This means that any miss needs to query the hash table twice. The following patch updates hashN to truncate any '/' at the end of an entry name. This ensures that t
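To illustrate the idea described above, here is a minimal sketch in Java. It is not the patch from this thread and not the JDK's code: the class `SlashInsensitiveIndex`, its methods, and the use of `String` keys in a `HashMap` are assumptions made for illustration only. It shows a hash that ignores one trailing '/', so "name" and "name/" fall into the same bucket and a lookup needs a single probe whether it hits or misses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and data structures are not from the JDK.
final class SlashInsensitiveIndex {
    // Bucket key is the slash-insensitive hash; each bucket holds full entry names.
    private final Map<Integer, List<String>> buckets = new HashMap<>();

    // Hash the name while skipping a single trailing '/',
    // so "dir" and "dir/" always produce the same value.
    static int hash(String name) {
        int end = name.length();
        if (end > 0 && name.charAt(end - 1) == '/') {
            end--;
        }
        int h = 0;
        for (int i = 0; i < end; i++) {
            h = 31 * h + name.charAt(i);
        }
        return h;
    }

    void add(String entryName) {
        buckets.computeIfAbsent(hash(entryName), k -> new ArrayList<>()).add(entryName);
    }

    // One bucket probe per lookup. An exact match wins; otherwise the
    // directory form "name/" is accepted, mirroring the retry-with-slash behaviour.
    String find(String name) {
        List<String> bucket = buckets.get(hash(name));
        if (bucket == null) {
            return null; // a miss costs one hash table query, not two
        }
        String dirMatch = null;
        for (String entry : bucket) {
            if (entry.equals(name)) {
                return entry;
            }
            if (!name.endsWith("/")
                    && entry.length() == name.length() + 1
                    && entry.startsWith(name)
                    && entry.endsWith("/")) {
                dirMatch = entry;
            }
        }
        return dirMatch;
    }
}
```

The trade-off is the one raised in the replies: a file entry and a directory entry with the same base name now share a bucket, so there are slightly more collisions, in exchange for halving the number of table queries on a miss.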