Thanks Adrien. I spent some time trying to understand readByte() in
ReverseRandomAccessReader (through FST) and comparing it with 7.x. Although I
don't understand ALL of the details and reasoning for always loading the
FST (and in turn the term index) off-heap (as discussed in
https://github.com/ap
Yes, this changed in 8.x:
- 8.0 moved the terms index off-heap for non-PK fields with
MMapDirectory. https://github.com/apache/lucene/issues/9681
- Then in 8.6 the FST was moved off-heap all the time.
https://github.com/apache/lucene/issues/10297
More generally, there are a few files that are no l
Thanks Adrien. Is this behavior of FST something that has changed in Lucene
8.x (from 7.x)?
Also, is the terms index not loaded into memory anymore in 8.x?
To your point on MMapDirectoryFactory, it is much faster as you
anticipated, but the indexes commonly being >1 TB makes the Windows machine
fr
+Alan Woodward helped me better understand what is going on here.
BufferedIndexInput (used by NIOFSDirectory and SimpleFSDirectory)
doesn't play well with the fact that the FST reads bytes backwards:
every call to readByte() triggers a refill of 1 kB because it wants to
read the byte that is just be
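
To make the backward-read problem concrete, here is a minimal, self-contained sketch. It is NOT Lucene's BufferedIndexInput code: ForwardBuffer, readByteAt and countRefills are hypothetical names, and the only assumption modeled is that a buffer miss refills starting at the requested position and extending forward. Under that assumption, reading a file back to front misses on every single byte, while reading it front to back misses once per buffer.

```java
public class BackwardReadDemo {

    /** Simplified forward-only buffer (hypothetical, for illustration only). */
    static final class ForwardBuffer {
        private final byte[] file;      // stands in for the on-disk file
        private final int bufferSize;   // e.g. 1024 bytes
        private long bufferStart = -1;  // file offset of buffer[0]; -1 means empty
        private int bufferLength = 0;   // number of valid bytes in the buffer
        int refills = 0;                // how many times we had to go back to "disk"

        ForwardBuffer(byte[] file, int bufferSize) {
            this.file = file;
            this.bufferSize = bufferSize;
        }

        byte readByteAt(long pos) {
            boolean miss = bufferStart < 0
                    || pos < bufferStart
                    || pos >= bufferStart + bufferLength;
            if (miss) {
                // Assumption: a miss refills from pos FORWARD, so the byte just
                // before pos is never in the buffer afterwards.
                bufferStart = pos;
                bufferLength = (int) Math.min(bufferSize, file.length - pos);
                refills++;
            }
            return file[(int) pos];
        }
    }

    /** Reads every byte of a fileSize-byte file once and returns the refill count. */
    static int countRefills(int fileSize, int bufferSize, boolean backwards) {
        ForwardBuffer in = new ForwardBuffer(new byte[fileSize], bufferSize);
        for (int i = 0; i < fileSize; i++) {
            in.readByteAt(backwards ? fileSize - 1 - i : i);
        }
        return in.refills;
    }

    public static void main(String[] args) {
        System.out.println("forward refills:  " + countRefills(4096, 1024, false));
        System.out.println("backward refills: " + countRefills(4096, 1024, true));
    }
}
```

With a 4096-byte file and a 1024-byte buffer, the forward pass refills 4 times and the backward pass refills 4096 times, once per byte read. A memory-mapped file has no such refill step, which fits MMapDirectory not showing the same slowdown.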
My best guess based on your description of the issue is that
SimpleFSDirectory doesn't like the fact that the terms index now reads
data directly from the directory instead of loading the terms index in
heap. Would you be able to run the same benchmark with MMapDirectory
to check if it addresses th