I agree it's worth discussing. I opened https://github.com/apache/lucene/issues/12355 and https://github.com/apache/lucene/issues/12356.
On Tue, Jun 6, 2023 at 9:17 PM Rahul Goswami <rahul196...@gmail.com> wrote:
>
> Thanks Adrien. I spent some time trying to understand the readByte() in
> ReverseRandomAccessReader (through FST) and compare it with 7.x. Although I
> don't understand ALL of the details and reasoning for always loading the
> FST (and in turn the terms index) off-heap (as discussed in
> https://github.com/apache/lucene/issues/10297), I understand that this is
> essentially causing disk access for every single byte during readByte().
>
> Does this warrant a JIRA for regression?
>
> As mentioned, I am noticing a 10x slowdown in SegmentTermsEnum.seekExact()
> affecting atomic update performance. For setups like mine that can't use
> mmap due to large indexes, this would be a legit regression, no?
>
> - Rahul
>
> On Tue, Jun 6, 2023 at 10:09 AM Adrien Grand <jpou...@gmail.com> wrote:
>
> > Yes, this changed in 8.x:
> > - 8.0 moved the terms index off-heap for non-PK fields with
> >   MMapDirectory. https://github.com/apache/lucene/issues/9681
> > - Then in 8.6 the FST was moved off-heap all the time.
> >   https://github.com/apache/lucene/issues/10297
> >
> > More generally, there are a few files that are no longer loaded in heap
> > in 8.x.
> > It should be possible to load them back in heap by doing
> > something like this (beware, I did not actually test this code):
> >
> > import java.io.IOException;
> > import java.nio.ByteBuffer;
> > import java.nio.ByteOrder;
> > import java.util.Collections;
> >
> > import org.apache.lucene.store.ByteBuffersDataInput;
> > import org.apache.lucene.store.ByteBuffersIndexInput;
> > import org.apache.lucene.store.Directory;
> > import org.apache.lucene.store.FilterDirectory;
> > import org.apache.lucene.store.IOContext;
> > import org.apache.lucene.store.IndexInput;
> >
> > /** Wraps a directory and copies files opened with a "load" context into heap. */
> > class MyHeapDirectory extends FilterDirectory {
> >
> >   MyHeapDirectory(Directory in) {
> >     super(in);
> >   }
> >
> >   @Override
> >   public IndexInput openInput(String name, IOContext context) throws IOException {
> >     if (context.load == false) {
> >       // Not a load request: delegate to the wrapped directory unchanged.
> >       return super.openInput(name, context);
> >     }
> >     try (IndexInput input = super.openInput(name, context)) {
> >       // Copy the whole file into a heap byte[] (files over 2 GB fail in toIntExact).
> >       byte[] bytes = new byte[Math.toIntExact(input.length())];
> >       input.readBytes(bytes, 0, bytes.length);
> >       // asReadOnlyBuffer() resets the byte order to BIG_ENDIAN, so set the order after it.
> >       ByteBuffer bb =
> >           ByteBuffer.wrap(bytes).asReadOnlyBuffer().order(ByteOrder.LITTLE_ENDIAN);
> >       // Serve all later reads from the in-heap copy; the on-disk input is closed here.
> >       return new ByteBuffersIndexInput(
> >           new ByteBuffersDataInput(Collections.singletonList(bb)),
> >           "ByteBuffersIndexInput(" + name + ")");
> >     }
> >   }
> > }
> >
> > On Tue, Jun 6, 2023 at 3:41 PM Rahul Goswami <rahul196...@gmail.com>
> > wrote:
> > >
> > > Thanks Adrien. Is this behavior of FST something that has changed in
> > > Lucene 8.x (from 7.x)?
> > > Also, is the terms index not loaded into memory anymore in 8.x?
> > >
> > > To your point on MMapDirectoryFactory, it is much faster as you
> > > anticipated, but our indexes, which are commonly >1 TB, make the Windows
> > > machine freeze to the point that I sometimes can't even connect to the VM.
> > > SimpleFSDirectory works well for us from that standpoint.
> > >
> > > To add, both NIOFS and SimpleFS have similar indexing benchmarks on
> > > Windows. I understand it is because of the Java bug which synchronizes
> > > internally in the native call for NIOFS.
> > >
> > > -Rahul
> > >
> > > On Tue, Jun 6, 2023 at 9:32 AM Adrien Grand <jpou...@gmail.com> wrote:
> > >
> > > > +Alan Woodward helped me better understand what is going on here.
> > > > BufferedIndexInput (used by NIOFSDirectory and SimpleFSDirectory)
> > > > doesn't play well with the fact that the FST reads bytes backwards:
> > > > every call to readByte() triggers a refill of 1kB because it wants to
> > > > read the byte that is just before what the buffer contains.
> > > >
> > > > On Tue, Jun 6, 2023 at 2:07 PM Adrien Grand <jpou...@gmail.com> wrote:
> > > > >
> > > > > My best guess based on your description of the issue is that
> > > > > SimpleFSDirectory doesn't like the fact that the terms index now reads
> > > > > data directly from the directory instead of loading the terms index in
> > > > > heap. Would you be able to run the same benchmark with MMapDirectory
> > > > > to check if it addresses the regression?
> > > > >
> > > > > On Tue, Jun 6, 2023 at 5:47 AM Rahul Goswami <rahul196...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > Hello,
> > > > > > We started experiencing slowness with atomic updates in Solr after
> > > > > > upgrading from 7.7.2 to 8.11.1. Running several tests revealed the
> > > > > > slowness to be in RealTimeGet's SolrIndexSearcher.getFirstMatch() call,
> > > > > > which eventually calls Lucene's SegmentTermsEnum.seekExact().
> > > > > >
> > > > > > In the benchmarks I ran, 8.11.1 is about 10x slower than 7.7.2. After
> > > > > > discussion on the Solr mailing list I created the JIRA below:
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/SOLR-16838
> > > > > >
> > > > > > The thread dumps collected show a lot of threads stuck in the
> > > > > > FST.findTargetArc() method.
> > > > > >
> > > > > > Testing environment details:
> > > > > > - Java 11 on Windows server
> > > > > > - Xms1536m Xmx3072m
> > > > > > - Indexing client code running 15 parallel threads indexing in batches
> > > > > >   of 1000 on a standalone core.
> > > > > > - Using SimpleFSDirectoryFactory (since MMap doesn't quite work well on
> > > > > >   Windows for our index sizes, which commonly run north of 1 TB)
> > > > > >
> > > > > > https://drive.google.com/drive/folders/1q2DPNTYQEU6fi3NeXIKJhaoq3KPnms0h?usp=sharing
> > > > > >
> > > > > > Is there a known issue with slowness in TermsEnum.seekExact() in Lucene
> > > > > > 8.x?
> > > > > >
> > > > > > Thanks,
> > > > > > Rahul
> > > > >
> > > > > --
> > > > > Adrien
> > > >
> > > > --
> > > > Adrien
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> > --
> > Adrien

--
Adrien
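The backward-read problem described in the thread (every readByte() in BufferedIndexInput triggering a 1kB refill because the FST reads bytes backwards) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Lucene's BufferedIndexInput: the class `BackwardReadDemo`, its methods, and the simplified refill policy (on a miss, load up to 1024 bytes starting at the requested position) are hypothetical. It counts simulated disk reads for the same bytes read front to back and back to front.

```java
/**
 * Hypothetical toy model of a forward-oriented read buffer. This is NOT
 * Lucene's BufferedIndexInput; it only mimics the refill behavior described
 * in the thread.
 */
public class BackwardReadDemo {
  // 1 kB, the refill size mentioned in the thread.
  static final int BUFFER_SIZE = 1024;

  static final class ForwardBuffer {
    private final byte[] file;               // stands in for the file on disk
    private final byte[] buffer = new byte[BUFFER_SIZE];
    private long bufferStart = 0;            // file offset of buffer[0]
    private int bufferLength = 0;            // 0 means empty: every read misses
    int refills = 0;                         // number of simulated disk reads

    ForwardBuffer(byte[] file) {
      this.file = file;
    }

    byte readByte(long pos) {
      if (pos < bufferStart || pos >= bufferStart + bufferLength) {
        refill(pos);
      }
      return buffer[(int) (pos - bufferStart)];
    }

    // On a miss, load up to BUFFER_SIZE bytes starting AT pos. Bytes after pos
    // become cheap to read, but the byte just before pos is never covered.
    private void refill(long pos) {
      bufferStart = pos;
      bufferLength = (int) Math.min(BUFFER_SIZE, file.length - pos);
      System.arraycopy(file, (int) pos, buffer, 0, bufferLength);
      refills++;
    }
  }

  /** Reads n bytes front to back and returns the number of refills. */
  public static int refillsForward(int n) {
    ForwardBuffer b = new ForwardBuffer(new byte[n]);
    for (int i = 0; i < n; i++) {
      b.readByte(i);
    }
    return b.refills;
  }

  /** Reads n bytes back to front (the FST's direction) and returns the number of refills. */
  public static int refillsBackward(int n) {
    ForwardBuffer b = new ForwardBuffer(new byte[n]);
    for (int i = n - 1; i >= 0; i--) {
      b.readByte(i);
    }
    return b.refills;
  }

  public static void main(String[] args) {
    int n = 10_000;
    System.out.println("forward reads:  " + refillsForward(n) + " refills");
    System.out.println("backward reads: " + refillsBackward(n) + " refills");
  }
}
```

In this model, main() reports 10 refills for the forward pass and 10000 for the backward pass: each backward step lands just before the current window, so every byte costs a fresh refill, which is the pattern Adrien describes. The real BufferedIndexInput differs in details such as seek handling and buffer sizing.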