Hi,

Next step I believe is for you to open a Solr JIRA issue for this regression, 
as it is obviously an issue.
Then we can use that JIRA to coordinate, even if the fix ends up being in 
Lucene code.

Once we have a Solr JIRA, it may make sense to open a mail thread on 
users@lucene list to get some Lucene expertice involved in digging.

Jan

> 1. jun. 2023 kl. 18:15 skrev Rahul Goswami <[email protected]>:
> 
> So I ran the test to index 5 million docs this time (batches of 1000 docs
> in 15 parallel threads). To eliminate the network overhead and get as
> accurate a benchmark as possible, I used an AtomicLong to measure the time
> around the RTG call in DistibutedUpdateProcessor across all calls (
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.7.2/solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java#L1416).
> Did this for both Solr 7.7.2 and Solr 8.11.1 and built the solr-core.jar to
> replace it in the solr webapp lib.
> 
> RTG in Solr 8.x is ~10x slower. Here are the numbers (times are in
> milliseconds):
> 
> *Solr 7.7.2 *: 2023-06-01 15:39:48.272 WARN  (qtp1034094674-24) [
> x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory *Total rtg time:7293486*
> *Solr 8.11.1: *2023-06-01 04:46:24.758 WARN  (qtp391506011-71) [
> x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory *Total rtg
> time:72029877*
> 
> Let me know if I can help further with debugging. At this point I am
> suspecting the slowness to be on Lucene's side with the findTargetArc()
> method in FST.java. I am running more analysis on my side in parallel and
> will share if/when I find more.
> 
> -Rahul
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Thu, Jun 1, 2023 at 3:53 AM Jan Høydahl <[email protected]> wrote:
> 
>> Yes, the numbers should be similar. Thanks for digging in. If there is a
>> big regression, it could be worthwile running the same test on v8.0, 8.1,
>> 8.2 etc to plot in what version the regression happened. Perhaps it could
>> even be scripted :)
>> 
>> Solr unfortunately do not yet have a comprehensive official nightly
>> benchmark run that would catch such regressions. But the community is
>> working on it, there are some tests on http://mostly.cool maintained by
>> Ishan, but I have no idea how to add new tests..
>> 
>> Jan
>> 
>>> 31. mai 2023 kl. 23:50 skrev Rahul Goswami <[email protected]>:
>>> 
>>> Sure, I can do that. Let me create an index with a few million docs, call
>>> RTG with a few million iterations on it and note the times between 7.x
>> and
>>> 8.x. I assume this should be sufficient (?)
>>> 
>>> On Wed, May 31, 2023 at 5:19 PM Jan Høydahl <[email protected]>
>> wrote:
>>> 
>>>> Would be nice to determine whether RTG is orders of magnitude slower in
>>>> 8.x than 7.x and is the main culprit.  Then we could isolate the
>> testing to
>>>> RTG only and not involce Atomic Update?
>>>> 
>>>> Jan
>>>> 
>>>>> 31. mai 2023 kl. 21:33 skrev Rahul Goswami <[email protected]>:
>>>>> 
>>>>> I don’t have any nested documents. And the results are consistent
>> across
>>>>> multiple runs. I tried looking for similar issues in the mailing list,
>>>> but
>>>>> couldn’t find anything relevant . So if you do happen to find any JIRAs
>>>>> addressing it that would be really helpful (thanks!).
>>>>> 
>>>>> To Jan’s question about RTG taking more time in Solr 8.x, I can say
>> with
>>>>> good certainty that this is the case. Although it does look into
>>>>> transaction logs first, thread dumps suggest that it is the next phase
>>>>> (when it doesn't find the doc in tlog) which seems to be time
>> consuming .
>>>>> It tries to look up the document via the current searcher
>>>>> (searcher.getFirstMatch() ). Proceeding further in the stack, it is
>> this
>>>>> call where many threads are spending time:
>>>>> 
>>>>> 
>>>> 
>> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/codecs/blocktree/SegmentTermsEnum.java#L485
>>>>> 
>>>>> Although this call is the same in 7.7.2 and 8.11.1 quite likely
>>>>> something changed in Lucene's FST.java which is causing the slowness. I
>>>> am
>>>>> trying to dig further and might also ask folks on the Lucene mailing
>>>> list.
>>>>> Thanks.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, May 31, 2023 at 11:36 AM Srijan <[email protected]> wrote:
>>>>> 
>>>>>> I would love some profiling as well. I know 8.8 or 8.9 had some
>>>> performance
>>>>>> problems with atomic update but this was later addressed. I cant find
>>>> the
>>>>>> jira atm though. Also I am on 8.11.1 and atomic update is not an issue
>>>> for
>>>>>> me.
>>>>>> 
>>>>>> By the way, do you happen to have nested docs?
>>>>>> 
>>>>>> 
>>>>>> On Wed, May 31, 2023, 11:20 Jan Høydahl <[email protected]>
>> wrote:
>>>>>> 
>>>>>>> Hi
>>>>>>> 
>>>>>>> MMap is most important for searching. Indexing bypasses the cache by
>>>>>> using
>>>>>>> direct IO.
>>>>>>> 
>>>>>>> I have noticed slow real time get on Solr 8.x during atomic update
>>>>>> myself.
>>>>>>> Would be interesting with a comparison with profiling. RTG gets the
>>>>>>> document from transaction log I believe? Could there be some RTG
>>>> changes
>>>>>> in
>>>>>>> 8.x that caused such slowdown?
>>>>>>> 
>>>>>>> Jan Høydahl
>>>>>>> 
>>>>>>>> 31. mai 2023 kl. 16:57 skrev Rahul Goswami <[email protected]>:
>>>>>>>> 
>>>>>>>> Thanks for the response Shawn. We are using Windows server with
>>>> pretty
>>>>>>> huge
>>>>>>>> indexes (multiple TB cores). With Mmap, I have observed that the
>>>>>> machine
>>>>>>>> just completely freezes with high CPU and memory usage to a point
>>>> where
>>>>>>> it
>>>>>>>> becomes impossible to even connect to it. SimpleFS works out well
>> for
>>>>>> us
>>>>>>> in
>>>>>>>> this case.
>>>>>>>> 
>>>>>>>> As noted in my first email, even with SimpleFS, Solr 7 completes the
>>>>>>> crawl
>>>>>>>> in nearly 1/5th the time taken in Solr 8. Hence there should be
>>>>>> something
>>>>>>>> OUTSIDE the directory factory in the code which is causing this.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Rahul
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Tue, May 30, 2023 at 10:47 PM Shawn Heisey <[email protected]
>>> 
>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> On 5/30/23 15:34, Rahul Goswami wrote:
>>>>>>>>>> Environment details: - Java 11 on Windows server - Xms1536m
>> Xmx3072m
>>>>>> -
>>>>>>>>>> Indexing client code running 15 parallel threads indexing in
>> batches
>>>>>> of
>>>>>>>>>> 1000 - using SimpleFSDirectoryFactory (since Mmap doesn't quite
>> work
>>>>>>>>>> well on Windows for our index sizes which commonly run north of 1
>>>> TB)
>>>>>>>>> 
>>>>>>>>> Don't change the directoryFactory.  You *WANT* Solr to use MMAP for
>>>>>> your
>>>>>>>>> indexes.  Not using MMAP is likely to slow things down
>> considerably.
>>>>>>>>> MMAP should work just fine on 64-bit Windows with 64-bit Java.
>> Which
>>>>>> of
>>>>>>>>> course requires 64-bit hardware.
>>>>>>>>> 
>>>>>>>>> 32 bit systems and software cannot properly deal with data larger
>>>> than
>>>>>>>>> about 2GB.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Shawn
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 

Reply via email to