Thanks Dominic. I'm guessing that something in the replication
invalidates caching of these files, and once they're in memory again
everything is fine, although I don't know how this might have changed.
I found this interesting snippet about ExternalFileField performance
being improved by sorting that might be related, but then again it's
pretty old:
https://stackoverflow.com/questions/29470458/solr-external-file-field-performance-issue
I also note that my ex-colleague Alan did improve EFF performance a
while ago: https://issues.apache.org/jira/browse/SOLR-3985
Everything I've read, including from Issuu
https://engineering.issuu.com/2013/03/11/how-search-at-issuu-actually-works
implies that EFF isn't particularly performant anyway. There doesn't
seem to have been any activity around EFF between those versions apart
from some doc fixes:
https://issues.apache.org/jira/browse/SOLR-14968?jql=text%20~%20%22externalfilefield%22
Hope some of these links help you track down the problem!
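On the sorting point: the Solr reference guide notes that the external
file doesn't have to be sorted, but lookups are faster if it is. Below is
a minimal sketch of pre-sorting the key=value lines before the file is put
in place. It assumes string keys and plain code-point ordering, and
sort_boost_lines is a made-up helper name, not part of Solr:

```python
def sort_boost_lines(lines):
    """Return the key=value lines sorted by key (hypothetical helper).

    Blank or malformed lines (no '=') are dropped. Ordering is plain
    string order on the key, which is an assumption about what suits
    the index's key field.
    """
    entries = []
    for raw in lines:
        line = raw.strip()
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        entries.append((key, value))
    # list.sort is stable, so duplicate keys keep their original order
    entries.sort(key=lambda kv: kv[0])
    return [f"{key}={value}" for key, value in entries]


if __name__ == "__main__":
    sample = ["doc7=1.5", "", "doc1=2.0", "not a boost line"]
    for line in sort_boost_lines(sample):
        print(line)
```

The sorted lines would then be written back to external_boostvalue; whether
this makes any difference on 8.9 is something only a test would show.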
Best
Charlie
On 26/10/2021 15:31, Dominic Humphries wrote:
No problem, I've been trying to get my head around how it all works myself!
As per
https://solr.apache.org/guide/8_9/working-with-external-files-and-processes.html
our schema defines a field type:
<fieldType name="fileboost" keyField="id" defVal="1" stored="false"
indexed="false" class="solr.ExternalFileField"/>
which is then used to define a field:
<field name="boostvalue" type="fileboost"/>
which pulls data from a file, external_boostvalue, living in $SOLR_HOME/data.
This is used to set a boost value that increases the visibility of some
search results.
Setting this file to be empty completely removes the performance hit we
see, which takes several minutes to resolve after each replication. But we
do need the functionality still, and I'm unclear on why this is an issue
for 8.9 when it wasn't for 8.3.
Hope this clarifies the problem!
Dominic
On Mon, 25 Oct 2021 at 19:03, Charlie Hull <[email protected]>
wrote:
Hi Dominic,
Could you clarify what you mean by boost files in this context? Just
curious....
Charlie
On 25/10/2021 17:11, Dominic Humphries wrote:
Performance with the replica pulling from 8.3.1 was actually worse. And
looking at the data in the databases and the boost file contents, I'm
dubious it's a problem of incompatible boost files. I think the
performance of importing/applying the boosts really is what's responsible
for the issue we see. Not sure what else to test to verify or disprove this.
On Mon, 25 Oct 2021 at 14:56, Dominic Humphries <[email protected]>
wrote:
I think I found it!
I didn't realise, but we have boost files for the core I'm testing and
the boost is applied after replication! Setting the contents of the files
to empty completely removes the post-replication performance problem we
were seeing.
So now my question becomes "Why is boosting taking so much longer for
the upgrade?"
Since the upgrade has its own independent set of data, I'm wondering if
it's as simple as the IDs it's trying to boost don't exist and it takes
longer to find out an item is missing than it does to find one that
exists? I believe I can point an 8.9.0 follower at an 8.3.1 leader, which
seems like the next logical step: if there's no performance hit when it
has the same data as the 8.3.1 replica, then that's almost certainly the
problem.
Fingers crossed!
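One way to check the missing-ID theory without another replication run
would be to count how many keys in the boost file have no matching
document. This is only a rough sketch: it assumes the document IDs have
already been exported from the index into a set by some other means, and
missing_boost_keys is a hypothetical helper, not a Solr API:

```python
def missing_boost_keys(boost_lines, known_ids):
    """Return boost-file keys, in file order, that are not in known_ids.

    boost_lines: iterable of 'key=value' strings from the external file.
    known_ids:   set of document IDs taken from the index (assumed to be
                 exported separately; how is outside this sketch).
    """
    missing = []
    for raw in boost_lines:
        key, sep, _ = raw.strip().partition("=")
        # sep is empty for blank or malformed lines, which are ignored
        if sep and key not in known_ids:
            missing.append(key)
    return missing


if __name__ == "__main__":
    boosts = ["1=2.0", "7=1.5", "", "9=3"]
    ids_in_index = {"1", "9"}
    print(missing_boost_keys(boosts, ids_in_index))  # prints ['7']
```

A large share of missing keys would be consistent with the theory; a small
share would suggest looking elsewhere.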
On Sun, 24 Oct 2021 at 10:26, Deepak Goel <[email protected]> wrote:
There could be some testing and cooling happening post-replication. Will
have to dig a bit more into the code.
Deepak
"The greatness of a nation can be judged by the way its animals are
treated
- Mahatma Gandhi"
+91 73500 12833
[email protected]
Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool
"Plant a Tree, Go Green"
Make In India : http://www.makeinindia.com/home
On Thu, Oct 21, 2021 at 9:57 PM Dominic Humphries
<[email protected]> wrote:
One more tidbit: I just tried leaving replication off for a few hours and
then triggering a "big" replication run so I could see the distinct stages.
- Beginning replication didn't cause any performance degradation.
- Several minutes of downloading the replication files saw no degradation.
- Only after downloading had completed did we start to see performance
  issues in our tests.
- But we saw the "number of docs/timestamp of latest file" both jump
  almost immediately after downloading completed and never move again.
- But the performance degradation continued for about seven more minutes
  even though replication was clearly finished at this point.
Is there some kind of re-indexing optimization thing that Solr can run
post-replication? At this point it's about my only remaining suspect.
--
Charlie Hull - Managing Consultant at OpenSource Connections Limited
<www.o19s.com>
Founding member of The Search Network <https://thesearchnetwork.com/>
and co-author of Searching the Enterprise
<https://opensourceconnections.com/about-us/books-resources/>
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
Amtsgericht Charlottenburg | HRB 230712 B
Geschäftsführer: John M. Woodell | David E. Pugh
Finanzamt: Berlin Finanzamt für Körperschaften II