At last, I think we've got it!

Our external boost files live on an NFS volume so they can be updated once
by a worker machine and all the followers will get the update. Which is all
very nice.

But if we source those files from the local filesystem instead of one
mounted from the network, the performance issue goes away!

I've tested this manually and it looks good; I'm now in the process of
updating our Terraform etc. so the instances will be able to use local
copies of these files. Assuming the update works, the matter will finally
be fixed!
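
For anyone wanting to try something similar, here is a minimal sketch of
what a "keep a local copy" step could look like, written in Python rather
than the Terraform mentioned above. It is only an illustration under stated
assumptions: the function name sync_boost_file and any paths passed to it
are hypothetical, and this is not the actual setup described in this thread.
It copies the NFS-hosted file into a local directory only when the content
has changed, and swaps it in with an atomic rename so a reader never sees a
half-written file.

```python
import os
import shutil
import tempfile
from pathlib import Path


def _same_content(a: Path, b: Path, chunk: int = 1 << 20) -> bool:
    """Return True if both files have identical bytes."""
    if a.stat().st_size != b.stat().st_size:
        return False
    with a.open("rb") as fa, b.open("rb") as fb:
        while True:
            ca, cb = fa.read(chunk), fb.read(chunk)
            if ca != cb:
                return False
            if not ca:  # both files reached EOF together
                return True


def sync_boost_file(nfs_path: Path, local_path: Path) -> bool:
    """Copy nfs_path to local_path if it differs; return True if updated.

    Hypothetical helper: both paths are assumptions supplied by the caller.
    """
    if local_path.exists() and _same_content(nfs_path, local_path):
        return False
    local_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a dot-prefixed temp file in the same directory so that the
    # final rename stays on one filesystem, and so that the temp name does
    # not look like an external_<field> boost file itself.
    fd, tmp_name = tempfile.mkstemp(dir=local_path.parent, prefix=".tmp-")
    os.close(fd)
    try:
        shutil.copyfile(nfs_path, tmp_name)
        # mkstemp creates the file as 0600; widen it on the assumption that
        # the search server runs as a different user and must read it.
        os.chmod(tmp_name, 0o644)
        os.replace(tmp_name, local_path)  # atomic swap on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return True
```

Something like this could be run from cron or a systemd timer on each
follower, for example sync_boost_file(Path("/mnt/nfs/external_boostvalue"),
Path("/var/solr/data/external_boostvalue")), where both paths are made up
for the example.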

So the reason we were seeing performance issues was that we were using
NFS-mounted external files to update our boosts - which is probably
edge-case enough to be why nobody else was reporting it!

I'll update one last time to confirm all is well with the new images, and
hopefully this issue can be put to bed at last.

Thanks all for your help!

Dominic

On Tue, 26 Oct 2021 at 15:31, Dominic Humphries <[email protected]> wrote:

> No problem, I've been trying to get my head around how it all works myself!
>
> As per
> https://solr.apache.org/guide/8_9/working-with-external-files-and-processes.html
> our schema defines a field type:
>     <fieldType name="fileboost" keyField="id" defVal="1" stored="false"
>                indexed="false" class="solr.ExternalFileField"/>
> which is then used to define a field:
>     <field name="boostvalue" type="fileboost"/>
> which pulls data from a file, external_boostvalue, living in $SOLR_HOME/data
>
> This is used to set a boost value that increases the visibility of some
> search results.
>
> Emptying this file completely removes the performance hit we see, which
> takes several minutes to resolve after each replication. But we still
> need the functionality, and I'm unclear on why this is an issue for 8.9
> when it wasn't for 8.3.
>
> Hope this clarifies the problem!
>
> Dominic
>
> On Mon, 25 Oct 2021 at 19:03, Charlie Hull <[email protected]> wrote:
>
>> Hi Dominic,
>>
>> Could you clarify what you mean by boost files in this context? Just
>> curious....
>>
>> Charlie
>>
>> On 25/10/2021 17:11, Dominic Humphries wrote:
>> > Performance with the replica pulling from 8.3.1 was actually worse. And
>> > looking at the data in the databases and the boost file contents, I'm
>> > dubious it's a problem of incompatible boost files. I think the
>> > performance of importing/applying the boosts really is what's
>> > responsible for the issue we see. Not sure what else to test to verify
>> > or disprove this..
>> >
>> > On Mon, 25 Oct 2021 at 14:56, Dominic Humphries <[email protected]> wrote:
>> >
>> >> I think I found it!
>> >>
>> >> I didn't realise, but we have boost files for the core I'm testing and
>> >> the boost is applied after replication! Setting the contents of the
>> >> files to empty completely removes the post-replication performance
>> >> problem we were seeing.
>> >>
>> >> So now my question becomes "Why is boosting taking so much longer for
>> >> the upgrade?"
>> >>
>> >> Since the upgrade has its own independent set of data, I'm wondering if
>> >> it's as simple as the IDs it's trying to boost don't exist and it takes
>> >> longer to find out an item is missing than it does to find one that
>> >> does? I believe I can point an 8.9.0 follower at an 8.3.1 leader, that
>> >> seems like the next logical step - if there's no performance hit when
>> >> it has the same data as the 8.3.1 replica, then that's almost certainly
>> >> the problem.
>> >>
>> >> Fingers crossed!
>> >>
>> >> On Sun, 24 Oct 2021 at 10:26, Deepak Goel <[email protected]> wrote:
>> >>
>> >>> There could be some testing and cooling happening post-replication.
>> >>> Will have to dig a bit more into the code.
>> >>>
>> >>> Deepak
>> >>>
>> >>> On Thu, Oct 21, 2021 at 9:57 PM Dominic Humphries
>> >>> <[email protected]> wrote:
>> >>>
>> >>>> One more tidbit: I just tried leaving replication off for a few hours
>> >>>> and then triggering a "big" replication run so I could see the
>> >>>> distinct stages.
>> >>>>
>> >>>>     - Beginning replication didn't cause any performance degradation.
>> >>>>     - Several minutes of downloading the replication files saw no
>> >>>>     degradation
>> >>>>     - Only after downloading had completed did we start to see
>> >>>>     performance issues in our tests
>> >>>>     - But we saw the "number of docs/timestamp of latest file" both
>> >>>>     jump almost immediately after downloading completed and never move
>> >>>>     again
>> >>>>     - But the performance degradation continued for about seven more
>> >>>>     minutes even though replication was clearly finished at this point
>> >>>>
>> >>>>
>> >>>> Is there some kind of re-indexing optimization thing that solr can
>> >>>> run post-replication? At this point it's about my only remaining
>> >>>> suspect..
>> >>>>
>>
>> --
>> Charlie Hull - Managing Consultant at OpenSource Connections Limited
>> <www.o19s.com>
>> Founding member of The Search Network <https://thesearchnetwork.com/>
>> and co-author of Searching the Enterprise
>> <https://opensourceconnections.com/about-us/books-resources/>
>> tel/fax: +44 (0)8700 118334
>> mobile: +44 (0)7767 825828
>>
>> OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
>> Amtsgericht Charlottenburg | HRB 230712 B
>> Geschäftsführer: John M. Woodell | David E. Pugh
>> Finanzamt: Berlin Finanzamt für Körperschaften II
>>
>
