No faceting, but we do use highlighting. We have very long queries because 
students paste in homework problems. I’ve seen 1000-word queries, but we 
truncate at 40 words. 

We do as-you-type results, so we also have ngram fields on the 20 million 
solved homework questions. This bloats the index severely. About 75% of the 
terms are ngrams.

Median query time is over one second, so a burst of traffic can back up a lot 
of work.

If we hard-limit the number of simultaneous requests, the cluster can get slow 
instead of falling over.
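
A minimal sketch of that kind of hard limit, assuming a plain JDK semaphore 
sitting in front of the search call. RequestLimiter, run, and maxWaitMillis 
are illustrative names, not anything in Solr. Callers over the limit wait for 
a bounded time (the "slow" part) and are then rejected rather than piling more 
work onto the cluster.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: caps how many requests may be in flight at once. */
public class RequestLimiter {
    private final Semaphore permits;

    public RequestLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /**
     * Runs the work if a permit becomes free within maxWaitMillis;
     * otherwise throws RejectedExecutionException without running it.
     */
    public <T> T run(Supplier<T> work, long maxWaitMillis) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new RejectedExecutionException("interrupted while waiting", e);
        }
        if (!acquired) {
            throw new RejectedExecutionException("too many simultaneous requests");
        }
        try {
            return work.get();
        } finally {
            permits.release(); // always hand the permit back
        }
    }
}
```

The limit and the wait time would need tuning against real traffic. With a 
median query time over one second, even a short wait bounds how much work can 
queue up.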

Thousands of connections is a lot better than thousands of threads. Connections 
are just blocks of data in the client and OS.
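
Here is a rough sketch of the send-everything, then receive-everything 
pattern I mean, using CompletableFuture. ShardFanOut and queryAll are made-up 
names, and the sendAsync function is an assumption standing in for whatever 
non-blocking HTTP client is used. This is not how Solr does it internally.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical helper: fans one query out to every shard without a thread per shard. */
public class ShardFanOut {
    public static <R> CompletableFuture<List<R>> queryAll(
            List<String> shardUrls,
            Function<String, CompletableFuture<R>> sendAsync) {
        // Send phase: start every request before waiting on any response.
        List<CompletableFuture<R>> pending = shardUrls.stream()
                .map(sendAsync)
                .collect(Collectors.toList());

        // Receive phase: complete once all shards have answered,
        // returning responses in the same order as shardUrls.
        return CompletableFuture
                .allOf(pending.toArray(new CompletableFuture<?>[0]))
                .thenApply(done -> pending.stream()
                        .map(CompletableFuture::join) // already complete, does not block
                        .collect(Collectors.toList()));
    }
}
```

With the JDK 11+ java.net.http.HttpClient, the sendAsync argument could be a 
lambda that calls client.sendAsync(request, HttpResponse.BodyHandlers.ofString()) 
and maps the response to its body. The calling thread only holds a list of 
futures while the requests are in flight.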

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Nov 29, 2017, at 3:41 PM, Toke Eskildsen <t...@kb.dk> wrote:
> 
> Walter Underwood <wun...@wunderwood.org> wrote:
>> I knew about SOLR-7433, but I’m really surprised that 200 incoming requests 
>> can need 4000 threads.
>> 
>> We have four shards.
> 
> For that I would have expected at most 800 Threads. Are you perhaps doing 
> faceting on multiple fields with facet.threads=5? (kinda grasping at straws 
> here)
> 
>> Why is there a thread per shard? HTTP can be done async: send1,
>> send2, send3, send4, recv1 recv2, recv3, recv4. I’ve been doing
>> that for over a decade with HTTPClient.
> 
> I don't know the reasoning. Should I design it from scratch, I would probably 
> still use Threads (wrapped as Futures) as they are easy to work with. Getting 
> into thousands of connections in Solr seems like a danger sign to me, whether 
> they are done async or not.
> 
> - Toke Eskildsen
