For my quick-and-dirty test I just rebooted my machine totally and still
had 1K/sec core discovery. So this still puzzles me greatly. The time
to do this should be approximated by the time it takes to just walk
your tree, find all the core.properties files and read them. Is it possible to
just write a tiny Java program to do that? Or rip off the core discovery
code and use it for a small stand-alone program? Because this is quite
a bit at odds with what I've seen. Although now that I think about it,
the code has gone through some revisions since then, but I don't think
they should have affected this...
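Something like this is what I have in mind — a totally untested sketch, and the output format and default path are just for illustration: walk the tree, load every core.properties found, and report cores/sec:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Stand-alone timing of "core discovery": find and parse every
// core.properties under a root directory, then report cores/sec.
public class DiscoveryTimer {
    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        AtomicInteger found = new AtomicInteger();
        long start = System.nanoTime();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(p -> p.getFileName().toString().equals("core.properties"))
                 .forEach(p -> {
                     Properties props = new Properties();
                     try (InputStream in = Files.newInputStream(p)) {
                         props.load(in); // read it, like discovery would
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                     found.incrementAndGet();
                 });
        }
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d cores in %.2fs (%.0f cores/sec)%n",
                found.get(), secs, found.get() / secs);
    }
}
```

Run it against your cores directory (after dropping the buffer cache) and compare the cores/sec number to what Solr reports.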

Best
Erick


On Fri, Oct 18, 2013 at 2:59 PM, Soyez Olivier
<olivier.so...@worldline.com> wrote:

> 15K cores takes around 4 minutes : no network drive, just a spinning disk.
> But one important thing : to simulate a cold start with an empty linux
> buffer cache,
> I used the following command to drop the linux buffer cache :
> sync && echo 3 > /proc/sys/vm/drop_caches
> Then I started Solr and found the result above
>
>
> On 11/10/2013 13:06, Erick Erickson wrote:
>
>
> bq: sharing the underlying solrconfig object the configset introduced
> in JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode
>
> SOLR-4478 will NOT share the underlying config objects, it simply
> shares the underlying directory. Each core will, at least as presently
> envisioned, simply read the files that exist there and create their
> own solrconfig object. Schema objects may be shared, but not config
> objects. It may turn out to be relatively easy to do in the configset
> situation, but last time I looked at sharing the underlying config
> object it was too fraught with problems.
>
> bq: 15K cores is around 4 minutes
>
> I find this very odd. On my laptop, spinning disk, I think I was
> seeing 1k cores discovered/sec. You're seeing roughly 16x slower, so I
> have no idea what's going on here. If this is just reading the files,
> you should be seeing horrible disk contention. Are you on some kind of
> networked drive?
>
> bq: To do that in the background and to block on that request until core
> discovery is complete will not work for us (due to the worst case).
> What other choices are there? Either you have to do it up front or
> with some kind of blocking. Hmmm, I suppose you could keep some kind
> of custom store (DB? File? ZooKeeper?) that would keep the last known
> layout. You'd still have some kind of worst-case situation where the
> core you were trying to load wouldn't be in your persistent store and
> you'd _still_ have to wait for the discovery process to complete.
>
> bq: and we will use the cores Auto option to create-and-load or only load
> the core on
> Interesting. I can see how this could all work without any core
> discovery but it does require a very specific setup.
>
> On Thu, Oct 10, 2013 at 11:42 AM, Soyez Olivier
> <olivier.so...@worldline.com> wrote:
> > The corresponding patch for Solr 4.2.1 LotsOfCores can be found in
> SOLR-5316, including the new Cores options :
> > - "numBuckets" to create a subdirectory based on a hash of the core name
> % numBuckets in the core dataDir
> > - "Auto" with 3 different values :
> >   1) false : default behaviour
> >   2) createLoad : create the core if it does not exist, and load it on
> the fly on the first incoming request (update, select)
> >   3) onlyLoad : load the core on the fly on the first incoming request
> (update, select), if it exists on disk
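For what it's worth, the Auto behaviour quoted above could look roughly like this — a hand-wavy sketch with made-up types (Core is a stand-in for SolrCore; the real patch hooks into CoreContainer), not the actual SOLR-5316 code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the "Auto" options: a core is opened lazily on
// the first request for its name, with no up-front discovery at all.
public class LazyCores {
    static final class Core {
        final String name;
        Core(String name) { this.name = name; }
    }

    private final Map<String, Core> open = new ConcurrentHashMap<>();
    private final Path coreRoot;

    LazyCores(Path coreRoot) { this.coreRoot = coreRoot; }

    // createIfMissing=true ~ Auto=createLoad, false ~ Auto=onlyLoad
    Core get(String name, boolean createIfMissing) {
        return open.computeIfAbsent(name, n -> {
            Path dir = coreRoot.resolve(n);
            if (!Files.isDirectory(dir)) {
                if (!createIfMissing) {
                    return null; // onlyLoad: the core must already exist on disk
                }
                try {
                    Files.createDirectories(dir); // createLoad: create on the fly
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            return new Core(n); // real code would open the index here
        });
    }
}
```

computeIfAbsent also gives you per-core locking for free, so two concurrent first requests for the same core would only open it once.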
> >
> > Concerning :
> > - sharing the underlying solrconfig object, the configset introduced in
> JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode.
> > We need to test it for our use case. If another solution exists, please
> tell me. We are very interested in such functionality and in contributing,
> if we can.
> >
> > - the possibility of lotsOfCores in SolrCloud : we don't know in detail
> how SolrCloud works.
> > But one possible limit is the maximum number of entries that can be
> added to a zookeeper node.
> > Maybe a solution would just be some kind of hashing in the zookeeper tree.
> >
> > - the time to discover cores in Solr 4.4 : with a spinning disk under
> linux, all cores with transient="true" and loadOnStartup="false", and the
> linux buffer cache emptied before starting Solr :
> > 15K cores takes around 4 minutes. It's linear in the number of cores, so
> for 50K it's more than 13 minutes. In fact, it corresponds to the time
> needed to read all the core.properties files.
> > Doing that in the background and blocking on that request until core
> discovery is complete will not work for us (due to the worst case).
> > So we will just disable core discovery, because we don't need to know
> all the cores from the start. We start Solr without any core entries in
> solr.xml, and we will use the cores Auto option to create-and-load or only
> load the core on the fly, based on the existence of the core on disk
> (absolute path calculated from the core name).
> >
> > Thanks for your interest,
> >
> > Olivier
> > ________________________________________
> > From : Erick Erickson [erickerick...@gmail.com]
> > Sent : Monday, October 7, 2013 14:33
> > To : solr-user@lucene.apache.org
> > Subject : Re: feedback on Solr 4.x LotsOfCores feature
> >
> > Thanks for the great writeup! It's always interesting to see how
> > a feature plays out "in the real world". A couple of questions
> > though:
> >
> > bq: We added 2 Cores options :
> > Do you mean you patched Solr? If so are you willing to share the code
> > back? If both are "yes", please open a JIRA, attach the patch and assign
> > it to me.
> >
> > bq:  the number of file descriptors, it used a lot (need to increase
> global
> > max and per process fd)
> >
> > Right, this makes sense since you have a bunch of cores all with their
> > own descriptors open. I'm assuming that you hit a rather high max
> > number and it stays pretty steady....
> >
> > bq: the overhead to parse solrconfig.xml and load dependencies to open
> > each core
> >
> > Right, I tried to look at sharing the underlying solrconfig object but
> > it seemed pretty hairy. There are some extensive comments in the
> > JIRA of the problems I foresaw. There may be some action on this
> > in the future.
> >
> > bq: lotsOfCores doesn’t work with SolrCloud
> >
> > Right, we haven't concentrated on that, it's an interesting problem.
> > In particular it's not clear what happens when nodes go up/down,
> > replicate, resynch, all that.
> >
> > bq: When you start, it spends a lot of time to discover cores due to a
> big
> >
> > How long? I tried 15K cores on my laptop and I think I was getting 15
> > second delays or roughly 1K cores discovered/second. Is your delay
> > on the order of 50 seconds with 50K cores?
> >
> > I'm not sure how you could do that in the background, but I haven't
> > thought about it much. I tried multi-threading core discovery and that
> > didn't help (SSD disk), I assumed that the problem was mostly I/O
> > contention (but didn't prove it). What if a request came in for a core
> > before you'd found it? I'm not sure what the right behavior would be
> > except perhaps to block on that request until core discovery was
> > complete. Hmmmmm. How would that work for your case? That
> > seems do-able.
> >
> > BTW, so far you get the prize for the most cores on a node I think.
> >
> > Thanks again for the great feedback!
> >
> > Erick
> >
> > On Mon, Oct 7, 2013 at 3:53 AM, Soyez Olivier
> > <olivier.so...@worldline.com> wrote:
> >> Hello,
> >>
> >> In my company, we use Solr in production to offer full text search on
> >> mailboxes.
> >> We host tens of millions of mailboxes, but only webmail users have this
> >> feature (a few million).
> >> We have the following use case :
> >> - non-static indexes with more updates (indexing and deleting) than
> >> select requests (ratio 7:1)
> >> - homogeneous configuration for all indexes
> >> - not many users at the same time
> >>
> >> We started to index mailboxes with Solr 1.4 in 2010, on a subset of
> >> 400,000 users.
> >> - we had a cluster of 50 servers, 4 Solr per server, 2000 users per Solr
> >> instance
> >> - we grew to 6000 users per Solr instance, 8 Solr per server, 60GB per
> >> index (~2 million users)
> >> - we upgraded to Solr 3.5 in 2012
> >> As the indexes grew, IOPS and response times increased more and
> more.
> >>
> >> The index size was mainly due to stored fields (large .fdt files).
> >> Retrieving these fields from the index was costly, because of many seeks
> >> in large files, and there was no way to limit that usage.
> >> There was also an overhead on queries : too many results were filtered
> >> out to keep only the results concerning a given user.
> >> For these reasons and others, like non-pooled users, hardware savings,
> >> better scoring, and some requests that do not support filtering, we
> >> decided to use the LotsOfCores feature.
> >>
> >> Our goal was to change the current I/O usage : from lots of random I/O
> >> access on huge segments to mostly sequential I/O access on small
> segments.
> >> For our use case, it's not a big deal that the first query to a not yet
> >> loaded core will be slow.
> >> And we don't need to fit all the cores into memory at once.
> >>
> >> We started from the SOLR-1293 issue and the LotsOfCores wiki page to
> >> finally use a patched Solr 4.2.1 LotsOfCores in production (1 user = 1
> >> core).
> >> We no longer need to run so many Solr instances per node. We are now
> >> able to have around 50,000 cores per Solr and we plan to grow to
> >> 100,000 cores per instance.
> >> At first, we used the solr.xml persistence. All cores have
> >> loadOnStartup="false" and transient="true" attributes, so a cold start
> >> is very quick. The response times were better than ever, in comparison
> >> with the poor response times we had before using LotsOfCores.
> >>
> >> We added 2 Cores options :
> >> - "numBuckets" to create a subdirectory based on a hash of the core name
> >> % numBuckets in the core dataDir, because all the cores cannot live in
> >> the same directory
> >> - "Auto" with 3 different values :
> >> 1) false : default behaviour
> >> 2) createLoad : create the core if it does not exist, and load it on the
> >> fly on the first incoming request (update, select).
> >> 3) onlyLoad : load the core on the fly on the first incoming request
> >> (update, select), if it exists on disk
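A minimal sketch of what a numBuckets-style layout might compute — the hash function and directory scheme here are assumptions for illustration, not the actual patch code:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical numBuckets-style data-dir layout:
// bucket = hash(coreName) % numBuckets becomes a subdirectory, so tens
// of thousands of cores don't all land in one flat directory.
public class CoreBuckets {
    static Path dataDir(Path coreRoot, String coreName, int numBuckets) {
        // floorMod keeps the bucket non-negative even for negative hashCodes
        int bucket = Math.floorMod(coreName.hashCode(), numBuckets);
        return coreRoot.resolve(Integer.toString(bucket)).resolve(coreName);
    }

    public static void main(String[] args) {
        // e.g. /solr/cores/<bucket>/user12345
        System.out.println(dataDir(Paths.get("/solr/cores"), "user12345", 256));
    }
}
```

Since the path is a pure function of the core name, any node (or proxy) can recompute where a core lives without consulting a registry.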
> >>
> >> Then, to improve performance and avoid synchronization in the solr.xml
> >> persistence, we disabled it.
> >> The drawback is that we can no longer see the list of all available
> >> cores with the admin core status command, only the warmed-up ones.
> >> Finally, we can achieve very good performances with Solr LotsOfCores :
> >> - Index 5 emails (avg) + commit + search : x4.9 faster response time
> >> (Mean), x5.4 faster (95th per)
> >> - Delete 5 documents (avg) : x8.4 faster response time (Mean) x7.4
> >> faster (95th per)
> >> - Search : x3.7 faster response time (Mean) 4x faster (95th per)
> >>
> >> In fact, the better performance is mainly due to the small size of each
> >> index, but also thanks to the isolation between cores (updates and
> >> queries on many mailboxes don't have side effects on each other).
> >> One important thing with the LotsOfCores feature is to take care of :
> >> - the number of file descriptors : it uses a lot (need to increase the
> >> global max and the per-process fd limit)
> >> - the value of transientCacheSize, depending on the RAM size and the
> >> allocated PermGen size
> >> - the ClassLoader leak that increases minor GC times when the CMS GC is
> >> enabled (use -XX:+CMSClassUnloadingEnabled)
> >> - the overhead of parsing solrconfig.xml and loading dependencies to
> >> open each core
> >> - LotsOfCores doesn't work with SolrCloud, so we store the index
> >> locations outside of Solr. We have Solr proxies to route requests to the
> >> right instance.
> >>
> >> Outside of production, we tried the core discovery feature in Solr 4.4
> >> with lots of cores.
> >> On startup, it spends a lot of time discovering cores due to their big
> >> number, and meanwhile all requests fail (SolrDispatchFilter.init()
> >> not done yet). It would be great to have, for example, an option to run
> >> core discovery in the background, or just to be able to disable it, like
> >> we do in our use case.
> >>
> >> If someone is interested in these new options for LotsOfCores feature,
> >> just tell me
> >
> >
> >
> > This e-mail and the documents attached are confidential and intended
> solely for the addressee; it may also be privileged. If you receive this
> e-mail in error, please notify the sender immediately and destroy it. As
> its integrity cannot be secured on the Internet, the Worldline liability
> cannot be triggered for the message content. Although the sender endeavours
> to maintain a computer virus-free network, the sender does not warrant that
> this transmission is virus-free and will not be liable for any damages
> resulting from any virus transmitted.
>
>
