Good luck!

You have one huge advantage when prototyping: you can
mine your current logs for real user queries. It's actually
surprisingly difficult to generate, say, 10,000 "realistic" queries, and
IMO you need something approaching that number to ensure that
your queries don't all hit the caches etc....
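
A rough sketch of that log mining (the log path and the
`params={...}` line format are assumptions about your setup; adjust
the regex to whatever your request logs actually look like):

```python
# Hypothetical sketch: harvest real user queries from Solr request logs
# to build a realistic benchmark set. Log path and line format are
# assumptions -- adapt the regex to your own log output.
import re
import urllib.parse

LOG_PATH = "solr.log"  # assumed location

def extract_queries(path):
    """Yield the decoded q= parameter from each logged request line."""
    pattern = re.compile(r"params=\{([^}]*)\}")
    with open(path) as fh:
        for line in fh:
            m = pattern.search(line)
            if not m:
                continue
            params = urllib.parse.parse_qs(m.group(1))
            for q in params.get("q", []):
                yield q

def unique_queries(path, limit=10000):
    """Deduplicate in order and stop at ~limit queries for load testing."""
    seen, out = set(), []
    for q in extract_queries(path):
        if q not in seen:
            seen.add(q)
            out.append(q)
        if len(out) >= limit:
            break
    return out
```

Replaying the deduplicated list against a test cluster gives you cache
behavior much closer to production than synthetic queries would.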

Anyway, sounds like you're off and running.

Best,
Erick
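
P.S. A rough sketch of the "collection aliasing" idea mentioned in the
quoted message below, using the SolrCloud Collections API
CREATEALIAS action (the host, alias, and collection names here are
made up):

```python
# Sketch only: point one query alias at several collections via the
# SolrCloud Collections API (action=CREATEALIAS). The endpoint, alias,
# and collection names below are placeholders.
import urllib.parse

SOLR = "http://localhost:8983/solr"  # assumed Solr base URL

def create_alias_url(alias, collections):
    """Build the CREATEALIAS request URL; open it with urllib.request.urlopen()."""
    params = urllib.parse.urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    })
    return "%s/admin/collections?%s" % (SOLR, params)

# Clients then query the alias (e.g. /solr/docs/select) and Solr fans
# the request out across every collection behind it:
# urllib.request.urlopen(create_alias_url("docs", ["docs_2015", "docs_2016"]))
```

Once the alias exists, you can add new collections behind it as the
old ones fill up, without any client-side changes.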

On Wed, Apr 27, 2016 at 10:12 AM, Stephen Lewis <sle...@panopto.com> wrote:
>>
> If I'm reading this right, you have 420M docs on a single shard?
> Yep, you were reading it right. Thanks for your guidance. We will do
> various prototyping following "the sizing exercise".
>
> Best,
> Stephen
>
> On Tue, Apr 26, 2016 at 6:17 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>>
>> If I'm reading this right, you have 420M docs on a single shard? If that's
>> true
>> you are pushing the envelope of what I've seen work and be performant. Your
>> OOM errors are the proverbial 'smoking gun' that you're putting too many
>> docs
>> on too few nodes.
>>
>> You say that the document count is "growing quite rapidly". My expectation
>> is
>> that your problems will only get worse as you cram more docs into your
>> shard.
>>
>> You're correct that adding more memory (and consequently more JVM
>> memory?) only gets you so far before you start running into GC
>> trouble; when you hit full GC pauses, they'll get longer and longer,
>> which is its own problem. And you don't want huge JVM memory at the
>> expense of OS memory, due to the fact that Lucene uses MMapDirectory;
>> see Uwe's excellent blog:
>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>>
>> I'd _strongly_ recommend you do "the sizing exercise". There are lots of
>> details here:
>>
>> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>
>> You've already done some of this inadvertently; unfortunately, it
>> sounds like it's in production. If I were going to guess, I'd say the
>> maximum number of docs on any shard should be less than half what you
>> currently have. So you need to figure out how many docs you expect to
>> host in this collection eventually and have N/200M shards. At least.
>>
>> There are various strategies when the answer is "I don't know": you
>> might add new collections when you max out and then use "collection
>> aliasing" to query them, etc.
>>
>> Best,
>> Erick
>>
>> On Tue, Apr 26, 2016 at 3:49 PM, Stephen Lewis <sle...@panopto.com> wrote:
>> > Hello,
>> >
>> > I'm looking for some guidance on the best steps for tuning a solr cloud
>> > cluster which is heavy on writes. We are currently running a solr cloud
>> > fleet composed of one core, one shard, and three nodes. The cloud is
>> hosted
>> > in AWS, and each solr node is on its own linux r3.2xl instance with 8 cpu
>> > and 61 GiB mem, and a 2TB EBS volume attached. Our index is currently 550
>> > GiB over 420M documents, and growing quite rapidly. We are currently
>> doing
>> > a bit more than 1000 document writes/deletes per second.
>> >
>> > Recently, we've hit some trouble with our production cloud. We have had
>> the
>> > process on individual instances die a few times, and we see the following
>> > error messages being logged (expanded logs at the bottom of the email):
>> >
>> > ERROR - 2016-04-26 00:56:43.873; org.apache.solr.common.SolrException;
>> > null:org.eclipse.jetty.io.EofException
>> >
>> > WARN  - 2016-04-26 00:55:29.571;
>> org.eclipse.jetty.servlet.ServletHandler;
>> > /solr/panopto/select
>> > java.lang.IllegalStateException: Committed
>> >
>> > WARN  - 2016-04-26 00:55:29.571; org.eclipse.jetty.server.Response;
>> > Committed before 500 {trace=org.eclipse.jetty.io.EofException
>> >
>> >
>> > Another time we saw this happen, we had java OOM errors (expanded logs at
>> > the bottom):
>> >
>> > WARN  - 2016-04-25 22:58:43.943;
>> org.eclipse.jetty.servlet.ServletHandler;
>> > Error for /solr/panopto/select
>> > java.lang.OutOfMemoryError: Java heap space
>> > ERROR - 2016-04-25 22:58:43.945; org.apache.solr.common.SolrException;
>> > null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap
>> space
>> > ...
>> > Caused by: java.lang.OutOfMemoryError: Java heap space
>> >
>> >
>> > When the cloud goes into recovery during live indexing, it takes about
>> 4-6
>> > hours for a node to recover, but when we turn off indexing, recovery only
>> > takes about 90 minutes.
>> >
>> > Moreover, we see that deletes are extremely slow. We do batch
>> > deletes of about 300 documents based on two value filters, and each
>> > batch takes about one minute.
>> >
>> > Research online suggests that a larger disk cache
>> > <https://wiki.apache.org/solr/SolrPerformanceProblems> could be helpful,
>> > but I also see from an older page
>> > <http://wiki.apache.org/lucene-java/ImproveSearchingSpeed> on tuning for
>> > Lucene that turning down the swappiness on our Linux instances may be
>> > preferred to simply increasing space for the disk cache.
>> >
>> > Moreover, to scale in the past, we've simply rolled our cluster while
>> > increasing the memory on the new machines, but I wonder if we're hitting
>> > the limit for how much we should scale vertically. My impression is that
>> > sharding will allow us to warm searchers faster and maintain a more
>> > effective cache as we scale. Will we really be helped by sharding, or is
>> it
>> > only a matter of total CPU/Memory in the cluster?
>> >
>> > Thanks!
>> >
>> > Stephen
>> >
>> > (206)753-9320
>> > stephen-lewis.net
>> >
>> > Logs:
>> >
>> > ERROR - 2016-04-26 00:56:43.873; org.apache.solr.common.SolrException;
>> > null:org.eclipse.jetty.io.EofException
>> > at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
>> > at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
>> > at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
>> > at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
>> > at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
>> > at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
>> > at org.apache.solr.util.FastWriter.flush(FastWriter.java:141)
>> > at org.apache.solr.util.FastWriter.flushBuffer(FastWriter.java:155)
>> > at
>> >
>> org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:83)
>> > at
>> >
>> org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:42)
>> > at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:765)
>> > at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:426)
>> > at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
>> > at
>> >
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>> > at
>> >
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>> > at
>> >
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>> > at
>> >
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>> > at
>> >
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>> > at
>> >
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>> > at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>> > at
>> >
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>> > at
>> >
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>> > at
>> >
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>> > at
>> >
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>> > at
>> >
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>> > at
>> >
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>> > at org.eclipse.jetty.server.Server.handle(Server.java:368)
>> > at
>> >
>> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
>> > at
>> >
>> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>> > at
>> >
>> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
>> > at
>> >
>> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
>> > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
>> > at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>> > at
>> >
>> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>> > at
>> >
>> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>> > at
>> >
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>> > at
>> >
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>> > at java.lang.Thread.run(Thread.java:745)
>> >
>> > WARN  - 2016-04-25 22:58:43.943;
>> org.eclipse.jetty.servlet.ServletHandler;
>> > Error for /solr/panopto/select
>> > java.lang.OutOfMemoryError: Java heap space
>> > ERROR - 2016-04-25 22:58:43.945; org.apache.solr.common.SolrException;
>> > null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap
>> space
>> > at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:793)
>> > at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:434)
>> > at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
>> > at
>> >
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>> > at
>> >
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>> > at
>> >
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>> > at
>> >
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>> > at
>> >
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>> > at
>> >
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>> > at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>> > at
>> >
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>> > at
>> >
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>> > at
>> >
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>> > at
>> >
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>> > at
>> >
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>> > at
>> >
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>> > at org.eclipse.jetty.server.Server.handle(Server.java:368)
>> > at
>> >
>> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
>> > at
>> >
>> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>> > at
>> >
>> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
>> > at
>> >
>> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
>> > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
>> > at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
>> > at
>> >
>> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>> > at
>> >
>> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>> > at
>> >
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>> > at
>> >
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>> > at java.lang.Thread.run(Thread.java:745)
>> > Caused by: java.lang.OutOfMemoryError: Java heap space
>> >
>> > WARN  - 2016-04-26 00:56:43.873; org.eclipse.jetty.server.Response;
>> > Committed before 500 {trace=org.eclipse.jetty.io.EofException
>> > at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
>> > at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
>> > at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
>> > at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
>> > at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
>> > at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
>> > at org.apache.solr.util.FastWriter.flush(FastWriter.java:141)
>> > at org.apache.solr.util.FastWriter.flushBuffer(FastWriter.java:155)
>> > at
>> >
>> org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:83)
>> > at
>> >
>> org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:42)
>> > at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:765)
>> > at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:426)
>> > at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
>> > at
>> >
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>> > at
>> >
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>> > at
>> >
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>> > at
>> >
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>> > at
>> >
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>> > at
>> >
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>> > at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>> > at
>> >
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>> > at
>> >
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>> > at
>> >
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>> > at
>> >
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>> > at
>> >
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>> > at
>> >
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>> > at org.eclipse.jetty.server.Server.handle(Server.java:368)
>> > at
>> >
>> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
>> > at
>> >
>> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>> > at
>> >
>> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
>> > at
>> >
>> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
>> > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
>> > at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>> > at
>> >
>> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>> > at
>> >
>> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>> > at
>> >
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>> > at
>> >
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>> > at java.lang.Thread.run(Thread.java:745)
>> > ,code=500}
>>
>
>
>
> --
> Stephen
>
> (206)753-9320
> stephen-lewis.net
