On Sat, Mar 8, 2025 at 3:48 PM Tim Allison <[email protected]> wrote:
> > I meant if it restarts on OOM (when detecting ERROR status) when not
> > using --no-fork -- double negation here, so to make it clear, when using
> > forked mode, does the watchdog monitor just the process (this is what I
> > remember from the code) or also uses a health check and restarts when
> > hitting ERROR?
>
> When running in standard mode (NOT --no-fork), the watchdog will restart
> the forked process on ERROR. If it isn't doing that, that's a bug.

Thanks, good to know.

> > I probably don't have the complete logs, but I checked the code, ERROR
> > status is only set on caught OOM.
>
> If it is easy enough to check the logs somehow, it would be super helpful
> to confirm you are getting logging for ooms. The world is a less happy
> place if you're seeing a status of ERROR without anything in the logs. :D

I am pretty sure I had incomplete logs before and the ERROR status is due
to OOM.

Thanks,
Cristi

> I still need to check through the code. :D
>
> On Thu, Mar 6, 2025 at 4:55 AM Cristian Zamfir <[email protected]> wrote:
>
>> Hi Tim, thanks for your answer.
>>
>> On Wed, Mar 5, 2025 at 11:46 PM Tim Allison <[email protected]> wrote:
>>
>>> Sorry for my delay.
>>>
>>> OOMs should cause a restart. The jvm, as you said, is in an unstable
>>> state. I frankly don’t know what that means practically, but every time I
>>> google it, it feels that the consensus is that a shutdown is the right
>>> answer.
>>>
>>> If you see ERROR, you should restart.
>>
>> OK, I will do that.
>>
>>> The watchdog does not restart on oom in nofork.
>>>
>>> Part of my delay is that I can’t explain how you’re seeing ERROR but
>>> nothing in the logs. OOMs, if catchable, are caught, logged and used to
>>> update the status. I don’t see how you can get that status without logging.
>>
>> I probably don't have the complete logs, but I checked the code, ERROR
>> status is only set on caught OOM.
>>
>>> I still need to look back through the code with nofork in mind. I made
>>> the wrong assumption in my first look at the code.
>>
>> Thanks,
>> Cristi
>>
>>> On Wed, Mar 5, 2025 at 4:28 PM Cristian Zamfir <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I wanted to summarize my questions:
>>>> 1. Should I consider java.lang.OutOfMemoryError: Java heap space
>>>>    critical or is it a recoverable error?
>>>> 2. If the /status endpoint reports ERROR, should the watchdog restart
>>>>    the server?
>>>> 3. Does the Tika watchdog (when not running with --no-fork) restart the
>>>>    forked process on ERROR status?
>>>>
>>>> Thanks,
>>>> Cristi
>>>>
>>>> On Tue, Mar 4, 2025 at 6:23 PM Cristian Zamfir <[email protected]> wrote:
>>>>
>>>>> Hi Tim,
>>>>>
>>>>> Thanks for your answer!
>>>>>
>>>>> On Tue, Mar 4, 2025 at 5:44 PM Tim Allison <[email protected]> wrote:
>>>>>
>>>>>> I'm deeply puzzled. I agree with your assessments.
>>>>>> 1) ERROR should only be a status if there was an OOM, and you should
>>>>>> be seeing that elsewhere in your logs. Further, the chances that you'd
>>>>>> see an ERROR should be fairly slim... that status should trigger a
>>>>>> restart fairly quickly, but it is definitely possible to see that.
>>>>>
>>>>> So when running in forked mode, the watchdog process would query the
>>>>> ERROR status and would terminate the process?
>>>>>
>>>>> What happens when OutOfMemory but the server continues to run, does
>>>>> the JVM reclaim the heap and continue to run? Or is it running in an
>>>>> undefined state? I can see it is working and can recover from this state,
>>>>> but maybe there are some gotchas ...
>>>>>
>>>>>> 2) The "SEVERE" warning level is chosen by cxf, and out of Tika's
>>>>>> control. I've seen that before when the client closes the connection
>>>>>> before reading all the data...I think.
>>>>>
>>>>> OK, then in this case it is not determining the ERROR state.
>>>>>
>>>>>> Questions/assumptions:
>>>>>> 1) tika 3.1.0?
>>>>>
>>>>> Yes.
>>>>>
>>>>>> 2) you are running in default mode, you aren't running in {{nofork}}
>>>>>
>>>>> Running with --no-fork and a custom watchdog. However the watchdog
>>>>> just takes care of starting a new instance, it does not check the health
>>>>> status is OPERATING, just checking the http code from the /status
>>>>> endpoint.
>>>>>
>>>>>> 3) what are the other error entries?!
>>>>>
>>>>> Only this one, that I am debugging
>>>>> - "package":"org.apache.pdfbox.contentstream.PDFStreamEngine",
>>>>> "message":"Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image
>>>>> I/O Tools are not installed"}
>>>>> But normally there could be ERRORs reported for instance when parsing
>>>>> encrypted docs, etc. I just wanted to double check that such errors do not
>>>>> impact the status of the service.
>>>>>
>>>>>> On the larger question, when you're running tika-server 2.x and
>>>>>> greater, it should restart on its own (unless you're running in
>>>>>> {{nofork}}. You shouldn't have to have a watcher to restart the
>>>>>> processes. If you do want to take over that responsibility, you should
>>>>>> run in {{nofork}} mode, maybe?
>>>>>
>>>>> Indeed, running in no-fork mode and taking the responsibility of
>>>>> restarting. Generally one can rely on k8s and health probes for restarts.
>>>>> So my take-away is that health status should check that STATUS is not
>>>>> ERROR, most likely, depending on your answer to the question above.
>>>>>
>>>>> Thanks,
>>>>> Cristi
>>>>>
>>>>>> On Tue, Mar 4, 2025 at 9:46 AM Cristian Zamfir <[email protected]> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> What is the meaning of the status ERROR in tika server?
>>>>>>> I noticed that some operational servers respond to ERROR instead of
>>>>>>> OPERATING, e.g.,
>>>>>>> { "server_id" : "2c38a628-a37d-401f-99cd-f22d933e60c1", "status" : "ERROR",
>>>>>>> "millis_since_last_parse_started" : 24072, "files_processed" : 9003,
>>>>>>> "num_restarts" : 0 }
>>>>>>>
>>>>>>> In the code it looks like ERROR is only set in OOM situations,
>>>>>>> though I do not see this in the logs.
>>>>>>> I see some ERROR entries that do not look like they should influence
>>>>>>> the status of the server + this SEVERE entry:
>>>>>>>
>>>>>>> SEVERE: Problem with writing the data, class org.apache.tika.server.core.resource.TikaResource$$Lambda/0x0000788572302f00, ContentType: text/plain
>>>>>>> Mar 04, 2025 11:34:52 AM org.apache.cxf.phase.PhaseInterceptorChain doDefaultLogging
>>>>>>> WARNING: Interceptor for {http://resource.core.server.tika.apache.org/}TikaResource has thrown exception, unwinding now
>>>>>>> org.apache.cxf.interceptor.Fault: Could not send Message.
>>>>>>>     at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:67)
>>>>>>>     at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>>>>>>>     at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
>>>>>>>     at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>>>>>>>     at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
>>>>>>>     at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)
>>>>>>>     at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:244)
>>>>>>>     at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:80)
>>>>>>>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>>>>>>>     at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
>>>>>>>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1381)
>>>>>>>     at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:178)
>>>>>>>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1303)
>>>>>>>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
>>>>>>>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:149)
>>>>>>>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>>>>>>>     at org.eclipse.jetty.server.Server.handle(Server.java:563)
>>>>>>>     at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598)
>>>>>>>     at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
>>>>>>>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
>>>>>>>     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:287)
>>>>>>>     at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
>>>>>>>     at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
>>>>>>>     at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
>>>>>>>     at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:421)
>>>>>>>     at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:390)
>>>>>>>     at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:277)
>>>>>>>     at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:199)
>>>>>>>     at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:411)
>>>>>>>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
>>>>>>>     at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
>>>>>>>     at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
>>>>>>>
>>>>>>> Please let me know if any of this would be setting the status of the
>>>>>>> server to ERROR. My goal was to look for OPERATING status as a health
>>>>>>> indication and restart in case of ERROR, but I would like to avoid false
>>>>>>> positives.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Cristi
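
The take-away from this thread (when running with --no-fork, an external health check should treat a /status of ERROR as a reason to restart) could be sketched roughly as below. This is only an illustration, not code from Tika: the function name status_is_error is hypothetical, the field name "status" and the value "ERROR" are taken from the payload quoted above, and the choice to ignore unparseable bodies is an assumption made to avoid false positives.

```python
import json


def status_is_error(status_body: str) -> bool:
    """Return True only when the /status payload reports "ERROR".

    Hypothetical helper, not part of Tika. Anything that cannot be
    parsed, or that reports another status, is NOT flagged, so that log
    noise or a truncated response does not trigger a restart by itself.
    """
    try:
        payload = json.loads(status_body)
    except json.JSONDecodeError:
        # Assumption: malformed bodies are left to a separate check,
        # such as the HTTP status code check mentioned in the thread.
        return False
    if not isinstance(payload, dict):
        return False
    return payload.get("status") == "ERROR"


# Payload quoted earlier in the thread.
sample = (
    '{ "server_id" : "2c38a628-a37d-401f-99cd-f22d933e60c1", '
    '"status" : "ERROR", "millis_since_last_parse_started" : 24072, '
    '"files_processed" : 9003, "num_restarts" : 0 }'
)
print(status_is_error(sample))
```

A custom watchdog or a k8s probe script could call a check like this on the body returned by /status and restart the instance when it returns True, while still keeping the existing HTTP status code check for the case where the server does not answer at all.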
