On Sat, Mar 8, 2025 at 3:48 PM Tim Allison <[email protected]> wrote:

> > I meant if it restarts on OOM (when detecting ERROR status) when not
> > using --no-fork. That was a double negation, so to make it clear: when
> > using forked mode, does the watchdog monitor just the process (this is
> > what I remember from the code), or does it also use a health check and
> > restart when hitting ERROR?
>
> When running in standard mode (NOT --no-fork), the watchdog will restart
> the forked process on ERROR. If it isn't doing that, that's a bug.
>

Thanks, good to know.


>
> > I probably don't have the complete logs, but I checked the code, ERROR
> > status is only set on caught OOM.
>
> If it is easy enough to check the logs somehow, it would be super helpful
> to confirm you are getting logging for ooms. The world is a less happy
> place if you're seeing a status of ERROR without anything in the logs. :D
>

I am pretty sure I had incomplete logs before and the ERROR status is due
to OOM.
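
For anyone finding this thread later, here is a minimal sketch of the probe
check discussed above: restart when /status reports ERROR, rather than relying
on the HTTP code alone. The field name "status" and the values ERROR and
OPERATING come from the sample response quoted further down. The function
names, the timeout, and the URL in the usage note are assumptions for
illustration, not part of Tika.

```python
import json
import urllib.request


def should_restart(status_body: str) -> bool:
    """Return True if the /status payload indicates a restart is needed."""
    try:
        payload = json.loads(status_body)
    except json.JSONDecodeError:
        # Assumption: an unparseable body is treated as unhealthy.
        return True
    if not isinstance(payload, dict):
        return True
    # Per this thread, ERROR is only set after a caught OOM.
    return payload.get("status") == "ERROR"


def probe(url: str, timeout_s: float = 5.0) -> int:
    """Return 0 if healthy, 1 if the server should be restarted."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            body = resp.read().decode("utf-8")
    except OSError:
        # Connection refused, timeout, or HTTP error status.
        return 1
    return 1 if should_restart(body) else 0
```

A k8s exec probe could call probe("http://localhost:9998/status") and exit
with the returned code; adjust the host and port to your deployment.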

Thanks,
Cristi


>
> I still need to check through the code. :D
>
> On Thu, Mar 6, 2025 at 4:55 AM Cristian Zamfir <[email protected]>
> wrote:
>
>> Hi Tim, thanks for your answer.
>> On Wed, Mar 5, 2025 at 11:46 PM Tim Allison <[email protected]> wrote:
>>
>>> Sorry for my delay.
>>>
>>> OOMs should cause a restart. The jvm, as you said, is in an unstable
>>> state. I frankly don’t know what that means practically, but every time I
>>> google it, it feels that the consensus is that a shutdown is the right
>>> answer.
>>>
>>> If you see ERROR, you should restart.
>>>
>>
>> OK, I will do that.
>>
>>
>>>
>>> The watchdog does not restart on oom in nofork.
>>>
>>
>>
>>
>>
>>>
>>> Part of my delay is that I can’t explain how you’re seeing ERROR but
>>> nothing in the logs. OOMs, if catchable, are caught, logged and used to
>>> update the status. I don’t see how you can get that status without logging.
>>>
>>
>> I probably don't have the complete logs, but I checked the code, ERROR
>> status is only set on caught OOM.
>>
>>
>>>
>>> I still need to look back through the code with nofork in mind. I made
>>> the wrong assumption in my first look at the code.
>>>
>>
>> Thanks,
>> Cristi
>>
>>
>>>
>>> On Wed, Mar 5, 2025 at 4:28 PM Cristian Zamfir <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I wanted to summarize my questions:
>>>> 1. Should I consider java.lang.OutOfMemoryError: Java heap space
>>>> critical or is it a recoverable error?
>>>> 2. If the /status endpoint reports ERROR, should the watchdog restart
>>>> the server?
>>>> 3. Does the Tika watchdog (when not running with --no-fork) restart the
>>>> forked process on ERROR status?
>>>>
>>>> Thanks,
>>>> Cristi
>>>>
>>>> On Tue, Mar 4, 2025 at 6:23 PM Cristian Zamfir <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Tim,
>>>>>
>>>>> Thanks for your answer!
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 4, 2025 at 5:44 PM Tim Allison <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I'm deeply puzzled. I agree with your assessments.
>>>>>> 1) ERROR should only be a status if there was an OOM, and you should
>>>>>> be seeing that elsewhere in your logs. Further, the chances that you'd
>>>>>> see an ERROR should be fairly slim... that status should trigger a
>>>>>> restart fairly quickly, but it is definitely possible to see that.
>>>>>>
>>>>>
>>>>> So when running in forked mode, the watchdog process would query the
>>>>> ERROR status and would terminate the process?
>>>>>
>>>>> What happens when an OutOfMemoryError occurs but the server continues
>>>>> to run? Does the JVM reclaim the heap and carry on, or is it running in
>>>>> an undefined state? I can see it is working and can recover from this
>>>>> state, but maybe there are some gotchas ...
>>>>>
>>>>>
>>>>>> 2) The "SEVERE" warning level is chosen by cxf, and out of Tika's
>>>>>> control. I've seen that before when the client closes the connection
>>>>>> before reading all the data...I think.
>>>>>>
>>>>>
>>>>> OK, then in this case it is not determining the ERROR state.
>>>>>
>>>>>
>>>>>>
>>>>>> Questions/assumptions:
>>>>>> 1) tika 3.1.0?
>>>>>>
>>>>> Yes.
>>>>>
>>>>>> 2) you are running in default mode, you aren't running in {{nofork}}
>>>>>>
>>>>>
>>>>> Running with --no-fork and a custom watchdog. However, the watchdog
>>>>> just takes care of starting a new instance. It does not check that the
>>>>> health status is OPERATING; it only checks the HTTP code from the
>>>>> /status endpoint.
>>>>>
>>>>>
>>>>>> 3) what are the other error entries?!
>>>>>>
>>>>>
>>>>> Only this one, that I am debugging
>>>>> - "package":"org.apache.pdfbox.contentstream.PDFStreamEngine",
>>>>> "message":"Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image
>>>>> I/O Tools are not installed"}
>>>>> But normally there could be ERRORs reported for instance when parsing
>>>>> encrypted docs, etc. I just wanted to double check that such errors do not
>>>>> impact the status of the service.
>>>>>
>>>>>
>>>>>>
>>>>>> On the larger question, when you're running tika-server 2.x and
>>>>>> greater, it should restart on its own (unless you're running in
>>>>>> {{nofork}}). You shouldn't have to have a watcher to restart the
>>>>>> processes. If you do want to take over that responsibility, you should
>>>>>> run in {{nofork}} mode, maybe?
>>>>>>
>>>>> Indeed, I am running in no-fork mode and taking on the responsibility
>>>>> of restarting. Generally one can rely on k8s and health probes for
>>>>> restarts. So my take-away is that the health check should verify that
>>>>> the status is not ERROR, most likely, depending on your answer to the
>>>>> question above.
>>>>>
>>>>> Thanks,
>>>>> Cristi
>>>>>
>>>>>
>>>>>>
>>>>>> On Tue, Mar 4, 2025 at 9:46 AM Cristian Zamfir <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> What is the meaning of the status ERROR in tika server? I noticed
>>>>>>> that some operational servers respond with ERROR instead of
>>>>>>> OPERATING, e.g.,
>>>>>>> { "server_id" : "2c38a628-a37d-401f-99cd-f22d933e60c1",
>>>>>>> "status" : "ERROR", "millis_since_last_parse_started" : 24072,
>>>>>>> "files_processed" : 9003, "num_restarts" : 0 }
>>>>>>>
>>>>>>> In the code it looks like ERROR is only set in OOM situations,
>>>>>>> though I do not see this in the logs.
>>>>>>> I see some ERROR entries that do not look like they should influence
>>>>>>> the status of the server + this SEVERE entry:
>>>>>>>
>>>>>>> SEVERE: Problem with writing the data, class org.apache.tika.server.core.resource.TikaResource$$Lambda/0x0000788572302f00, ContentType: text/plain
>>>>>>> Mar 04, 2025 11:34:52 AM org.apache.cxf.phase.PhaseInterceptorChain doDefaultLogging
>>>>>>> WARNING: Interceptor for {http://resource.core.server.tika.apache.org/}TikaResource has thrown exception, unwinding now
>>>>>>> org.apache.cxf.interceptor.Fault: Could not send Message.
>>>>>>> at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:67)
>>>>>>> at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>>>>>>> at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
>>>>>>> at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>>>>>>> at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
>>>>>>> at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)
>>>>>>> at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:244)
>>>>>>> at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:80)
>>>>>>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>>>>>>> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
>>>>>>> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1381)
>>>>>>> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:178)
>>>>>>> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1303)
>>>>>>> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
>>>>>>> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:149)
>>>>>>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>>>>>>> at org.eclipse.jetty.server.Server.handle(Server.java:563)
>>>>>>> at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598)
>>>>>>> at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
>>>>>>> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
>>>>>>> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:287)
>>>>>>> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
>>>>>>> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
>>>>>>> at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
>>>>>>> at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:421)
>>>>>>> at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:390)
>>>>>>> at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:277)
>>>>>>> at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:199)
>>>>>>> at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:411)
>>>>>>> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
>>>>>>> at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
>>>>>>> at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
>>>>>>>
>>>>>>>
>>>>>>> Please let me know if any of this would be setting the status of the
>>>>>>> server to ERROR. My goal was to look for OPERATING status as a health
>>>>>>> indication and restart in case of ERROR, but I would like to avoid false
>>>>>>> positives.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Cristi
>>>>>>>
>>>>>>>
