Re: meaning of server status ERROR

Tim Allison Sat, 08 Mar 2025 06:48:23 -0800

> I meant if it restarts on OOM (when detecting ERROR status) when not
> using --no-fork -- double negation here, so to make it clear, when using
> forked mode, does the watchdog monitor just the process (this is what I
> remember from the code) or also uses a health check and restarts when
> hitting ERROR?


When running in standard mode (NOT --no-fork), the watchdog will restart
the forked process on ERROR. If it isn't doing that, that's a bug.
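
For anyone running with --no-fork and an external watcher, as discussed further down the thread, here is a minimal sketch of a check that looks at the status field instead of only the HTTP code. The field name and the ERROR value come from the sample /status response quoted below. The function names, the URL (host and port are placeholders), and the choice to restart only on an explicit ERROR are assumptions for illustration, not something Tika prescribes.

```python
import json
import urllib.request

# Assumption: placeholder address; point this at wherever tika-server listens.
STATUS_URL = "http://localhost:9998/status"


def needs_restart(status_body: str) -> bool:
    """Return True when the /status JSON body reports status ERROR.

    Any other status value is left alone here; malformed JSON raises
    ValueError so the caller can decide how to treat it.
    """
    payload = json.loads(status_body)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object from /status")
    return payload.get("status") == "ERROR"


def fetch_status_body(url: str = STATUS_URL, timeout: float = 5.0) -> str:
    """Fetch the raw /status response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

A watcher could call needs_restart(fetch_status_body()) on a timer and treat a connection failure as a separate signal from an ERROR status.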

> I probably don't have the complete logs, but I checked the code, ERROR
> status is only set on caught OOM.

If it is easy enough to check the logs somehow, it would be super helpful
to confirm you are getting logging for OOMs. The world is a less happy
place if you're seeing a status of ERROR without anything in the logs. :D
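
One rough way to check is to scan whatever logs are available for the OOM marker. The sketch below assumes the logs are plain text and that the exception name quoted earlier in this thread (java.lang.OutOfMemoryError) appears verbatim in them; the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

# The exception name as it appears in the question quoted in this thread.
OOM_MARKER = "java.lang.OutOfMemoryError"


def find_oom_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that mentions an OOM."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if OOM_MARKER in line:
            hits.append((number, line.rstrip("\n")))
    return hits
```

Passing an open log file to find_oom_lines works because file objects iterate line by line; an empty result alongside an ERROR status would be the puzzling case described above.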

I still need to check through the code. :D

On Thu, Mar 6, 2025 at 4:55 AM Cristian Zamfir <cri...@cyberhaven.com>
wrote:

> Hi Tim, thanks for your answer.
> On Wed, Mar 5, 2025 at 11:46 PM Tim Allison <talli...@apache.org> wrote:
>
>> Sorry for my delay.
>>
>> OOMs should cause a restart. The JVM, as you said, is in an unstable
>> state. I frankly don’t know what that means practically, but every time I
>> google it, the consensus seems to be that a shutdown is the right
>> answer.
>>
>> If you see ERROR, you should restart.
>>
>
> OK, I will do that.
>
>
>>
>> The watchdog does not restart on OOM in nofork mode.
>>
>
>
>
>
>>
>> Part of my delay is that I can’t explain how you’re seeing ERROR but
>> nothing in the logs. OOMs, if catchable, are caught, logged and used to
>> update the status. I don’t see how you can get that status without logging.
>>
>
> I probably don't have the complete logs, but I checked the code, ERROR
> status is only set on caught OOM.
>
>
>>
>> I still need to look back through the code with nofork in mind. I made
>> the wrong assumption in my first look at the code.
>>
>
> Thanks,
> Cristi
>
>
>>
>> On Wed, Mar 5, 2025 at 4:28 PM Cristian Zamfir <cri...@cyberhaven.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I wanted to summarize my questions:
>>> 1. Should I consider java.lang.OutOfMemoryError: Java heap space
>>> critical or is it a recoverable error?
>>> 2. If the /status endpoint reports ERROR, should the watchdog restart
>>> the server?
>>> 3. Does the Tika watchdog (when not running with --no-fork) restart the
>>> forked process on ERROR status?
>>>
>>> Thanks,
>>> Cristi
>>>
>>> On Tue, Mar 4, 2025 at 6:23 PM Cristian Zamfir <cri...@cyberhaven.com>
>>> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> Thanks for your answer!
>>>>
>>>>
>>>>
>>>> On Tue, Mar 4, 2025 at 5:44 PM Tim Allison <talli...@apache.org> wrote:
>>>>
>>>>> I'm deeply puzzled. I agree with your assessments.
>>>>> 1) ERROR should only be a status if there was an OOM, and you should
>>>>> be seeing that elsewhere in your logs. Further, the chances that you'd see
>>>>> an ERROR should be fairly slim... that status should trigger a restart
>>>>> fairly quickly, but it is definitely possible to see that.
>>>>>
>>>>
>>>> So when running in forked mode, the watchdog process would query the
>>>> ERROR status and would terminate the process?
>>>>
>>>> What happens when an OutOfMemoryError is thrown but the server continues
>>>> to run? Does the JVM reclaim the heap and carry on, or is it running in
>>>> an undefined state? I can see it is working and can recover from this
>>>> state, but maybe there are some gotchas ...
>>>>
>>>>
>>>>> 2) The "SEVERE" warning level is chosen by cxf, and out of Tika's
>>>>> control. I've seen that before when the client closes the connection
>>>>> before reading all the data...I think.
>>>>>
>>>>
>>>> OK, then in this case it is not determining the ERROR state.
>>>>
>>>>
>>>>>
>>>>> Questions/assumptions:
>>>>> 1) tika 3.1.0?
>>>>>
>>>> Yes.
>>>>
>>>>> 2) you are running in default mode, you aren't running in {{nofork}}
>>>>>
>>>>
>>>> Running with --no-fork and a custom watchdog. However, the watchdog just
>>>> takes care of starting a new instance; it does not check that the health
>>>> status is OPERATING, it only checks the HTTP code from the /status endpoint.
>>>>
>>>>
>>>>> 3) what are the other error entries?!
>>>>>
>>>>
>>>> Only this one, that I am debugging
>>>> - "package":"org.apache.pdfbox.contentstream.PDFStreamEngine",
>>>> "message":"Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image
>>>> I/O Tools are not installed"}
>>>> But normally there could be ERRORs reported for instance when parsing
>>>> encrypted docs, etc. I just wanted to double check that such errors do not
>>>> impact the status of the service.
>>>>
>>>>
>>>>>
>>>>> On the larger question, when you're running tika-server 2.x and
>>>>> greater, it should restart on its own (unless you're running in
>>>>> {{nofork}}). You shouldn't have to have a watcher to restart the
>>>>> processes. If you do want to take over that responsibility, you should
>>>>> run in {{nofork}} mode, maybe?
>>>>>
>>>> Indeed, I am running in no-fork mode and taking responsibility for
>>>> restarting. Generally one can rely on k8s and health probes for restarts.
>>>> So my take-away is that the health check should verify that the status is
>>>> not ERROR, most likely, depending on your answer to the question above.
>>>>
>>>> Thanks,
>>>> Cristi
>>>>
>>>>
>>>>>
>>>>> On Tue, Mar 4, 2025 at 9:46 AM Cristian Zamfir <cri...@cyberhaven.com>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> What is the meaning of the status ERROR in tika server? I noticed
>>>>>> that some operational servers respond with ERROR instead of OPERATING,
>>>>>> e.g.,
>>>>>> { "server_id" : "2c38a628-a37d-401f-99cd-f22d933e60c1",
>>>>>>   "status" : "ERROR", "millis_since_last_parse_started" : 24072,
>>>>>>   "files_processed" : 9003, "num_restarts" : 0 }
>>>>>>
>>>>>> In the code it looks like ERROR is only set in OOM situations, though
>>>>>> I do not see this in the logs.
>>>>>> I see some ERROR entries that do not look like they should influence
>>>>>> the status of the server + this SEVERE entry:
>>>>>>
>>>>>> SEVERE: Problem with writing the data, class org.apache.tika.server.core.resource.TikaResource$$Lambda/0x0000788572302f00, ContentType: text/plain
>>>>>> Mar 04, 2025 11:34:52 AM org.apache.cxf.phase.PhaseInterceptorChain doDefaultLogging
>>>>>> WARNING: Interceptor for {http://resource.core.server.tika.apache.org/}TikaResource has thrown exception, unwinding now
>>>>>> org.apache.cxf.interceptor.Fault: Could not send Message.
>>>>>> at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:67)
>>>>>> at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>>>>>> at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
>>>>>> at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>>>>>> at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
>>>>>> at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)
>>>>>> at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:244)
>>>>>> at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:80)
>>>>>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>>>>>> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
>>>>>> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1381)
>>>>>> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:178)
>>>>>> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1303)
>>>>>> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
>>>>>> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:149)
>>>>>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
>>>>>> at org.eclipse.jetty.server.Server.handle(Server.java:563)
>>>>>> at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598)
>>>>>> at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
>>>>>> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
>>>>>> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:287)
>>>>>> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
>>>>>> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
>>>>>> at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
>>>>>> at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:421)
>>>>>> at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:390)
>>>>>> at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:277)
>>>>>> at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:199)
>>>>>> at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:411)
>>>>>> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
>>>>>> at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
>>>>>> at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
>>>>>>
>>>>>>
>>>>>> Please let me know if any of this would be setting the status of the
>>>>>> server to ERROR. My goal was to look for OPERATING status as a health
>>>>>> indication and restart in case of ERROR, but I would like to avoid false
>>>>>> positives.
>>>>>>
>>>>>> Thanks,
>>>>>> Cristi
>>>>>>
>>>>>>
