Hi Gustavo,

There are a couple of things going on here. Let's address them individually:
> On Feb 21, 2017, at 6:17 PM, Gustavo Delfino <[email protected]> wrote:
>
> Hi, I am evaluating using CouchDB and all worked well with a small test
> database. Now I am trying to use it with a much larger database and I am
> having an issue creating views. My view map function is very simple:
>
> function (doc) {
>     var trw_id;
>     if (doc.customer_id) {
>         emit(doc.customer_id, doc._id);
>     }
> }
>
> With a few hundred documents it works well but not as the size of the db
> grows (or maybe I have an issue with the function above).
>
> I can see in the log how the shards start working:
>
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29209.6> -------- Starting index update for db: shards/20000000-3fffffff/vw.1487715840 idx: _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29194.6> -------- Starting index update for db: shards/00000000-1fffffff/vw.1487715840 idx: _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29191.6> -------- Starting index update for db: shards/60000000-7fffffff/vw.1487715840 idx: _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29205.6> -------- Starting index update for db: shards/80000000-9fffffff/vw.1487715840 idx: _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29218.6> -------- Starting index update for db: shards/40000000-5fffffff/vw.1487715840 idx: _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29228.6> -------- Starting index update for db: shards/a0000000-bfffffff/vw.1487715840 idx: _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29225.6> -------- Starting index update for db: shards/c0000000-dfffffff/vw.1487715840 idx: _design/appname
> [info] 2017-02-21T22:38:58.788000Z couchdb@localhost <0.29208.6> -------- Starting index update for db: shards/e0000000-ffffffff/vw.1487715840 idx: _design/appname
>
> I see high CPU activity signaling that the
> index is being created and suddenly it stops:
>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 fabric_worker_timeout map_view,couchdb@localhost,<<"shards/00000000-1fffffff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 fabric_worker_timeout map_view,couchdb@localhost,<<"shards/20000000-3fffffff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 fabric_worker_timeout map_view,couchdb@localhost,<<"shards/40000000-5fffffff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 fabric_worker_timeout map_view,couchdb@localhost,<<"shards/60000000-7fffffff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 fabric_worker_timeout map_view,couchdb@localhost,<<"shards/80000000-9fffffff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 fabric_worker_timeout map_view,couchdb@localhost,<<"shards/a0000000-bfffffff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 fabric_worker_timeout map_view,couchdb@localhost,<<"shards/c0000000-dfffffff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 fabric_worker_timeout map_view,couchdb@localhost,<<"shards/e0000000-ffffffff/vw.1487715840">>

These timeouts are the expected behavior in 2.0 when a request for a view hits a configurable limit. The default timeout is 60 seconds. I believe this is a change from 1.x, where the socket would sit open as long as necessary. If you need to recover that behavior, you can set:

[fabric]
request_timeout = infinity

You could also configure some other number of milliseconds:

; Give up after 10 seconds
[fabric]
request_timeout = 10000

In any case, the indexing jobs should have continued even after this timeout.
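For what it's worth, overrides like that belong in local.ini. A minimal sketch of persisting the setting — I'm writing to a scratch file here since the real path of local.ini varies by platform (on Windows it's under the install directory), and CouchDB needs a restart to pick up a file edit:

```shell
# Sketch: append the [fabric] override to local.ini. A scratch file is
# used for illustration; point this at your actual local.ini instead,
# then restart CouchDB so it picks up the change.
ini=$(mktemp)
cat >> "$ini" <<'EOF'
[fabric]
request_timeout = infinity
EOF
# Confirm the setting landed in the file (prints 1).
grep -c 'request_timeout = infinity' "$ini"
```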
I think that's why the request worked when you reloaded the page.

> [error] 2017-02-21T22:39:59.000000Z couchdb@localhost <0.19734.6> d4985a33d1 req_err(1329706011) unknown_error : function_clause
> [<<"couch_mrview_show:list_cb/2 L212">>,<<"fabric_view_map:go/7 L52">>,<<"couch_query_servers:with_ddoc_proc/2 L421">>,<<"chttpd:process_request/1 L293">>,<<"chttpd:handle_request_int/1 L229">>,<<"mochiweb_http:headers/6 L122">>,<<"proc_lib:init_p_do_apply/3 L237">>]
> [notice] 2017-02-21T22:39:59.002000Z couchdb@localhost <0.19734.6> d4985a33d1 127.0.0.1:5984 127.0.0.1 undefined GET /dbname/_design/appname/_list/data/customer_id?key=%22PRIV-SE270_FC_AZT10L16_016%22 500 ok 60218

Here the "60218" is the response time in milliseconds, which confirms that you bumped into the default timeout. However, you should have gotten something more informative than this function_clause error response. That's a bug in our error handling; I'd encourage you to file a bug report:

https://issues.apache.org/jira/browse/COUCHDB

You'll need an account first if you don't already have one:

https://issues.apache.org/jira/secure/Signup!default.jspa

> In the web browser, I get this:
>
> {"error":"unknown_error","reason":"function_clause","ref":1329706011}
>
> When the error happened, I was also replicating from another CouchDB server
> that has a large number of documents. I was running a test requesting the
> views as the db was getting filled in to see at what point I started to get
> the issue. So I started seeing the issue with about 17k documents (0.7GB).
> I have just reloaded the page and it now works, but I have not been able to
> make the view work on another machine with my complete DB, which is much
> bigger (1/2 million docs, 22GB).
>
> This is what I see in the log on the machine with the large database:
>
> [info] 2017-02-21T23:08:26.127000Z couchdb@localhost <0.4854.995> -------- Starting index update for db: shards/00000000-1fffffff/vw.1481754819 idx: _design/adag
> <snip>
> [info] 2017-02-21T23:09:14.252000Z couchdb@localhost <0.212.0> -------- couch_proc_manager <0.9345.4064> died normal
> [error] 2017-02-21T23:09:14.253000Z couchdb@localhost <0.11021.4075> -------- OS Process Error <0.9345.4064> :: {os_process_error,"OS process timed out."}
> [error] 2017-02-21T23:09:14.376000Z couchdb@localhost emulator -------- Error in process <0.11021.4075> on node 'couchdb@localhost' with exit value: {{nocatch,{os_process_error,"OS process timed out."}},[{couch_os_process,prompt,2,[{file,"src/couch_os_process.erl"},{line,59}]},{couch_query_servers,map_doc_raw,2,[{file,"src/couch_query_servers.erl"},{line,67}]},{couch_mrview_updater…

Now *this* is a separate issue. The "OS process timed out" error can be caused by a lot of things, but one of the most common is a large JSON document. I've seen documents around, say, 10 MB cause this timeout. Any chance you've got some of those hanging around?

Again, this is a configurable value, which defaults to 5 seconds:

; Allow the system 20 seconds to process a document in a view
[couchdb]
os_process_timeout = 20000

At some point, though, this is a losing battle. Better to keep documents under 1 MB if you have that flexibility.

> Any idea of what could be going on? I am running CouchDB 2.0.0.1 under
> Windows 7 with a single node. I have not modified most of the default
> CouchDB settings.

I haven't kept pace with the current state of the Windows build, so there may be other platform-specific issues at work with this "OS process timed out".
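If it would help to track down oversized documents, a throwaway diagnostic view can flag them. A minimal sketch — the `findLargeDocs` name is only so it can be exercised outside CouchDB (in a design doc it would be the anonymous map string), and `JSON.stringify` length only approximates the stored size:

```javascript
// Sketch of a diagnostic map function: emit the approximate serialized
// size of any document over ~1 MB so the offenders are easy to list.
// In CouchDB the query server provides emit(); the assignment to
// findLargeDocs is just for standalone testing of this sketch.
var findLargeDocs = function (doc) {
    var size = JSON.stringify(doc).length;
    if (size > 1048576) {
        emit(size, doc._id);
    }
};
```

Querying such a view with descending=true would list the largest documents first, since the emitted key is the size.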
Cheers,
Adam

> Regards,
>
> Gustavo Delfino
