Great, glad we got that sorted out. On the topic of max_dbs_open - it can be a funny optimization dance, because the data structures that manage the list of open databases will become less efficient as the number of open databases increases. In particular, it becomes expensive to find the least recently used (LRU) DB in order to close it.
In my experience, increasing max_dbs_open makes a lot of sense if e.g. you have 1000 active databases. By the time you reach 100k databases it's often better to match max_dbs_open to the number of DBs that are queried over some short timespan, so that it's cheap to find the right one to close. Best,

Adam

On May 1, 2014, at 2:47 PM, Herman Chan <herman...@gmail.com> wrote:

> Thanks Adam,
>
> It makes sense now why we crashed: we've set a very high number on
> max_dbs_open (something like 100000) and with the formula you described,
> it'll create something like 800,000 processes (we have around 800,000 DBs on
> this box), which is higher than what we set in ERL_FLAGS.
>
> Thanks for your help!
>
> Herman
>
> On 2014-05-01, at 2:31 PM, Adam Kocoloski <kocol...@apache.org> wrote:
>
>> Sure, here are a few rules of thumb:
>>
>> * 1 process per inbound TCP connection
>> * 4 processes per open DB (up to [couchdb] max_dbs_open DBs will be kept
>> open simultaneously)
>> * 3 processes per open view group (I might be off by one or two here)
>>
>> More Erlang processes require more RAM, so don't go crazy.
>>
>> Adam
>>
>> On May 1, 2014, at 12:24 PM, Herman Chan <herman...@gmail.com> wrote:
>>
>>> Thanks Adam,
>>>
>>> We just tried that and it seems to hold up. Just wondering if there is
>>> some kind of formula for what to set ERL_FLAGS to?
>>>
>>> Herman
>>> On 2014-05-01, at 10:51 AM, Adam Kocoloski <kocol...@apache.org> wrote:
>>>
>>>> Hi Herman, I think those are just the view groups shutting down after the
>>>> parent DB crashed because you ran out of processes.
>>>>
>>>> You can increase the maximum number of processes via the ERL_FLAGS
>>>> environment variable, e.g.
>>>>
>>>>> $ ERL_FLAGS="+P 512000" erl
>>>>> Erlang R14B01 (erts-5.8.2) [source] [64-bit] [smp:4:4] [rq:4]
>>>>> [async-threads:0] [hipe] [kernel-poll:false]
>>>>>
>>>>> Eshell V5.8.2 (abort with ^G)
>>>>> 1> erlang:system_info(process_limit).
>>>>> 512000
>>>>
>>>> The default is 256k; assuming you've got enough RAM, you can bump that up
>>>> to 1M with impunity. Regards,
>>>>
>>>> Adam
>>>>
>>>> On May 1, 2014, at 10:43 AM, Herman Chan <herman...@gmail.com> wrote:
>>>>
>>>>> We do have 1000+ connections to the db, which we are trying to dial down.
>>>>> However, even with fewer connections, we hit the crash again; this time I
>>>>> was able to get a better log. You are right that we are hitting some
>>>>> limit.
>>>>>
>>>>> Before the crash, the log shows that couch is still trying to open up
>>>>> indexes after a reboot that we did. Once it crashes, the log starts
>>>>> printing "Index shutdown by monitor". Is there any limit parameter that we
>>>>> can increase?
>>>>>
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [error] [emulator] Too many processes
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [error] [emulator] Error in process
>>>>> <0.3672.477> with exit value:
>>>>> {system_limit,[{erlang,spawn_opt,[proc_lib,init_p,[<0.3672.477>,[],gen,init_it,[
>>>>> gen_server,<0.3672.477>,<0.3672.477>,couch_db,{<<42
>>>>> bytes>>,"/usr/local/var/lib/couchdb/group_370c0635-e593-45ed-ac96-75e6b318cb35.couch",<0.21556.480>,[{user_ctx,{user_ctx,null,
>>>>> [<<6 bytes>>],undefined...
>>>>>
>>>>>
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [error] [<0.21556.480>] ** Generic server
>>>>> <0.21556.480> terminating
>>>>> ** Last message in was {'EXIT',<0.3672.477>,
>>>>> {system_limit,
>>>>> [{erlang,spawn_opt,
>>>>> [proc_lib,init_p,
>>>>> [<0.3672.477>,[],gen,init_it,
>>>>> [gen_server,<0.3672.477>,<0.3672.477>,couch_db,
>>>>>
>>>>> {<<"group_370c0635-e593-45ed-ac96-75e6b318cb35">>,
>>>>>
>>>>> "/usr/local/var/lib/couchdb/group_370c0635-e593-45ed-ac96-75e6b318cb35.couch",
>>>>> <0.21556.480>,
>>>>> [{user_ctx,
>>>>> {user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>> []]],
>>>>> [link]]},
>>>>> {proc_lib,start_link,5},
>>>>> {couch_db,start_link,3},
>>>>> {couch_server,'-open_async/5-fun-0-',4}]}}
>>>>> ** When Server state == {file,
>>>>> {file_descriptor,prim_file,
>>>>> {#Port<0.898531>,307709}},
>>>>> 1261681}
>>>>> ** Reason for termination ==
>>>>> ** {system_limit,
>>>>> [{erlang,spawn_opt,
>>>>> [proc_lib,init_p,
>>>>> [<0.3672.477>,[],gen,init_it,
>>>>> [gen_server,<0.3672.477>,<0.3672.477>,couch_db,
>>>>> {<<"group_370c0635-e593-45ed-ac96-75e6b318cb35">>,
>>>>>
>>>>> "/usr/local/var/lib/couchdb/group_370c0635-e593-45ed-ac96-75e6b318cb35.couch",
>>>>> <0.21556.480>,
>>>>> [{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}]},
>>>>> []]],
>>>>> [link]]},
>>>>> {proc_lib,start_link,5},
>>>>> {couch_db,start_link,3},
>>>>> {couch_server,'-open_async/5-fun-0-',4}]}
>>>>>
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [error] [<0.21556.480>]
>>>>> {error_report,<0.31.0>,
>>>>> {<0.21556.480>,crash_report,
>>>>> [[{initial_call,{couch_file,init,['Argument__1']}},
>>>>> {pid,<0.21556.480>},
>>>>> {registered_name,[]},
>>>>> {error_info,
>>>>> {exit,
>>>>> {system_limit,
>>>>> [{erlang,spawn_opt,
>>>>> [proc_lib,init_p,
>>>>> [<0.3672.477>,[],gen,init_it,
>>>>> [gen_server,<0.3672.477>,<0.3672.477>,
>>>>> couch_db,
>>>>>
>>>>> {<<"group_370c0635-e593-45ed-ac96-75e6b318cb35">>,
>>>>>
>>>>> "/usr/local/var/lib/couchdb/group_370c0635-e593-45ed-ac96-75e6b318cb35.couch",
>>>>> <0.21556.480>,
>>>>> [{user_ctx,
>>>>> {user_ctx,null,
>>>>> [<<"_admin">>],
>>>>> undefined}}]},
>>>>> []]],
>>>>> [link]]},
>>>>> {proc_lib,start_link,5},
>>>>> {couch_db,start_link,3},
>>>>> {couch_server,'-open_async/5-fun-0-',4}]},
>>>>> [{gen_server,terminate,6},
>>>>> {proc_lib,init_p_do_apply,3}]}},
>>>>> {ancestors,[<0.3672.477>]},
>>>>> {messages,[]},
>>>>> {links,[]},
>>>>> {dictionary,[]},
>>>>> {trap_exit,true},
>>>>> {status,running},
>>>>> {heap_size,610},
>>>>> {stack_size,24},
>>>>> {reductions,973}],
>>>>> []]}}
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.20971.87>] Index shutdown by
>>>>> monitor notice for db: group_5747d16f-4b3b-4522-af10-1dc7d0d644aa idx:
>>>>> _design/filters
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.4883.35>] Index shutdown by
>>>>> monitor notice for db: group_15ccf331-257d-4b54-b457-997d342816b9 idx:
>>>>> _design/hub
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.4892.35>] Index shutdown by
>>>>> monitor notice for db: group_15ccf331-257d-4b54-b457-997d342816b9 idx:
>>>>> _design/filters
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.12040.33>] Index shutdown by
>>>>> monitor notice for db: group_d006a71d-b0de-4d71-b2f7-06abeeb34e00 idx:
>>>>> _design/filters
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.20971.87>] Closing index for
>>>>> db: group_5747d16f-4b3b-4522-af10-1dc7d0d644aa idx: _design/filters sig:
>>>>> "3e823c2a4383ac0c18d4e574135a5b08"
>>>>> reason: normal
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.12032.33>] Index shutdown by
>>>>> monitor notice for db: group_d006a71d-b0de-4d71-b2f7-06abeeb34e00 idx:
>>>>> _design/hub
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.4892.35>] Closing index for
>>>>> db: group_15ccf331-257d-4b54-b457-997d342816b9 idx: _design/filters sig:
>>>>> "3e823c2a4383ac0c18d4e574135a5b08"
>>>>> reason: normal
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.4292.4>] Index shutdown by
>>>>> monitor notice for db: group_ae50933f-de22-4879-9624-b760106060b3 idx:
>>>>> _design/filters
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.4285.4>] Index shutdown by
>>>>> monitor notice for db: group_ae50933f-de22-4879-9624-b760106060b3 idx:
>>>>> _design/hub
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.20956.87>] Index shutdown by
>>>>> monitor notice for db: group_5747d16f-4b3b-4522-af10-1dc7d0d644aa idx:
>>>>> _design/hub
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.4883.35>] Closing index for
>>>>> db: group_15ccf331-257d-4b54-b457-997d342816b9 idx: _design/hub sig:
>>>>> "4f6edcabc4b7a6357b714e1391ed93ac"
>>>>> reason: normal
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.12040.33>] Closing index for
>>>>> db: group_d006a71d-b0de-4d71-b2f7-06abeeb34e00 idx: _design/filters sig:
>>>>> "3e823c2a4383ac0c18d4e574135a5b08"
>>>>> reason: normal
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.18850.44>] Index shutdown by
>>>>> monitor notice for db: group_721d99a3-2257-48d0-8a1e-89294874d06e idx:
>>>>> _design/filters
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.18842.44>] Index shutdown by
>>>>> monitor notice for db: group_721d99a3-2257-48d0-8a1e-89294874d06e idx:
>>>>> _design/hub
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.12032.33>] Closing index for
>>>>> db: group_d006a71d-b0de-4d71-b2f7-06abeeb34e00 idx: _design/hub sig:
>>>>> "4f6edcabc4b7a6357b714e1391ed93ac"
>>>>> reason: normal
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.27768.43>] Index shutdown by
>>>>> monitor notice for db: group_56a0df90-c79e-4863-ae71-2bde3cb0d801 idx:
>>>>> _design/hub
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.27775.43>] Index shutdown by
>>>>> monitor notice for db: group_56a0df90-c79e-4863-ae71-2bde3cb0d801 idx:
>>>>> _design/filters
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.4292.4>] Closing index for db:
>>>>> group_ae50933f-de22-4879-9624-b760106060b3 idx: _design/filters sig:
>>>>> "3e823c2a4383ac0c18d4e574135a5b08"
>>>>> reason: normal
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.4285.4>] Closing index for db:
>>>>> group_ae50933f-de22-4879-9624-b760106060b3 idx: _design/hub sig:
>>>>> "4f6edcabc4b7a6357b714e1391ed93ac"
>>>>> reason: normal
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.6010.43>] Index shutdown by
>>>>> monitor notice for db: group_7f082ae6-f41d-4a14-a836-2360303b2e9a idx:
>>>>> _design/filters
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.6003.43>] Index shutdown by
>>>>> monitor notice for db: group_7f082ae6-f41d-4a14-a836-2360303b2e9a idx:
>>>>> _design/hub
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.20956.87>] Closing index for
>>>>> db: group_5747d16f-4b3b-4522-af10-1dc7d0d644aa idx: _design/hub sig:
>>>>> "4f6edcabc4b7a6357b714e1391ed93ac"
>>>>> reason: normal
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.5933.42>] Index shutdown by
>>>>> monitor notice for db: group_8c49d7e8-b61e-41e5-a220-11df59b9cce4 idx:
>>>>> _design/hub
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.5940.42>] Index shutdown by
>>>>> monitor notice for db: group_8c49d7e8-b61e-41e5-a220-11df59b9cce4 idx:
>>>>> _design/filters
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.18842.44>] Closing index for
>>>>> db: group_721d99a3-2257-48d0-8a1e-89294874d06e idx: _design/hub sig:
>>>>> "4f6edcabc4b7a6357b714e1391ed93ac"
>>>>> reason: normal
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.17529.33>] Index shutdown by
>>>>> monitor notice for db: group_98ff493c-63e8-4714-9940-ccea514d4b1d idx:
>>>>> _design/hub
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.18850.44>] Closing index for
>>>>> db: group_721d99a3-2257-48d0-8a1e-89294874d06e idx: _design/filters sig:
>>>>> "3e823c2a4383ac0c18d4e574135a5b08"
>>>>> reason: normal
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.17536.33>] Index shutdown by
>>>>> monitor notice for db: group_98ff493c-63e8-4714-9940-ccea514d4b1d idx:
>>>>> _design/filters
>>>>> [Thu, 01 May 2014 14:28:04 GMT] [info] [<0.27768.43>] Closing index for
>>>>> db: group_56a0df90-c79e-4863-ae71-2bde3cb0d801 idx: _design/hub sig:
>>>>> "4f6edcabc4b7a6357b714e1391ed93ac"
>>>>>
>>>>> On 2014-05-01, at 9:18 AM, Adam Kocoloski <kocol...@apache.org> wrote:
>>>>>
>>>>>> On May 1, 2014, at 8:47 AM, Interactive Blueprints
>>>>>> <p.van.der.e...@interactiveblueprints.nl> wrote:
>>>>>>
>>>>>>> 2014-05-01 13:14 GMT+02:00 Herman Chan <herman...@gmail.com>:
>>>>>>>> Thanks Adam,
>>>>>>>>
>>>>>>>> It seems like it is happening again, with more info this time. It
>>>>>>>> looks like I am hitting some sort of system limit, can anyone point
>>>>>>>> out where to look next?
>>>>>>>
>>>>>>> Just guessing here..
>>>>>>> It could be that you hit the max open file limit of your system.
>>>>>>> With "ulimit -a" you can see the limits on your system.
>>>>>>> Usually the max open file limit is somewhere around 1024.
>>>>>>> I noticed that couchdb loves to have a lot of files open simultaneously.
>>>>>>>
>>>>>>> In the same shell you start couchdb, right before you start couchdb,
>>>>>>> you can do a "ulimit -n 4096" (or another large value), this should
>>>>>>> give couchdb the ability to open more files.
>>>>>>>
>>>>>>> Hope this helps.
>>>>>>>
>>>>>>> Pieter van der Eems
>>>>>>> Interactive Blueprints
>>>>>>
>>>>>> That's a good thought Pieter, though typically in that case you'll see
>>>>>> an 'emfile' error in the logs. This particular system_limit error (with
>>>>>> {erlang, spawn_link, ...} following it) occurs when the Erlang VM has
>>>>>> reached the maximum number of processes it's allowed to spawn. Judging
>>>>>> from the *long* list of processes linked to couch_httpd in this
>>>>>> stacktrace I'd say Herman's client is improperly leaving connections
>>>>>> open. Herman, did you intend to have 1000s of open TCP connections on
>>>>>> this server? Regards,
>>>>>>
>>>>>> Adam
>>>>>
>>>>
>>>
>>
>
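
The rules of thumb quoted in this thread (1 process per inbound TCP connection, 4 per open DB, 3 per open view group) can be turned into a rough back-of-the-envelope budget. The sketch below is only illustrative: the function names are made up for this example, the per-item costs are Adam's approximate figures (he notes the view group number may be off by one or two), and the assumption that every open DB also has all of its view groups open is a worst case, not something CouchDB guarantees.

```python
# Rough Erlang process budget, based on the rules of thumb from the thread.
# All names here are hypothetical helpers, not part of CouchDB or Erlang.

def estimate_processes(tcp_connections, open_dbs, open_view_groups,
                       per_connection=1, per_db=4, per_view_group=3):
    """Approximate number of Erlang processes CouchDB may need."""
    return (per_connection * tcp_connections
            + per_db * open_dbs
            + per_view_group * open_view_groups)


def max_dbs_open_for_limit(process_limit, tcp_connections, view_groups_per_db,
                           per_connection=1, per_db=4, per_view_group=3):
    """Largest max_dbs_open that stays within process_limit, assuming
    (worst case) every open DB also keeps all its view groups open."""
    budget = process_limit - per_connection * tcp_connections
    cost_per_db = per_db + per_view_group * view_groups_per_db
    return max(budget // cost_per_db, 0)


if __name__ == "__main__":
    # 1000 connections and max_dbs_open = 100000, ignoring view groups.
    print(estimate_processes(1000, 100_000, 0))
    # With "+P 512000" and an assumed 2 view groups per DB.
    print(max_dbs_open_for_limit(512_000, 1000, 2))
```

The value of 2 view groups per DB is an assumption loosely based on the two design documents (_design/hub and _design/filters) that appear in the log above; plug in your own numbers and leave headroom for CouchDB's other processes.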