PS (3) If documents are small, I would also set worker_batch_size = 5000 or 10000, either in the configuration or, better, per individual replication call.
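Sinan's PS about setting worker_batch_size per replication call could look like the sketch below: a small Python helper that builds the JSON body for a one-shot POST to /_replicate. The host names are placeholders, and exactly which replicator options your CouchDB version accepts per call should be checked against its documentation.

```python
import json

def build_replicate_body(source, target, **options):
    """Build the JSON body for a one-shot POST to /_replicate.

    Replicator options passed here (e.g. worker_batch_size) are meant to
    override the [replicator] configuration defaults for this call only.
    """
    body = {"source": source, "target": target}
    body.update(options)
    return json.dumps(body)

# Larger batches for small documents, longer timeout for a slow link
# (placeholder URLs):
body = build_replicate_body(
    "http://source.example.com:5984/mydb",
    "http://target.example.com:5984/mydb",
    worker_batch_size=5000,
    connection_timeout=180000,
)
```

You would then send it with e.g. `curl -X POST http://localhost:5984/_replicate -H 'Content-Type: application/json' -d "$body"`.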
On 6 October 2015 at 14:37, Sinan Gabel <[email protected]> wrote:

> Hi!
>
> I would try a couple of things:
>
> (1) Increase connection_timeout to e.g. 180000 (3 minutes).
>
> (2) Do not use filtered replication on the server sending the data; just
> do a full replication and, if needed, filter the data on the receiving
> side with a script.
>
> Br,
> Sinan
>
> On 6 October 2015 at 13:18, Mike <[email protected]> wrote:
>
>> What sort of connection is there between the two CouchDBs?
>>
>> I was having some problems with an ADSL connection and replication when
>> the line was saturated. I had a play with the replicator settings; below
>> is what sorted my issues:
>>
>> [replicator]
>> db = _replicator
>> ; Maximum replication retry count can be a non-negative integer or "infinity".
>> max_replication_retry_count = 10
>> ; More worker processes can give higher network throughput but can also
>> ; imply more disk and network IO.
>> ;worker_processes = 4
>> worker_processes = 1
>> ; With lower batch sizes checkpoints are done more frequently. Lower batch
>> ; sizes also reduce the total amount of RAM used.
>> ;worker_batch_size = 500
>> worker_batch_size = 50
>> ; Maximum number of HTTP connections per replication.
>> ;http_connections = 20
>> http_connections = 2
>> ; HTTP connection timeout per replication.
>> ; Even for very fast/reliable networks it might need to be increased if a
>> ; remote database is too busy.
>> connection_timeout = 30000
>> ; If a request fails, the replicator will retry it up to N times.
>> retries_per_request = 10
>>
>> On 06/10/2015 12:08, Francesco Zamboni wrote:
>>
>>> Ok, right now I'm getting more and more persuaded (not yet 100% sure,
>>> but at least 80% by now) that all my CouchDB problems come down to
>>> CouchDB being unable to process records that are too big or too
>>> complex, essentially starting a cascade of timeouts while the whole db
>>> hangs indefinitely.
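Sinan's point (2) above, replicate everything and then filter on the receiving side with a script, might look like this sketch. The `wanted` predicate is a purely hypothetical stand-in for whatever the original replication filter checked; the "type" field and its value are invented for illustration.

```python
def wanted(doc):
    # Hypothetical stand-in for the original replication filter:
    # keep only documents of one particular type.
    return doc.get("type") == "bozza"

def docs_to_purge(docs):
    """Given documents pulled in by a full (unfiltered) replication,
    return the ids of those the filter would have excluded, so that a
    clean-up script can delete them on the receiving side."""
    return [d["_id"] for d in docs if not wanted(d)]

sample = [
    {"_id": "a", "type": "bozza"},
    {"_id": "b", "type": "other"},
]
print(docs_to_purge(sample))  # ['b']
```

In practice the documents would come from the target database (e.g. `_all_docs?include_docs=true`), and the returned ids would then be deleted there.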
>>>
>>> Given that, and considering that it would be really inconvenient to
>>> have to somehow trim those records, how can I manage this?
>>> I've already tried playing with parameters like os_process_timeout and
>>> os_process_limit without noticing any change... Are there other
>>> parameters I could try, or some common pitfall I'm not considering?
>>>
>>> Thanks to everybody
>>>
>>> 2015-10-05 12:32 GMT+02:00 Francesco Zamboni <[email protected]>:
>>>
>>>> Just to add some more information, this is the crash report when I try
>>>> to start the replication:
>>>>
>>>>> [info] [<0.8712.8>] X.X.X.X - - POST /_replicate 500
>>>>> [error] [<0.8712.8>] httpd 500 error response:
>>>>>  {"error":"timeout"}
>>>>>
>>>>> [error] [<0.21596.15>] ** Generic server <0.21596.15> terminating
>>>>> ** Last message in was {'EXIT',<0.21595.15>,killed}
>>>>> ** When Server state == {state,"http://www.xxx.xxx:4984/bozze/",20,[],
>>>>>                                [<0.21597.15>],
>>>>>                                {[],[]}}
>>>>> ** Reason for termination ==
>>>>> ** killed
>>>>>
>>>>> =ERROR REPORT==== 5-Oct-2015::10:27:02 ===
>>>>> ** Generic server <0.21596.15> terminating
>>>>> ** Last message in was {'EXIT',<0.21595.15>,killed}
>>>>> ** When Server state == {state,"http://www.xxx.xxx:4984/bozze/",20,[],
>>>>>                                [<0.21597.15>],
>>>>>                                {[],[]}}
>>>>> ** Reason for termination ==
>>>>> ** killed
>>>>> [error] [<0.21596.15>] {error_report,<0.31.0>,
>>>>>     {<0.21596.15>,crash_report,
>>>>>      [[{initial_call,
>>>>>            {couch_replicator_httpc_pool,init,['Argument__1']}},
>>>>>        {pid,<0.21596.15>},
>>>>>        {registered_name,[]},
>>>>>        {error_info,
>>>>>            {exit,killed,
>>>>>                [{gen_server,terminate,7,
>>>>>                     [{file,"gen_server.erl"},{line,804}]},
>>>>>                 {proc_lib,init_p_do_apply,3,
>>>>>                     [{file,"proc_lib.erl"},{line,237}]}]}},
>>>>>        {ancestors,
>>>>>            [<0.21595.15>,couch_replicator_job_sup,
>>>>>             couch_primary_services,couch_server_sup,<0.32.0>]},
>>>>>        {messages,[]},
>>>>>        {links,[<0.21597.15>]},
>>>>>        {dictionary,[]},
>>>>>        {trap_exit,true},
>>>>>        {status,running},
>>>>>        {heap_size,376},
>>>>>        {stack_size,27},
>>>>>        {reductions,178}],
>>>>>       []]}}
>>>>>
>>>>> =CRASH REPORT==== 5-Oct-2015::10:27:02 ===
>>>>>   crasher:
>>>>>     initial call: couch_replicator_httpc_pool:init/1
>>>>>     pid: <0.21596.15>
>>>>>     registered_name: []
>>>>>     exception exit: killed
>>>>>       in function gen_server:terminate/7 (gen_server.erl, line 804)
>>>>>     ancestors: [<0.21595.15>,couch_replicator_job_sup,
>>>>>                 couch_primary_services,couch_server_sup,<0.32.0>]
>>>>>     messages: []
>>>>>     links: [<0.21597.15>]
>>>>>     dictionary: []
>>>>>     trap_exit: true
>>>>>     status: running
>>>>>     heap_size: 376
>>>>>     stack_size: 27
>>>>>     reductions: 178
>>>>>   neighbours:
>>>>
>>>> 2015-10-02 19:16 GMT+02:00 Francesco Zamboni <[email protected]>:
>>>>
>>>>> I did some tests... with a "test" recordset I reproduced the behaviour
>>>>> consistently, by creating huge single objects and then trying to index
>>>>> even a single view.
>>>>>
>>>>> The result is usually a long list of
>>>>>
>>>>> [error] [<0.25586.2>] OS Process Error <0.28141.2> ::
>>>>>  {os_process_error, "OS process timed out."}
>>>>>
>>>>> followed by
>>>>>
>>>>> [info] [<0.17278.2>] 127.0.0.1 - - GET /test/_design/docs/_view/list_1 500
>>>>> [error] [emulator] Error in process <0.25586.2> with exit value:
>>>>>  {{nocatch,{os_process_error,"OS process timed out."}},
>>>>>   [{couch_os_process,prompt,2,[{file,"couch_os_process.erl"},{line,57}]},
>>>>>    {couch_query_servers,map_doc_raw,2,[{file,"couch_query_servers.erl"},{line,88}]},
>>>>>    {couch_mrview_updater...
>>>>>
>>>>> With the "real" db it remains non-deterministic: sometimes it runs
>>>>> easily and quickly, sometimes it locks up completely while doing
>>>>> exactly the same operations over exactly the same data.
>>>>>
>>>>> Some of our records are in fact not small, but if attachments do not
>>>>> count, they are also not really that big ... the biggest are around
>>>>> 100-200k.
>>>>>
>>>>> We also had some hang-ups while creating new filters, but those too
>>>>> seem non-deterministic: you run it, CouchDB freezes and never
>>>>> recovers, then you drop everything, re-create everything exactly the
>>>>> same, and it runs smoothly...
>>>>> I'm trying to obtain some more information from a "real db" crash, but
>>>>> the fact that it happens so randomly, and with an application that,
>>>>> being actively used, needs to be restored ASAP, is frustrating my
>>>>> attempts.
>>>>>
>>>>> One thing we've excluded is the machine/installation: we've moved the
>>>>> database to different machines, with different network configurations,
>>>>> and the behaviour reappears.
>>>>>
>>>>> We're using CouchDB 1.6.1 as a klaemo Docker image on Ubuntu 14.04
>>>>> VMs, but we tried even a physical machine with a packaged installation
>>>>> from scratch.
>>>>>
>>>>> I'll write again if (hopefully when!) I find more... in the meantime,
>>>>> thanks to everybody!
>>>>>
>>>>> 2015-09-29 22:36 GMT+02:00 Sebastian Rothbucher <[email protected]>:
>>>>>
>>>>>> Hi Francesco,
>>>>>>
>>>>>> maybe these two things will help you:
>>>>>> 1.) as Harald pointed out: filtered replication could be a problem.
>>>>>> An initial thought: make sure only one runs at a time. Surely not the
>>>>>> solution in the long run, but it could help figure out where the
>>>>>> problem is.
>>>>>> 2.) Try intercepting the couchjs process to find out more. Maybe it's
>>>>>> always the same (typically huge) document where it hangs (see e.g.
>>>>>> here: https://gist.github.com/sebastianrothbucher/01afe929095a55ab233e).
>>>>>> Generally, looking for huge documents (huge content; attachments
>>>>>> don't count here) might be worthwhile. When you exclude / delete
>>>>>> these temporarily, it might be another lead. Again: not the final
>>>>>> solution, but it helps pin the problem down.
>>>>>>
>>>>>> Good luck, please share what you find - and also let us all know when
>>>>>> we might be able to help.
>>>>>>
>>>>>> Best
>>>>>> Sebastian
>>>>>>
>>>>>> On Tue, Sep 29, 2015 at 10:59 AM, Francesco Zamboni <[email protected]> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>> we're having some problems with replication and CouchDB, but as
>>>>>>> we're still quite green with CouchDB I need to ask people with more
>>>>>>> experience even what I should check, as the problem seems to be
>>>>>>> quite random and we haven't even been able to pinpoint a way to
>>>>>>> reproduce it consistently.
>>>>>>> Essentially, using CouchDB 1.6.1, we've uploaded some thousands of
>>>>>>> documents occupying about 10 megabytes of space, more or less, so
>>>>>>> nothing especially big...
>>>>>>> Over these documents we've created a structure of views, lists,
>>>>>>> shows and other functions.
>>>>>>> The problems seem to start when we launch a series of one-shot
>>>>>>> filtered replications of these data to several sub-databases.
>>>>>>> After creating a variable number of replication documents, the
>>>>>>> system seems to hang completely.
>>>>>>> When the system is hung, any attempt to access a view causes a
>>>>>>> crash.
>>>>>>> The only messages are of the "OS process timed out" kind, but we've
>>>>>>> tried increasing the os_process_timeout and os_process_limit
>>>>>>> parameters without any appreciable change.
>>>>>>>
>>>>>>> Obviously this is not enough information to ask where the problem
>>>>>>> is, but as we're new to CouchDB I'd like to ask for some pointers on
>>>>>>> what to check, some common pitfalls that could lead to this kind of
>>>>>>> problem, and so on... we're having serious trouble understanding
>>>>>>> what happened when something goes wrong...
>>>>>>>
>>>>>>> --
>>>>>>> Francesco Zamboni
>>>>>>>
>>>>>>> tel: +39 0522 1590100
>>>>>>> fax: +39 0522 331673
>>>>>>> mob: +39 335 7548422
>>>>>>> e-mail: [email protected]
>>>>>>> web: www.mastertraining.it
>>>>>>>
>>>>>>> Sede Legale: via Timolini, 18 - Correggio (RE) - Italy
>>>>>>> Sede Operativa: via Sani, 15 - Reggio Emilia - Italy
>>>>>>> Sede Commerciale: via Sani, 9 - Reggio Emilia - Italy
>>>>>>> This e-mail and any file transmitted with it is intended only for
>>>>>>> the person or entity to which it is addressed and may contain
>>>>>>> information that is privileged, confidential or otherwise protected
>>>>>>> from disclosure. Copying, dissemination or use of this e-mail or the
>>>>>>> information herein by anyone other than the intended recipient is
>>>>>>> prohibited. If you have received this e-mail by mistake, please
>>>>>>> notify us immediately by telephone or fax.
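Sebastian's advice above about hunting for huge documents (huge content, attachments excluded) can be sketched as a size scan. This assumes the documents have already been fetched, e.g. via `_all_docs?include_docs=true`; the 200 KB threshold is an arbitrary starting point taken from the document sizes mentioned earlier in the thread.

```python
import json

def doc_size(doc):
    """JSON-encoded size in bytes of a document body, excluding
    attachment stubs (so attachments don't count, as Sebastian notes)."""
    body = {k: v for k, v in doc.items() if k != "_attachments"}
    return len(json.dumps(body).encode("utf-8"))

def oversized(docs, threshold=200 * 1024):
    """Return (id, size) pairs for documents above the threshold,
    largest first, as candidates to exclude or delete temporarily."""
    sizes = [(d["_id"], doc_size(d)) for d in docs]
    return sorted((s for s in sizes if s[1] > threshold),
                  key=lambda s: -s[1])
```

Running the suspects it finds through a view one at a time can show whether a particular document is the one making couchjs hang.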
