Hi! I would try a couple of things:
(1) Increase connection_timeout to e.g. 180000 (3 minutes).
(2) Do not use filtered replication on the server sending the data; just do a full replication and, if needed, filter the data on the receiving side with a script.

Br,
Sinan

On 6 October 2015 at 13:18, Mike <[email protected]> wrote:

> What sort of connection is there between the two CouchDBs?
>
> I was having some problems with an ADSL connection and replication when
> the line was saturated - I had a play with the replicator settings; below
> is what sorted my issues:
>
> [replicator]
> db = _replicator
> ; Maximum replication retry count can be a non-negative integer or "infinity".
> max_replication_retry_count = 10
> ; More worker processes can give higher network throughput but can also
> ; imply more disk and network IO.
> ;worker_processes = 4
> worker_processes = 1
> ; With lower batch sizes checkpoints are done more frequently. Lower batch
> ; sizes also reduce the total amount of used RAM memory.
> ;worker_batch_size = 500
> worker_batch_size = 50
> ; Maximum number of HTTP connections per replication.
> ;http_connections = 20
> http_connections = 2
> ; HTTP connection timeout per replication.
> ; Even for very fast/reliable networks it might need to be increased if a
> ; remote database is too busy.
> connection_timeout = 30000
> ; If a request fails, the replicator will retry it up to N times.
> retries_per_request = 10
>
> On 06/10/2015 12:08, Francesco Zamboni wrote:
>
>> Ok, right now I'm getting more and more persuaded (not yet 100% sure, but
>> at least 80% right now) that all my CouchDB problems come down to CouchDB
>> being unable to process records that are too big or too complex,
>> essentially starting a cascade of timeouts while the whole db hangs
>> indefinitely.
>>
>> Given that, and considering that it would be really, really inconvenient
>> to have to somehow trim those records, how can I manage this?
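[Editor's note: Sinan's suggestion (2) above - replicate everything, then filter on the receiving side with a script - could be sketched as below. The `owner` field, the database layout, and the `keep` predicate are purely illustrative assumptions, not something from this thread; the bulk-delete via `_bulk_docs` is standard CouchDB.]

```python
def docs_to_purge(rows, keep):
    """Given the rows of _all_docs?include_docs=true on the receiving
    database, return deletion stubs for every document that fails the
    keep() predicate. Design documents are never touched."""
    doomed = []
    for row in rows:
        doc = row["doc"]
        if doc["_id"].startswith("_design/"):
            continue  # keep design documents regardless of the predicate
        if not keep(doc):
            doomed.append({"_id": doc["_id"], "_rev": doc["_rev"],
                           "_deleted": True})
    return doomed

# Usage sketch (URL and field names are hypothetical), after a full
# replication into the receiving database:
#
#   import json, urllib.request
#   base = "http://localhost:5984/bozze_replica"
#   rows = json.load(urllib.request.urlopen(
#       base + "/_all_docs?include_docs=true"))["rows"]
#   doomed = docs_to_purge(rows, keep=lambda d: d.get("owner") == "site-a")
#   req = urllib.request.Request(
#       base + "/_bulk_docs",
#       data=json.dumps({"docs": doomed}).encode(),
#       headers={"Content-Type": "application/json"})
#   urllib.request.urlopen(req)
```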
>> I've already tried playing with parameters like os_process_timeout and
>> os_process_limit without noticing any change... Are there other parameters
>> I could try, or some common pitfall I'm not considering?
>>
>> Thanks to everybody
>>
>> 2015-10-05 12:32 GMT+02:00 Francesco Zamboni <[email protected]>:
>>
>>> Just to add some more information, this is the crash report when I try
>>> to start the replication:
>>>
>>>> [info] [<0.8712.8>] X.X.X.X - - POST /_replicate 500
>>>> [error] [<0.8712.8>] httpd 500 error response:
>>>>  {"error":"timeout"}
>>>>
>>>> [error] [<0.21596.15>] ** Generic server <0.21596.15> terminating
>>>> ** Last message in was {'EXIT',<0.21595.15>,killed}
>>>> ** When Server state == {state,"http://www.xxx.xxx:4984/bozze/",20,[],
>>>>                                [<0.21597.15>],
>>>>                                {[],[]}}
>>>> ** Reason for termination ==
>>>> ** killed
>>>>
>>>> =ERROR REPORT==== 5-Oct-2015::10:27:02 ===
>>>> ** Generic server <0.21596.15> terminating
>>>> ** Last message in was {'EXIT',<0.21595.15>,killed}
>>>> ** When Server state == {state,"http://www.xxx.xxx:4984/bozze/",20,[],
>>>>                                [<0.21597.15>],
>>>>                                {[],[]}}
>>>> ** Reason for termination ==
>>>> ** killed
>>>> [error] [<0.21596.15>] {error_report,<0.31.0>,
>>>>     {<0.21596.15>,crash_report,
>>>>      [[{initial_call,
>>>>            {couch_replicator_httpc_pool,init,['Argument__1']}},
>>>>        {pid,<0.21596.15>},
>>>>        {registered_name,[]},
>>>>        {error_info,
>>>>            {exit,killed,
>>>>                [{gen_server,terminate,7,
>>>>                     [{file,"gen_server.erl"},{line,804}]},
>>>>                 {proc_lib,init_p_do_apply,3,
>>>>                     [{file,"proc_lib.erl"},{line,237}]}]}},
>>>>        {ancestors,
>>>>            [<0.21595.15>,couch_replicator_job_sup,
>>>>             couch_primary_services,couch_server_sup,<0.32.0>]},
>>>>        {messages,[]},
>>>>        {links,[<0.21597.15>]},
>>>>        {dictionary,[]},
>>>>        {trap_exit,true},
>>>>        {status,running},
>>>>        {heap_size,376},
>>>>        {stack_size,27},
>>>>        {reductions,178}],
>>>>       []]}}
>>>>
>>>> =CRASH REPORT==== 5-Oct-2015::10:27:02 ===
>>>>   crasher:
>>>>     initial call: couch_replicator_httpc_pool:init/1
>>>>     pid: <0.21596.15>
>>>>     registered_name: []
>>>>     exception exit: killed
>>>>       in function gen_server:terminate/7 (gen_server.erl, line 804)
>>>>     ancestors: [<0.21595.15>,couch_replicator_job_sup,
>>>>                 couch_primary_services,couch_server_sup,<0.32.0>]
>>>>     messages: []
>>>>     links: [<0.21597.15>]
>>>>     dictionary: []
>>>>     trap_exit: true
>>>>     status: running
>>>>     heap_size: 376
>>>>     stack_size: 27
>>>>     reductions: 178
>>>>     neighbours:
>>>
>>> 2015-10-02 19:16 GMT+02:00 Francesco Zamboni <[email protected]>:
>>>
>>>> I did some tests... with a "test" recordset I reproduced the behaviour
>>>> consistently, by creating huge single objects and then trying to index
>>>> even a single view.
>>>>
>>>> The result is usually a long list of
>>>> [error] [<0.25586.2>] OS Process Error <0.28141.2> :: {os_process_error,
>>>>     "OS process timed out."}
>>>> followed by
>>>> [info] [<0.17278.2>] 127.0.0.1 - - GET /test/_design/docs/_view/list_1 500
>>>> [error] [emulator] Error in process <0.25586.2> with exit value:
>>>>     {{nocatch,{os_process_error,"OS process timed out."}},
>>>>      [{couch_os_process,prompt,2,[{file,"couch_os_process.erl"},{line,57}]},
>>>>       {couch_query_servers,map_doc_raw,2,[{file,"couch_query_servers.erl"},{line,88}]},
>>>>       {couch_mrview_updater...
>>>>
>>>> With the "real" db it remains non-deterministic: sometimes it runs
>>>> easily and quickly, sometimes it locks up completely while doing exactly
>>>> the same operations over exactly the same data.
>>>>
>>>> Some of our records are in fact not small, but if attachments do not
>>>> count, they're also not really that big ... the biggest are around
>>>> 100-200k.
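[Editor's note: size estimates like the 100-200k figure above can be checked mechanically by ranking documents by their serialized JSON size, excluding attachments. A minimal sketch; fetching rows via `_all_docs?include_docs=true` is standard CouchDB, but the URL in the comments is an assumption.]

```python
import json

def json_size(doc):
    """Bytes of the document body as compact JSON, ignoring attachment
    data (the thread suggests attachments do not count here)."""
    body = {k: v for k, v in doc.items() if k != "_attachments"}
    return len(json.dumps(body, separators=(",", ":")).encode("utf-8"))

def largest_docs(rows, top=10):
    """rows: the 'rows' list from _all_docs?include_docs=true.
    Returns the top (size, id) pairs, biggest first."""
    sized = [(json_size(r["doc"]), r["doc"]["_id"]) for r in rows]
    return sorted(sized, reverse=True)[:top]

# Usage sketch (the base URL is hypothetical):
#   import urllib.request
#   base = "http://localhost:5984/bozze"
#   rows = json.load(urllib.request.urlopen(
#       base + "/_all_docs?include_docs=true"))["rows"]
#   for size, doc_id in largest_docs(rows):
#       print(size, doc_id)
```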
>>>> We also had some hang-ups while creating new filters, but those too
>>>> seem non-deterministic: you run it, CouchDB freezes and never recovers,
>>>> then you drop everything, re-create everything exactly the same, and it
>>>> runs smoothly...
>>>> I'm trying to obtain some more information from a "real db" crash, but
>>>> the fact that it happens so randomly, and with an application that,
>>>> being actively used, needs to be restored ASAP, is frustrating my
>>>> attempts.
>>>>
>>>> One thing we've excluded is the machine/installation: we've moved the
>>>> database over different machines, with different network configurations,
>>>> and the behaviour re-appears.
>>>>
>>>> We're using CouchDB 1.6.1 as a klaemo docker image over Ubuntu 14.04
>>>> VMs, but we tried even a physical machine with a packaged installation
>>>> from scratch.
>>>>
>>>> I'll write again if (hopefully when!) I find more... in the meantime
>>>> thanks to everybody!
>>>>
>>>> 2015-09-29 22:36 GMT+02:00 Sebastian Rothbucher <[email protected]>:
>>>>
>>>>> Hi Francesco,
>>>>>
>>>>> maybe these two things will help you:
>>>>> 1.) as Harald pointed out: filtered replication could be a problem. An
>>>>> initial thought: make sure only one runs at a time. Surely not the
>>>>> solution in the long run, but it could help figure out where the
>>>>> problem is.
>>>>> 2.) Try intercepting the couchjs process to find out more. Maybe it's
>>>>> always the same (typically huge) document where it hangs (see e.g. here:
>>>>> https://gist.github.com/sebastianrothbucher/01afe929095a55ab233e).
>>>>> Generally, looking for huge documents (huge content; attachments don't
>>>>> count here) might be worthwhile. When you exclude / delete these
>>>>> temporarily, it might be another lead. Again: not the final solution,
>>>>> but it helps pinning it down.
>>>>>
>>>>> Good luck, pls.
>>>>> share what you found - and also let us all know when we might be able
>>>>> to help.
>>>>>
>>>>> Best
>>>>> Sebastian
>>>>>
>>>>> On Tue, Sep 29, 2015 at 10:59 AM, Francesco Zamboni <[email protected]> wrote:
>>>>>
>>>>>> Hello,
>>>>>> we're having some problems with replication and CouchDB, but as we're
>>>>>> still quite green with CouchDB I need to ask people with more
>>>>>> experience even what to check, as the problem seems to be quite
>>>>>> random and we've not been able even to pinpoint a way to consistently
>>>>>> reproduce it.
>>>>>> Essentially, using CouchDB 1.6.1, we've uploaded some thousands of
>>>>>> documents occupying about 10 megabytes of space, more or less, so
>>>>>> nothing especially big...
>>>>>> Over these documents we've created a structure of views, lists, shows
>>>>>> and other functions.
>>>>>> The problems seem to start when we try to launch a series of one-shot
>>>>>> filtered replications of these data over several sub-databases.
>>>>>> After creating a variable number of replication documents, the system
>>>>>> seems to completely hang.
>>>>>> When the system is hung, any attempt to access a view causes a crash.
>>>>>> The only messages are of the "OS process timed out" kind, but we've
>>>>>> tried to increase the os_process_timeout and os_process_limit
>>>>>> parameters without any appreciable change.
>>>>>>
>>>>>> Obviously this is not enough information to ask where the problem is,
>>>>>> but as we're new to CouchDB, I'd like to ask for some pointers on what
>>>>>> to check, some common pitfalls that could lead to this kind of problem
>>>>>> and so on... we're having serious trouble understanding what happened
>>>>>> when something goes wrong...
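[Editor's note: for reference, the kind of one-shot filtered replication described above is triggered by POSTing a JSON body to the server's /_replicate endpoint. A minimal sketch of building that body; the database and filter names used in the test and usage comment are hypothetical.]

```python
import json

def replicate_body(source, target, filter_name=None, query_params=None):
    """Build the JSON body for a one-shot replication, i.e. the payload
    of POST /_replicate. filter_name is 'designdoc/filtername' for a
    filtered replication; omitting 'continuous' makes it one-shot."""
    body = {"source": source, "target": target}
    if filter_name:
        body["filter"] = filter_name
    if query_params:
        body["query_params"] = query_params
    return json.dumps(body)

# Usage sketch (server URL and names are assumptions):
#   curl -X POST http://localhost:5984/_replicate \
#        -H 'Content-Type: application/json' \
#        -d '{"source": "bozze", "target": "bozze_sub1",
#             "filter": "docs/by_owner", "query_params": {"owner": "sub1"}}'
```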
>>>>>>
>>>>>> --
>>>>>> Francesco Zamboni
>>>>>>
>>>>>> tel: +39 0522 1590100
>>>>>> fax: +39 0522 331673
>>>>>> mob: +39 335 7548422
>>>>>> e-mail: [email protected]
>>>>>> web: www.mastertraining.it
>>>>>>
>>>>>> Sede Legale: via Timolini, 18 - Correggio (RE) - Italy
>>>>>> Sede Operativa: via Sani, 15 - Reggio Emilia - Italy
>>>>>> Sede Commerciale: via Sani, 9 - Reggio Emilia - Italy
>>>>>> This e-mail and any file transmitted with it is intended only for the
>>>>>> person or entity to which it is addressed and may contain information
>>>>>> that is privileged, confidential or otherwise protected from
>>>>>> disclosure. Copying, dissemination or use of this e-mail or the
>>>>>> information herein by anyone other than the intended recipient is
>>>>>> prohibited. If you have received this e-mail by mistake, please
>>>>>> notify us immediately by telephone or fax.
>>>>
>>>
>>
>
