Re: Replication is Failing is this a known problem?

Jeff Hinrichs - DM&T Fri, 27 Feb 2009 17:03:13 -0800

On Fri, Feb 27, 2009 at 8:57 AM, Adam Kocoloski
<[email protected]> wrote:
> Hi Jeff, I can pick this one up, but not before Monday. We do have some
> replicating-attachment JIRA tickets open and active, but it looks like
> there's some new stuff in this report too.  Feel free to file another one.
>  Best,
>
> Adam
I'll review the current JIRA tickets first so I don't file a dupe.  I'll
also work on building a reproducible test case for you.  Hope a
python script is ok with you.
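
As a starting point, a reproduction script could do little more than ask Machine B to pull from Machine A through CouchDB's `_replicate` endpoint. The sketch below is an assumption, not the script from this thread: the source address and database name are taken from the logs quoted further down, while `TARGET_SERVER`, `TARGET_DB`, and both function names are hypothetical. It also assumes the target database already exists on Machine B.

```python
import json
import urllib.request

# Machine A's address and the database name come from the logs in this thread.
SOURCE_DB = "http://192.168.2.52:5984/delasco-invoices"
# Assumption: the script runs on Machine B and CouchDB listens on the default port.
TARGET_SERVER = "http://localhost:5984"
TARGET_DB = "delasco-invoices"  # hypothetical local database name


def build_pull_request(source=SOURCE_DB, target=TARGET_DB, server=TARGET_SERVER):
    """Build the POST /_replicate request asking `server` to pull from `source`."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        server + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_pull_replication(timeout=600):
    """Send the request; raises if the server returns an error or goes away."""
    with urllib.request.urlopen(build_pull_request(), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_pull_replication()` on Machine B should then either return the replication result or raise once the server stops answering, which is the failure described in this thread.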


Regards,

Jeff
>
> Sent from my iPhone
>
> On Feb 27, 2009, at 9:13 AM, "Jeff Hinrichs - DM&T" <[email protected]>
> wrote:
>
>> Attempting to replicate a database with largish attachments (<= ~18MB
>> of attachments in a doc, fewer than 200 docs) from one machine to
>> another fails consistently and at the same point.
>>
>> Scenario:
>> Both servers are running from HEAD and I've been tracking for some
>> time.  This problem has been around as long as I've been using couch.
>>
>> Machine A holds the original database, Machine B is the server that is
>> doing a PULL replication
>>
>> During the replication, Machine A starts showing the following
>> sporadically in the log:
>> [Fri, 27 Feb 2009 14:02:48 GMT] [debug] [<0.5902.3>] 'GET'
>> /delasco-invoices/INV00652429?revs=true&attachments=true&latest=true&open_revs=["425644723"]
>> {1,1}
>> Headers: [{'Host',"192.168.2.52:5984"}]
>>
>> [Fri, 27 Feb 2009 14:02:48 GMT] [error] [<0.5901.3>] Uncaught error in
>> HTTP request: {exit,normal}
>>
>> [Fri, 27 Feb 2009 14:02:48 GMT] [debug] [<0.5901.3>] Stacktrace:
>> [{mochiweb_request,send,2},
>>            {couch_httpd,send_chunk,2},
>>            {couch_httpd_db,db_doc_req,3},
>>            {couch_httpd_db,do_db_req,2},
>>            {couch_httpd,handle_request,3},
>>            {mochiweb_http,headers,5},
>>            {proc_lib,init_p,5}]
>>
>> [Fri, 27 Feb 2009 14:02:48 GMT] [debug] [<0.5901.3>] HTTPd 500 error
>> response:
>> {"error":"error","reason":"normal"}
>>
>> As the replication continues, the frequency of these "Uncaught
>> error in HTTP request: {exit,normal}" errors increases until the error
>> is being repeated constantly.  Then Machine B stops sending requests: no
>> more log output, no errors.  The last thing in Machine B's log file is:
>> [Fri, 27 Feb 2009 14:03:24 GMT] [info] [<0.20893.1>] retrying
>> couch_rep HTTP get request due to {error, req_timedout}:
>> [104,116,116,112,58,47,47,49,57,50,46,49,54,56,46,50,46,53,50,58,
>>  53,57,56,52,47,100,101,108,97,115,99,111,45,105,110,118,111,105,99,
>>  101,115,47,73,78,86,48,48,54,53,50,49,51,56,63,114,101,118,115,61,
>>  116,114,117,101,38,97,116,116,97,99,104,109,101,110,116,115,61,116,
>>  114,117,101,38,108,97,116,101,115,116,61,116,114,117,101,38,111,112,
>>  101,110,95,114,101,118,115,61,91,34,<<"3070455362">>,34,93]
>>
>> A request for status from the couchdb init.d script returns nothing
>> and checking the processes returns:
>> (demo-couchdb)j...@mars:~/projects/venvs/demo-couchdb/src$ ps ax|grep cou
>> 29281 pts/2    S+     0:00 grep cou
>> (demo-couchdb)j...@mars:~/projects/venvs/demo-couchdb/src$ ps ax|grep beam
>> 29305 pts/2    R+     0:00 grep beam
>>
>> In fact, couch has gone away completely on Machine B.  Its death is
>> so quick it can't even say why.
>>
>> Attempts to incrementally replicate after the first failure die at
>> exactly the same place.
>>
>> I can replicate this same database on the same machine from one
>> database to another without issue.  I can dump and reload the database
>> with no problems.
>>
>> I have reported this earlier and no one seemed to have an answer.  Is
>> there a specific issue in JIRA that addresses this problem?  If not,
>> is what I have here enough to start one and should I?
>>
>> Regards,
>>
>> Jeff Hinrichs
>
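
The integer list in Machine B's last log entry above is how Erlang prints a string: a list of character codes, with the revision embedded as a `<<"...">>` binary. Decoding it shows which document the replicator was fetching when it timed out. The snippet below is a minimal sketch for reading such log entries; the helper name `decode_erlang_iolist` is hypothetical and it only handles flat lists of integers and quoted binaries like the one logged here.

```python
import re


def decode_erlang_iolist(text):
    """Turn a logged list such as [104,116,<<"ab">>,34] into a plain string."""
    parts = []
    # Match either a <<"...">> binary or a bare integer, in order of appearance.
    for binary, number in re.findall(r'<<"([^"]*)">>|(\d+)', text):
        parts.append(chr(int(number)) if number else binary)
    return "".join(parts)


# The list exactly as it appears in Machine B's log.
logged = """
[104,116,116,112,58,47,47,49,57,50,46,49,54,56,46,50,46,53,50,58,
 53,57,56,52,47,100,101,108,97,115,99,111,45,105,110,118,111,105,99,
 101,115,47,73,78,86,48,48,54,53,50,49,51,56,63,114,101,118,115,61,
 116,114,117,101,38,97,116,116,97,99,104,109,101,110,116,115,61,116,
 114,117,101,38,108,97,116,101,115,116,61,116,114,117,101,38,111,112,
 101,110,95,114,101,118,115,61,91,34,<<"3070455362">>,34,93]
"""

url = decode_erlang_iolist(logged)
print(url)
# http://192.168.2.52:5984/delasco-invoices/INV00652138?revs=true&attachments=true&latest=true&open_revs=["3070455362"]
```

The decoded request targets document INV00652138 on Machine A, a different document from the INV00652429 request shown in Machine A's log excerpt.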
