We have reproduced the problem on a clean installation.  There are some 
interesting things in the log, but I don't know what they mean, since I am 
not familiar with the internals of CouchDB.

I have attached the couch log. I can send the actual file being replicated, but 
it is about 4 MB, which is too big to make a reasonable attachment, and I don't 
think it will be of value to this issue.

The log is broken into three sections (tests), which I will outline.
In all cases, the test first deletes all databases, then creates them, and adds 
a doc and attachment.
<1>
The first is where replication was requested via a script HTTP post.
The attachment is uploaded at line 36, with a content length of 4185522 bytes.
The "_replicate" request is PUT at line 49.
Then a bunch of "Minor error in HTTP request" messages appear.  I am not sure 
what they mean.
Then, starting at lines 328 to 329, you can see 5 second gaps as it tries to 
"GET" from the "from" database, and a bunch of "New task status" messages are 
repeated (about 15 of them).
This repeats, showing 5 second pauses, until line 740, where it says "POST 
/to/_ensure_full_commit 201".
TOTAL TIME:  16:03:03 to 16:04:16   One minute, thirteen seconds.

<2>
The next test was done using curl. Starting at line 770, it deletes the 
databases and starts over.
At line 818, there is the PUT request to the _replicator db.  (NOTE: This is 
"_replicator", not "_replicate". What is the difference?)
There are only 2 "New task status" messages, and the replication is done by 
line 911.
TOTAL TIME:  16:17:28 to 16:17:30   2 seconds.

<3>
The next test was done using curl as well. It is a repeat of the second test, 
except the replication request was PUT to "_replicate" rather than 
"_replicator", just like the first test.
It starts at line 912, and looks to be identical to everything in test 2.
It took two seconds, and again there were only 2 "New task status" messages.
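
For reference, the elapsed times can be checked with a few lines of Python. 
This is only a sketch: the timestamps are the ones quoted in tests 1 and 2 
above, and the helper name elapsed_seconds is made up for illustration.

```python
from datetime import datetime

def elapsed_seconds(start, end, fmt="%H:%M:%S"):
    """Seconds between two same-day log timestamps (hypothetical helper)."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()

# Timestamps as quoted from the log in tests 1 and 2
script_run = elapsed_seconds("16:03:03", "16:04:16")
curl_run = elapsed_seconds("16:17:28", "16:17:30")

print(script_run, curl_run, script_run / curl_run)
```

With those timestamps, the script-driven run takes 73 seconds against 2 
seconds for curl, a ratio of about 36.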

So, the only difference we see is the script used a header that has a different 
user-agent (and had a few other minor differences), and posts a replication 
request JSON which is this:
   {"_id" : "test", 
    "source" : "http://localhost:5984/from",
    "target" : "http://localhost:5984/to", 
    "create_target" : false, 
    "continuous" : false }

Which is slightly more comprehensive than the CURL JSON which is just this:
     {"source":"from","target":"to"}

But these differences should not cause the replication to be 30 times longer, 
should they?
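
One way to isolate the JSON body from the headers would be to send both bodies 
from the same client and time them. The following is a minimal Python sketch, 
not what the script or curl actually does: it assumes CouchDB is listening on 
localhost:5984 as in the URLs above, uses POST to /_replicate as the CouchDB 
documentation describes, and the function names are invented for illustration.

```python
import json
import time
import urllib.request

COUCH = "http://localhost:5984"  # assumed from the URLs in the script's request

# Body sent by the script (full URLs plus explicit options)
SCRIPT_BODY = {
    "_id": "test",
    "source": "http://localhost:5984/from",
    "target": "http://localhost:5984/to",
    "create_target": False,
    "continuous": False,
}

# Body sent by curl (local database names only)
CURL_BODY = {"source": "from", "target": "to"}

def build_replicate_request(body, base=COUCH):
    """Build (but do not send) a POST to /_replicate with a JSON body."""
    return urllib.request.Request(
        base + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def timed_replicate(body):
    """Send the request; return (seconds, parsed reply). Needs a running CouchDB."""
    req = build_replicate_request(body)
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return time.monotonic() - start, result
```

Calling timed_replicate(SCRIPT_BODY) and timed_replicate(CURL_BODY) against 
freshly created "from" and "to" databases would show whether the body alone 
accounts for the difference, since the headers are identical in both calls.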

Any other ideas why one form of replication takes so much longer?

-Scott




----- Original Message -----
From: Paul Davis <paul.joseph.da...@gmail.com>
To: "user@couchdb.apache.org" <user@couchdb.apache.org>; Scott Weber 
<scotty2...@sbcglobal.net>
Cc: "replicat...@couchdb.apache.org" <replicat...@couchdb.apache.org>
Sent: Friday, January 24, 2014 12:18 PM
Subject: Re: Replication of attachment is extremely slow

If you can duplicate this the first thing I'd look at during a slow
replication is "sudo netstat -tanp tcp" to see if you're maybe bumping
up against open socket limits.
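
A rough way to summarise that netstat output is to count connections per TCP 
state. The Python sketch below is an illustration only: it assumes the state 
is the sixth whitespace-separated column on lines whose first column starts 
with "tcp", which may not hold on every platform, and count_tcp_states is a 
made-up helper name.

```python
from collections import Counter

def count_tcp_states(netstat_output):
    """Count TCP connections per state in captured netstat text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: proto, recv-q, send-q, local, foreign, state, ...
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states
```

The text can be captured by hand or with subprocess.run(["netstat", "-tan"], 
capture_output=True, text=True).stdout. A large and growing TIME_WAIT count 
during the slow replication would point toward socket exhaustion.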

On Fri, Jan 24, 2014 at 7:40 AM, Scott Weber <scotty2...@sbcglobal.net> wrote:
> I appreciate the digging, but in the case of the test file we were using, it 
> is some text that doesn't have dashes or newlines, mixed with image data 
> which are big binary blobs.
>
> So strings that look like mime boundaries aren't likely to be present.
>
> -Scott
>
>
>
>
