Hi Nick,

Thank you for the follow-up investigation and questions.

I am in the process of rebuilding my software stack and will try to
replicate data using the very same CouchDB 3.1.2 version + Erlang 22.

> ... If you get a chance to find and run a remsh script check the output of :

Regarding the Erlang commands that you suggested running, this is the
output on CouchDB 3.1 + Erlang 22:

Eshell V10.7.2.11 (abort with ^G)
1> crypto:info_lib().
[{<<"OpenSSL">>,268443839,
 <<"OpenSSL 1.0.2k-fips 26 Jan 2017">>}]
2> ssl:versions().
[{ssl_app,"9.2"},
 {supported,['tlsv1.2']},
 {supported_dtls,['dtlsv1.2']},
 {available,['tlsv1.3','tlsv1.2','tlsv1.1',tlsv1,sslv3]},
 {available_dtls,['dtlsv1.2',dtlsv1]}]

while the "old" and fully functional CouchDB 1.6.1 + Erlang 16 gives me this:

Eshell V5.10.4 (abort with ^G)
3> ssl:versions().
[{ssl_app,"5.3.3"},
 {supported,['tlsv1.2','tlsv1.1',tlsv1,sslv3]},
 {available,['tlsv1.2','tlsv1.1',tlsv1,sslv3]}]
4> crypto:info_lib().
[{<<"OpenSSL">>,268443839,
 <<"OpenSSL 1.0.2k-fips 26 Jan 2017">>}]

> * Another potential issue is that the curl script quotes the parameters with a single quote in:

This shouldn't be a problem. I actually started running those commands
without the env variable, but decided to update my notes when I was
copying them over to the gist.

> The logs don't show any stack traces besides the one you indicated in the initial email? Anything with a module name and a line number

There is absolutely nothing around that single line of error. Changing
the log level to debug doesn't help either.

Thank you for these suggestions. I should be back later with some news
on 3.1.2 <--> 3.1.2 bidirectional replication.

Thanks,
Alan.

On Sun, Feb 27, 2022 at 4:40 PM Nick Vatamaniuc <vatam...@gmail.com> wrote:
> Thanks for the script, Alan.
>
> I had tried to set up a basic replication between localhost endpoint
> on Erlang 22 with 3.2.1 release and that seems to work:
> https://gist.github.com/nickva/5a89198c62fdd3ec97693c87833d5738
>
> Looking at the differences between our setups I noticed a few things:
>
> * I haven't tried TLS on the endpoints. Wonder if that's the cause.
> Would you be able to try it locally (or via a VPN) without TLS.
> Sometimes it's possible to install or build an Erlang release without
> crypto support and it only manifests itself when trying to use any of
> that functionality at runtime. If you get a chance to find and run a
> remsh script check the output of :
>
> crypto:info_lib().
> [{<<"OpenSSL">>,269488239,
> <<"OpenSSL 1.1.1f 31 Mar 2020">>}]
>
> ssl:versions().
> [{ssl_app,"9.2"},
> {supported,['tlsv1.2']},
> {supported_dtls,['dtlsv1.2']},
> {available,['tlsv1.3','tlsv1.2','tlsv1.1',tlsv1,sslv3]},
> {available_dtls,['dtlsv1.2',dtlsv1]}]
>
> That would indicate you have the crypto application installed and
> linked to the openssl library.
>
> * Another potential issue is that the curl script quotes the
> parameters with a single quote in:
>
> curl -X POST http://$USERPASS@localhost:5984/_replicator -d
> '{"source":"http://$USERPASS@localhost:5984/workqueue_inbox/",
> "target":"https://$REMOTEHOST/couchdb/workqueue/", "continuous":true}'
> -H "Content-Type: application/json"
>
> That would make the target the literal
> `https://$REMOTEHOST/couchdb/workqueue/` string without substituting
> the $REMOTEHOST with its value. That's probably not the reason here
> but thought I'd mention it just in case.
>
> * `https://$REMOTEHOST/couchdb/workqueue`. I could see the db / url
> parser being confused by the url path there as the path
> $REMOTEHOST/couchdb/workqueue could be split up as a database
> path=$REMOTEHOST/couchdb and then workqueue would be the document, but
> in this case the workqueue is the database actually. Would you be able
> to test a setup where the URL path looks like
> http://domain.name.ext/dbname for an endpoint?
>
> The logs don't show any stack traces besides the one you indicated in
> the initial email? Anything with a module name and a line number
> perhaps.
>
> Thanks,
> -Nick
>
> On Sat, Feb 26, 2022 at 4:24 PM Alan Malta <alanma...@gmail.com> wrote:
> >
> > Hi Nick,
> >
> > Thank you for your prompt response.
> >
> > Yes, I confirm that CouchDB 3.1.2 is running with Erlang 22; and that
> > user and password only have basic chars a-z.
> >
> > I wiped out all my setup, started from scratch and managed to reproduce
> > this replication issue with the following set of commands:
> > https://gist.github.com/amaltaro/67bd133c519300fb82dd0cad372cf1a0
> >
> > while reproducing it, I defined only one way replication. However, my
> > previous setup had it bi-directional and both of them were in a failed
> > state. I also added some extra checks and information in the gist
> > above, in case it turns out to be helpful.
> >
> > I haven't yet tried to replicate data among two instances running the
> > same version. Reason is, during this migration, I believe it will be
> > impossible to swap all my services to the new CouchDB version, so there
> > should be a period of time (around a month) where I will need to keep
> > this hybrid setup.
> >
> > Thank you again!
> > Alan.
> >
> > On Sat, Feb 26, 2022 at 12:26 PM Nick Vatamaniuc <vatam...@gmail.com> wrote:
> > >
> > > Hi Alan,
> > >
> > > Thanks for reaching out.
> > >
> > > It looks like CouchDB had failed to parse the replication document,
> > > and couldn't turn it into a proper replication job.
> > >
> > > The 'undef' error could suggest running on an unsupported version of
> > > Erlang. It's a generic "this function doesn't exist" error in Erlang.
> > > Are you running on at least Erlang 20?
> > >
> > > Does the target url have any unusual characters in it, or something
> > > that might cause parsing errors (say, ':' or '@' characters for
> > > example).
> > >
> > > Would it be possible to have an example script which fails. Ideally, a
> > > set of curl commands creating dbs, then the replication job using
> > > similar parameters you had?
> > >
> > > Cheers,
> > > -Nick
> > >
> > > On Sat, Feb 26, 2022 at 9:29 AM Alan Malta <alanma...@gmail.com> wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > after a delay of many years to migrate to (almost) the latest CouchDB
> > > > version, I started working with CouchDB 3.1.2.
> > > >
> > > > My tests with replication to/from the same node/localhost have been
> > > > successful. But now that I am trying multiple push/pull replications
> > > > with a remote host, they get into a "failed" state.
> > > >
> > > > I just learned about the "_scheduler/jobs" API - and I am likely
> > > > missing some crucial knowledge here - and when I compare it against
> > > > the documents in the "_replicator" database, I see an inconsistent
> > > > definition for either the source or the target database.
> > > > For instance, the "_scheduler/jobs" gives me the following output
> > > > for one of the replications:
> > > >
> > > > {"database":"_replicator","doc_id":"87463eb82b3e1dcd7a3178276800026e","id":null,"source":"http://admin:*****@localhost:5984/my_db_name/","target":null,"state":"failed","error_count":1,"info":{"error":"{error,undef}"},"start_time":"2022-02-26T13:43:42Z","last_updated":"2022-02-26T13:43:42Z"},
> > > >
> > > > while the "_replicator" db lists this document as:
> > > >
> > > > {"id":"87463eb82b3e1dcd7a3178276800026e","key":"87463eb82b3e1dcd7a3178276800026e","value":{"rev":"2-590d4eadf029c21303ce77116d2f3f92"},"doc":{"_id":"87463eb82b3e1dcd7a3178276800026e","_rev":"2-590d4eadf029c21303ce77116d2f3f92","source":"http://admin:*****@localhost:5984/my_db_name","target":"https://alanblah.blah.blah/couchdb/wmstats","continuous":true,"filter":"WMStatsAgent/repfilter","owner":"admin","_replication_state":"failed","_replication_state_time":"2022-02-26T13:43:42Z","_replication_state_reason":"{error,undef}"}},
> > > >
> > > > in short, the "target" parameter is defined as null in the "jobs"
> > > > output. Is it because the replication failed somehow?
> > > >
> > > > Just in case, this is the only error I see in the couch log regarding
> > > > that replication - on the node that triggered the replication:
> > > >
> > > > [error] 2022-02-26T13:43:42.016495Z couchdb@127.0.0.1 <0.534.0> --------
> > > > Error processing replication doc `87463eb82b3e1dcd7a3178276800026e` from
> > > > `shards/00000000-7fffffff/_replicator.1645882154`: {error,undef}
> > > >
> > > > I also wonder if the replication protocol is compatible among
> > > > different releases of CouchDB? In my case, target is still on the
> > > > super old version 1.6.1 while source is on 3.1.2
> > > >
> > > > Thank you very much for any help that you can provide.
> > > > Best,
> > > > Alan.
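
On the quoting point raised in the thread, here is a minimal sketch of how a POSIX shell treats a variable inside single versus double quotes when the JSON body is built. The REMOTEHOST value below is a made-up placeholder, not the host from the thread, and the script only prints the two strings; it does not contact CouchDB.

```shell
#!/bin/sh
# Hypothetical value for illustration only; not taken from the thread.
REMOTEHOST="remote.example.org"

# Single quotes: the text is passed through literally, so $REMOTEHOST
# is NOT replaced by its value.
single='{"target":"https://$REMOTEHOST/couchdb/workqueue/"}'

# Double quotes: $REMOTEHOST is expanded, but the inner JSON quotes
# then have to be escaped with backslashes.
double="{\"target\":\"https://$REMOTEHOST/couchdb/workqueue/\"}"

printf 'single-quoted: %s\n' "$single"
printf 'double-quoted: %s\n' "$double"
```

If a replication document had been posted with the single-quoted form, reading it back from the _replicator database would presumably show the literal text $REMOTEHOST in the "target" field, which is one way to rule this cause in or out.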