Hey,
Thanks, everyone, for the input. Silly as it sounds, we fixed the issue by
restarting one of our CouchDB instances in the cluster.
@Robert: re: "For your situation, I believe you will need to update all the
documents in the _dbs database and substitute your old node names for the
new node names. I strongly advise you take a backup of everything you can"
- I did update all the documents in the _dbs database (that's what I meant
by updating cluster metadata), using a script, when I added the new nodes.
All the nodes had the right documents the whole time; they disagreed only
on the _all_docs view. I am not sure why that happened.
I will work through the error logs next. Attached is the Ruby script I used
to update the cluster metadata to move all shards from one node to another.
Regards
Arif
On Fri, Aug 17, 2018 at 12:51 AM, Robert Samuel Newson <[email protected]>
wrote:
> You're quite right, Joan.
>
> B.
>
> > On 17 Aug 2018, at 01:53, Joan Touzet <[email protected]> wrote:
> >
> > Hey everyone,
> >
> > Doesn't 'emfile' mean too many open file handles? Arif, check your file
> handle limit as well as permissions on the files, see:
> >
> > http://docs.couchdb.org/en/stable/maintenance/performance.html#maximum-open-file-descriptors-ulimit
> >
> > Finally, we have a very good bit of documentation now that improves on
> Robert's excellent SO post, we recommend using these instructions now
> instead:
> >
> > http://docs.couchdb.org/en/stable/cluster/sharding.html
> >
> > -Joan "yay good documentation" Touzet
> >
> > ----- Original Message -----
> > From: "Robert Samuel Newson" <[email protected]>
> > To: "user" <[email protected]>
> > Sent: Thursday, August 16, 2018 5:03:51 PM
> > Subject: Re: Couch 2.x cluster returning inconsistent _all_docs
> >
> > the word 'emfile' indicates the immediate problem is one of file
> permissions. The user that couchdb is running as is unable to open the
> shards/5... file. So you probably need a recursive chmod/chown session to
> fix up ownership and permissions.
> >
> > Secondly, you have changed the names of 2 nodes. This is ... unwise. All
> clustered databases address their data files using the node names, so what
> you've effectively done is delete 2 of the 3 copies of your databases,
> which would explain the weird inconsistencies.
> >
> > I wrote a stackoverflow post a while ago on how to correctly move an
> individual shard which explains some of the internals:
> https://stackoverflow.com/questions/6676972/moving-a-shard-from-one-bigcouch-server-to-another-for-balancing.
> >
> > For your situation, I believe you will need to update all the documents
> in the _dbs database and substitute your old node names for the new node
> names. I strongly advise you take a backup of everything you can.
> >
> > For others observing this thread, I strongly advise against renaming
> nodes like this, it can only lead to trouble, and potentially data loss.
> >
> > B.
> >
> >> On 16 Aug 2018, at 19:25, Arif Khan <[email protected]> wrote:
> >>
> >> emfile
> >
>
>
#!/usr/bin/ruby
require 'net/http'
require 'uri'
require 'json'
require 'optparse'
require 'io/console'
$stdout = File.new( '/var/log/move_shards.log', 'w' )
options = { node_out: nil, node_in: nil, admin_password: nil }
parser = OptionParser.new do |opts|
  opts.banner = "Usage: move_shards.rb [options]"
  opts.on('-o', '--node_out name1@fqdn/ip', 'Outgoing Node Name') do |node_out|
    options[:node_out] = node_out
  end
  opts.on('-i', '--node_in name2@fqdn/ip', 'Incoming Node Name') do |node_in|
    options[:node_in] = node_in
  end
  opts.on('-p', '--password password', 'Couch Admin Password') do |admin_password|
    options[:admin_password] = admin_password
  end
  opts.on('-h', '--help', 'Displays Help') do
    # $stdout points at the log file, so write the help text to the terminal.
    STDOUT.puts opts
    exit
  end
end
parser.parse!
# $stdout is redirected to the log file above, so prompts are written to the
# terminal through the STDOUT constant and answers are read from STDIN.
STDOUT.sync = true
if options[:node_out].nil?
  STDOUT.print 'Enter Outgoing Node: '
  options[:node_out] = STDIN.gets.chomp
end
if options[:node_in].nil?
  STDOUT.print 'Enter Incoming Node: '
  options[:node_in] = STDIN.gets.chomp
end
if options[:admin_password].nil?
  STDOUT.print 'Enter Admin Password: '
  password = STDIN.noecho(&:gets).chomp
  STDOUT.print "\n"
else
  password = options[:admin_password]
end
admin_username = 'admin'
admin_password = password
node_out = options[:node_out] # e.g. name1@fqdn/ip
node_in = options[:node_in]   # e.g. name2@fqdn/ip
# Log the run parameters, but never the admin password.
puts "admin user: #{admin_username}"
puts "node out: #{node_out}"
puts "node in: #{node_in}"
# Send a GET (default) or PUT request with basic auth and return the parsed
# JSON response body.
def http_call(uri_string, username = '', password = '', data_json = nil, verb = 'Get')
  uri = URI.parse(uri_string)
  http = Net::HTTP.new(uri.host, uri.port)
  if verb == 'Put'
    request = Net::HTTP::Put.new(uri.request_uri)
    # Declare the body as JSON rather than relying on Net::HTTP's default.
    request['Content-Type'] = 'application/json'
    request.body = data_json
  else
    request = Net::HTTP::Get.new(uri.request_uri)
  end
  request.basic_auth(username, password)
  response = http.request(request)
  JSON.parse(response.body)
end
all_dbs = http_call "http://127.0.0.1:5984/_all_dbs", admin_username, admin_password
all_dbs.each do |db|
  print "\n#{db}\n"
  # Database names may contain "/", which has to be percent-encoded in the path.
  shard_map_url = "http://127.0.0.1:5986/_dbs/#{URI.encode_www_form_component(db)}"
  shard_map = http_call shard_map_url, admin_username, admin_password
  # Skip databases that have no shards on the outgoing node.
  next unless shard_map["by_node"].key?(node_out)

  # Swap by_node information.
  shard_map["by_node"][node_in] = shard_map["by_node"].delete(node_out)
  # Replace the outgoing node in each shard of the by_range dictionary.
  shard_map["by_range"].each_value do |nodes|
    if nodes.include?(node_out)
      nodes.delete(node_out)
      nodes.push(node_in)
    end
  end
  shard_map["changelog"].push(["replace", node_out, node_in])
  print JSON.pretty_generate(shard_map)
  put_response = http_call shard_map_url, admin_username, admin_password, shard_map.to_json, 'Put'
  print JSON.pretty_generate(put_response)
end
$stdout.close
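
For anyone following the thread who wants to see what the script does to a
single _dbs document without touching a cluster, the rewrite can be sketched
as a pure function and run against a sample hash. This is only a sketch: the
helper name replace_node, the sample document, and the node names are made
up for illustration. Only the by_node, by_range and changelog keys come from
the script above. Unlike the script, it swaps the node name in place, so
each range keeps its node order.

```ruby
require 'json'

# Hypothetical helper: returns a copy of a shard-map document in which
# node_out has been replaced by node_in. The input hash is not modified.
def replace_node(shard_map, node_out, node_in)
  doc = JSON.parse(JSON.generate(shard_map)) # deep copy via a JSON round trip
  return doc unless doc['by_node'].key?(node_out)

  # Hand the outgoing node's shard ranges to the incoming node.
  doc['by_node'][node_in] = doc['by_node'].delete(node_out)
  # Swap the node name in place so each range keeps its node order.
  doc['by_range'].each_value do |nodes|
    nodes.map! { |node| node == node_out ? node_in : node }
  end
  # Record the change, mirroring the changelog entry the script appends.
  doc['changelog'].push(['replace', node_out, node_in])
  doc
end

# Made-up sample document, for illustration only.
sample = {
  'by_node' => {
    'a@host1' => ['00000000-7fffffff'],
    'b@host2' => ['00000000-7fffffff']
  },
  'by_range' => { '00000000-7fffffff' => ['a@host1', 'b@host2'] },
  'changelog' => []
}

updated = replace_node(sample, 'a@host1', 'c@host3')
puts JSON.pretty_generate(updated)
```

Running the rewrite on a copy like this makes it easy to eyeball the result
before any PUT is sent to the node-local port.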