Hey,

Thanks everyone for the input. Silly as it is, we fixed the issue by
restarting one of our Couch instances in the cluster.

@Robert: re: "For your situation, I believe you will need to update all the
documents in the _dbs database and substitute your old node names for the
new node names. I strongly advise you take a backup of everything you can"
- I did update all the documents in the _dbs database (that's what I meant
by updating cluster metadata) with a script when I added the new nodes.
The nodes always had the right documents; they disagreed only on the
_all_docs view. I am not sure why that happened.
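
One way to pin down a disagreement like this is to fetch _all_docs from each
node in turn and diff the returned IDs. A minimal sketch in Ruby, assuming the
IDs have already been collected per node; the helper name and the sample data
are made up for illustration and are not part of the attached script:

```ruby
require 'set'

# Hypothetical helper: given a hash of node label => array of document IDs
# (for example the "id" fields from each node's _all_docs rows), report the
# IDs each node is missing relative to the union across all nodes.
def missing_ids_by_node(ids_by_node)
  union = ids_by_node.values.flatten.to_set
  ids_by_node.each_with_object({}) do |(node, ids), missing|
    diff = (union - ids.to_set).to_a.sort
    missing[node] = diff unless diff.empty?
  end
end

# Made-up data standing in for the responses of three nodes.
sample = {
  'node1' => %w[doc-a doc-b doc-c],
  'node2' => %w[doc-a doc-c],
  'node3' => %w[doc-a doc-b doc-c]
}
p missing_ids_by_node(sample)
```

Nodes that agree with the union are left out of the result, so an empty hash
means every node returned the same set of IDs.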

I will go through the error logs next. Attached is the Ruby script I used to
update the cluster metadata and move all shards from one node to another.
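
The metadata change itself can be sketched as a pure function on one shard map
document, which makes it easy to dry-run before writing anything back. This is
only a sketch under assumptions: the helper name and the sample document are
hypothetical, and it relies on the by_node, by_range and changelog fields that
the script reads from each _dbs document.

```ruby
require 'json'

# Hypothetical helper: return a copy of a shard map document with node_out
# replaced by node_in in by_node and by_range, or nil if node_out holds no
# shards for this database. The input document is left unmodified.
def replace_node(shard_map, node_out, node_in)
  return nil unless shard_map['by_node'].key?(node_out)

  doc = JSON.parse(JSON.generate(shard_map)) # deep copy via JSON round trip
  doc['by_node'][node_in] = doc['by_node'].delete(node_out)
  doc['by_range'].each_value do |nodes|
    nodes.map! { |n| n == node_out ? node_in : n }
  end
  doc['changelog'].push(['replace', node_out, node_in])
  doc
end

# Made-up shard map with two ranges, for illustration only.
sample = {
  'by_node' => { 'old@host1' => ['00000000-7fffffff', '80000000-ffffffff'],
                 'keep@host2' => ['00000000-7fffffff', '80000000-ffffffff'] },
  'by_range' => { '00000000-7fffffff' => ['old@host1', 'keep@host2'],
                  '80000000-ffffffff' => ['old@host1', 'keep@host2'] },
  'changelog' => []
}
puts JSON.pretty_generate(replace_node(sample, 'old@host1', 'new@host3'))
```

Printing the result for a few databases first gives a chance to review the new
shard map before any PUT is made.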

Regards
Arif

On Fri, Aug 17, 2018 at 12:51 AM, Robert Samuel Newson <[email protected]>
wrote:

> You're quite right, Joan.
>
> B.
>
> > On 17 Aug 2018, at 01:53, Joan Touzet <[email protected]> wrote:
> >
> > Hey everyone,
> >
> > Doesn't 'emfile' mean too many open file handles? Arif, check your file
> handle limit as well as permissions on the files, see:
> >
> >    http://docs.couchdb.org/en/stable/maintenance/performance.html#maximum-open-file-descriptors-ulimit
> >
> > Finally, we have a very good bit of documentation now that improves on
> Robert's excellent SO post, we recommend using these instructions now
> instead:
> >
> >    http://docs.couchdb.org/en/stable/cluster/sharding.html
> >
> > -Joan "yay good documentation" Touzet
> >
> > ----- Original Message -----
> > From: "Robert Samuel Newson" <[email protected]>
> > To: "user" <[email protected]>
> > Sent: Thursday, August 16, 2018 5:03:51 PM
> > Subject: Re: Couch 2.x cluster returning inconsistent _all_docs
> >
> > the word 'emfile' indicates the immediate problem is one of file
> permissions. The user that couchdb is running as is unable to open the
> shards/5... file. So you probably need a recursive chmod/chown session to
> fix up ownership and permissions.
> >
> > Secondly, you have changed the names of 2 nodes. This is ... unwise. All
> clustered databases address their data files using the node names, so what
> you've effectively done is delete 2 of the 3 copies of your databases,
> which would explain the weird inconsistencies.
> >
> > I wrote a stackoverflow post a while ago on how to correctly move an
> individual shard which explains some of the internals:
> https://stackoverflow.com/questions/6676972/moving-a-shard-from-one-bigcouch-server-to-another-for-balancing.
> >
> > For your situation, I believe you will need to update all the documents
> in the _dbs database and substitute your old node names for the new node
> names. I strongly advise you take a backup of everything you can.
> >
> > For others observing this thread, I strongly advise against renaming
> nodes like this, it can only lead to trouble, and potentially data loss.
> >
> > B.
> >
> >> On 16 Aug 2018, at 19:25, Arif Khan <[email protected]> wrote:
> >>
> >> emfile
> >
>
>
#!/usr/bin/ruby
require 'net/http'
require 'uri'
require 'json'

require 'optparse'
require 'io/console'

$stdout = File.new( '/var/log/move_shards.log', 'w' )

options = { :node_out => nil, :node_in => nil, :admin_password => nil }

parser = OptionParser.new do |opts|
    opts.banner = "Usage: move_shards.rb [options]"
    opts.on('-o', '--node_out name1@fqdn/ip', 'Outgoing Node Name') do |node_out|
        options[:node_out] = node_out
    end

    opts.on('-i', '--node_in name2@fqdn/ip', 'Incoming Node Name') do |node_in|
        options[:node_in] = node_in
    end

    opts.on('-p', '--password password', 'Couch Admin Password') do |admin_password|
        options[:admin_password] = admin_password
    end

    opts.on('-h', '--help', 'Displays Help') do
        # $stdout points at the log file, so write the help text to $stderr.
        $stderr.puts opts
        exit
    end
end

parser.parse!

# $stdout is redirected to the log file above, so prompts go to $stderr
# to stay visible on the terminal.
if options[:node_out].nil?
    $stderr.print 'Enter Outgoing Node: '
    options[:node_out] = STDIN.gets.chomp
end

if options[:node_in].nil?
    $stderr.print 'Enter Incoming Node: '
    options[:node_in] = STDIN.gets.chomp
end


if options[:admin_password].nil?
    $stderr.print 'Enter Admin Password: '
    password = STDIN.noecho(&:gets).chomp
    $stderr.print "\n"
else
    password = options[:admin_password]
end

admin_username = 'admin'
admin_password = password
node_out = options[:node_out] #"[email protected]"
node_in = options[:node_in] #"[email protected]"


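# Log the run parameters; the admin password is deliberately not logged.
print "admin user: #{admin_username}\n"
print "node out: #{node_out}\n"
print "node in: #{node_in}\n"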

# Issue a GET (default) or PUT against CouchDB with basic auth and
# return the parsed JSON response body.
def http_call(uri_string, username = '', password = '', data_json = nil, verb = 'Get')
    uri = URI.parse(uri_string)
    http = Net::HTTP.new(uri.host, uri.port)
    if verb == 'Put'
        request = Net::HTTP::Put.new(uri.request_uri)
        request['Content-Type'] = 'application/json'
        request.body = data_json
    else
        request = Net::HTTP::Get.new(uri.request_uri)
    end
    request.basic_auth(username, password)
    response = http.request(request)
    JSON.parse(response.body)
end

all_dbs_response_json = http_call "http://127.0.0.1:5984/_all_dbs", admin_username, admin_password

all_dbs_response_json.each do |db|
  print "\n#{db}\n"
  # Fetch this database's shard map document from the node-local port.
  info_about_db_response_json = http_call "http://127.0.0.1:5986/_dbs/#{db}", admin_username, admin_password

  # Swap by_node information; skip databases with no shards on the outgoing node.
  next unless info_about_db_response_json["by_node"].has_key? node_out
  info_about_db_response_json["by_node"][node_in] = info_about_db_response_json["by_node"][node_out]
  info_about_db_response_json["by_node"].delete(node_out)

  # Replace the outgoing node in each shard of the by_range dictionary.
  info_about_db_response_json["by_range"].each do |shard, nodes|
    if nodes.include? node_out
      nodes.delete(node_out)
      nodes.push(node_in)
    end
  end

  # Record the change in the shard map's changelog.
  new_changelog = ["replace", node_out, node_in]
  info_about_db_response_json["changelog"].push(new_changelog)

  print JSON.pretty_generate(info_about_db_response_json)

  # Write the updated shard map back and log CouchDB's response.
  info_about_db_response_json_after = http_call "http://127.0.0.1:5986/_dbs/#{db}", admin_username, admin_password, info_about_db_response_json.to_json, 'Put'
  print JSON.pretty_generate(info_about_db_response_json_after)
end

$stdout.close()
