[ https://issues.apache.org/jira/browse/CASSANDRA-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062237#comment-14062237 ]
Joaquin Casares edited comment on CASSANDRA-7547 at 7/15/14 4:21 PM:
---------------------------------------------------------------------

Before we go any further with using the DataStax reflector, you should consider that its initial purpose was to find a simple way for individual nodes to cluster together on startup without initially knowing anything about each other.

The current reflector is built with a short-term memory of about 10 minutes to get the list of seeds. If a node is slow to boot and comes in on the 11th minute, it will never know of its peers. If a pre-chosen seed node is slow to boot, the nodes may never properly cluster together.

This is important because the seed provider is pinged multiple times during the lifetime of the cluster, mainly during periods of topological changes: removal, bootstrap, replace, etc. If these happen outside of a window of 10 minutes for all nodes, you'll get an empty or incomplete list of seeds.

Taking the concept of using "a" reflector may be worth doing, but keep these things in mind:
* ensure you use a private service with long-term memory,
* you should rely on a reflector for assistance in configuring the seed list, not as the seed provider directly,
* always assume the service can and will go down, so write to disk appropriately, perhaps conf/seed-list.txt,
* you must account for topological changes that will occur in long-running clusters,
* and all seed lists on each node should be identical.

The last point is probably the hardest.

I'm not sure if this infrastructure fits best inside of Cassandra or as external tools. However, in order to have more control over when seed lists get updated, instead of waiting for Cassandra services to kick in, external tools will probably be your best option.

I hope this helps you build what you have in mind. Cheers!
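As a rough illustration of the "write to disk" and "assume the service can and will go down" points above, an external tool might look something like the sketch below. Everything named here is hypothetical: `fetch` stands in for whatever call a private reflector would expose (no real reflector API is assumed), and `resolve_seeds` and the helper functions are illustrative names only. The only detail taken from the comment is the suggested cache location, conf/seed-list.txt.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, List


def read_cached_seeds(cache_path: Path) -> List[str]:
    """Return the seeds stored on disk, or an empty list if no cache exists."""
    if not cache_path.exists():
        return []
    lines = cache_path.read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def write_cached_seeds(cache_path: Path, seeds: List[str]) -> None:
    """Write to a temp file, then rename, so readers never see a partial list."""
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(cache_path.parent))
    with os.fdopen(fd, "w") as tmp:
        tmp.write("\n".join(seeds) + "\n")
    os.replace(tmp_name, cache_path)


def resolve_seeds(fetch: Callable[[], List[str]], cache_path: Path) -> List[str]:
    """Prefer the reflector's answer; fall back to the on-disk list.

    `fetch` is a hypothetical callable that asks the reflector for seeds.
    """
    try:
        # Sort and de-duplicate so the same answer yields the same list everywhere.
        seeds = sorted(set(fetch()))
    except Exception:
        # Assumption: any failure means the service is unavailable right now.
        seeds = []
    if seeds:
        write_cached_seeds(cache_path, seeds)
        return seeds
    # Empty or failed lookup: keep using the last list that was written.
    return read_cached_seeds(cache_path)
```

A tool like this would run before Cassandra starts and feed the result into the node's configured seed list. It only covers the cache and outage cases; keeping the lists identical on every node as the topology changes would still need coordination beyond this sketch.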
> EC2 seed provider using DataStax Reflector
> ------------------------------------------
>
> Key: CASSANDRA-7547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7547
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Pekka Enberg
> Priority: Minor
> Attachments: 0001-EC2-seed-provider.patch
>
>
> This is a request for comments. I am using this to build our EC2 AMIs but I
> thought I'd ask if this makes sense as a generic feature for Cassandra.
> Cassandra cluster auto-configuration on EC2 uses the DataStax reflector
> service for discovering seed nodes. Instead of relying on external scripts,
> this patch implements an EC2 seed provider that uses the DataStax reflector
> service.
> This is particularly useful for EC2 AMIs that don't include a complete
> userspace (such as those built with OSv) where we ideally want to push as
> much configuration to the application itself.

--
This message was sent by Atlassian JIRA
(v6.2#6252)