Use zkCli.sh and look in /accumulo/<accumulo instance id>.

In 1.4, Accumulo started locking down its info in zookeeper, so you may need to
execute the following command:
    addauth digest accumulo:SECRET

Replace SECRET with the secret from your accumulo-site.xml file.

On Thu, Aug 15, 2013 at 12:05 PM, Terry P. <[email protected]> wrote:
> Hi Keith,
> Many thanks for your detailed reply. I forgot to mention that yes indeed
> this is on Accumulo 1.4.2, and it was the write-ahead logs that were the
> issue -- partly because two of the tabletservers were not properly shut down
> before the re-IP operation, so recovery may have been needed on them.
>
> My naivety on Zookeeper certainly hampered the research as well. How does
> one "look in zookeeper to see what is going on?" Any pointers would be
> really helpful.
>
> I wish we could go to 1.5 and take advantage of the walogs in HDFS, but no
> can do at this point unfortunately.
>
>
> On Thu, Aug 15, 2013 at 10:24 AM, Keith Turner <[email protected]> wrote:
>
>> On Thu, Aug 15, 2013 at 11:01 AM, Terry P. <[email protected]> wrote:
>>
>>> Greetings everyone,
>>> We had to re-IP our entire cluster recently to change subnetworks, and
>>> we essentially lost everything (it was development, so no big deal).
>>> However, doing a re-IP operation may be required in actual operational
>>> cases, and I'd like to know if it can be done or not so we can note it for
>>> the future (as in document "what not to do" to avoid data loss).
>>>
>>> The issue we had was that after shutting down the cluster, re-IPing all
>>> servers, and starting everything back up, the tablets were still assigned
>>> to Tabletservers with the old IP addresses, even though all the hostnames
>>> were the same. So the system showed 3 Tabletservers, but no tablets, and
>>> no entries in the tables where previously there were 400 million.
>>>
>>> So:
>>>
>>> A) Does Zookeeper track Tabletservers by IP address only, and not
>>> hostname?
>>>
>>
>> It does track by IP address, but not only IP address. Each tablet server
>> has an ephemeral node in zookeeper under the IP address.
>> This ephemeral node should go away when the tserver process dies, and then
>> the master will assume that tserver is dead. The location of a tablet in the
>> metadata table is conceptually <ephemeral node id>+<IP address>, so once that
>> ephemeral node goes away the location in the metadata table is assumed
>> invalid and the tablet is reassigned. If another tserver starts at the same
>> IP, then the master can differentiate because the ephemeral node is
>> different.
>>
>> You can look at the child nodes under a tserver IP in zookeeper. Look
>> at the data for the lowest-numbered ephemeral node to get info about
>> who holds the lock for that IP.
>>
>>> B) If A is true, is there a mechanism to change those entries in
>>> Zookeeper so that a re-IP operation could be performed?
>>>
>>
>> A first step would be to look in zookeeper and see what is going on with
>> the ephemeral nodes.
>>
>> In Accumulo 1.3 and 1.4, one thing that normally causes problems when
>> changing lots of IP addrs is write-ahead logs. Tablets point to their
>> write-ahead logs using the IP address of the logger. This can cause walog
>> recovery to fail. In 1.5 walogs are stored in HDFS, so this is not an issue.
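
In case it helps, here is a rough Python sketch of the "lowest numbered ephemeral node" step. It does not talk to zookeeper. It only takes the child names you would see from an `ls` in zkCli.sh and picks the one to `get`. The `zlock-` style names and the helper names `lowest_sequence_node` and `zkcli_commands` are assumptions made for illustration, so check them against what your own zookeeper tree actually shows.

```python
import re

# Assumption: the children are ephemeral sequential nodes whose names end in a
# zero-padded sequence number (for example "zlock-0000000003"). The "zlock-"
# prefix is illustrative and not confirmed anywhere in this thread.
_TRAILING_SEQUENCE = re.compile(r"(\d+)$")


def lowest_sequence_node(children):
    """Return the child name with the smallest trailing sequence number.

    Names without a trailing number are ignored. Returns None when no
    numbered child is present.
    """
    numbered = []
    for name in children:
        match = _TRAILING_SEQUENCE.search(name)
        if match:
            numbered.append((int(match.group(1)), name))
    if not numbered:
        return None
    return min(numbered)[1]


def zkcli_commands(parent_path, children, secret="SECRET"):
    """Build the zkCli.sh commands to authenticate and read the lock holder.

    parent_path is the tserver node you listed (hypothetical placeholder),
    children is the output of `ls` on that node, and secret is the value
    from accumulo-site.xml.
    """
    commands = ["addauth digest accumulo:" + secret]
    node = lowest_sequence_node(children)
    if node is not None:
        commands.append("get " + parent_path.rstrip("/") + "/" + node)
    return commands


if __name__ == "__main__":
    # Placeholder path; substitute the real instance id and tserver node.
    parent = "/accumulo/<accumulo instance id>/<tserver node>"
    for command in zkcli_commands(parent, ["zlock-0000000004", "zlock-0000000002"]):
        print(command)
```

The numeric comparison matters only if the sequence numbers ever differ in width; with zero-padded names a plain string sort would give the same answer.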
