... [zookeeper.ZooCache] WARN: Saw (possibly) transient exception communicating with ZooKeeper, will retry SessionExpiredException: KeeperErrorCode = Session expired for /accumulo/4234234234234234/namespaces/+accumulo/conf/table.scan.max.memory
There can be a number of causes for this, but here are the most likely ones:

* JVM GC pauses
* ZooKeeper maxClientConnections
* Operating system/hardware-level pauses

The first should be noticeable in the Accumulo log: there is a daemon running which watches for pauses and reports them when they happen. If this is happening, you might have to give the process some more Java heap, tweak your CMS/G1 parameters, etc.

For maxClientConnections, see https://community.hortonworks.com/articles/51191/understanding-apache-zookeeper-connection-rate-lim.html

For the last, swappiness is the most likely candidate (assuming this is hopping across different physical nodes), as are "transparent huge pages". If it is limited to a single host, things like bad NICs, hard drives, and other hardware issues might be a source of slowness.

On Mon, Feb 20, 2017 at 10:18 PM, Dickson, Matt MR <[email protected]> wrote:

> UNOFFICIAL
>
> It looks like an issue with one of the metadata table tablets. On startup
> the server that hosts a particular metadata tablet gets scanned by all other
> tablet servers in the cluster. This then crashes that tablet server with an
> error in the tserver log:
>
> ... [zookeeper.ZooCache] WARN: Saw (possibly) transient exception
> communicating with ZooKeeper, will retry
> SessionExpiredException: KeeperErrorCode = Session expired for
> /accumulo/4234234234234234/namespaces/+accumulo/conf/table.scan.max.memory
>
> That metadata table tablet is then transferred to another host, which then
> fails also, and so on.
>
> While the server is hosting this metadata tablet, we see the following log
> statement in all tserver logs in the cluster:
>
> ... [impl.ThriftScanner] DEBUG: Scan failed, thrift error
> org.apache.thrift.transport.TTransportException null
> (!0;1vm\\;125.323.233.23::2016103<,server.com.org:9997,2342423df12341d)
>
> Hope that helps complete the picture.
>
> ________________________________
> From: Christopher [mailto:[email protected]]
> Sent: Tuesday, 21 February 2017 13:17
> To: [email protected]
> Subject: Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]
>
> Removing them is probably a bad idea. The root table entries correspond to
> split points in the metadata table. The tables which existed when the
> metadata table split do not need to still exist for those entries to
> continue to act as valid split points.
>
> Would need to see the exception stack trace, or at least an error message,
> to troubleshoot the shell scanning error you saw.
>
> On Mon, Feb 20, 2017, 20:00 Dickson, Matt MR <[email protected]> wrote:
>>
>> UNOFFICIAL
>>
>> In case it is ok to remove these from the root table, how can I scan the
>> root table for rows with a rowid starting with !0;1vm?
>>
>> Running "scan -b !0;1vm" throws an exception and exits the shell.
>>
>> -----Original Message-----
>> From: Dickson, Matt MR [mailto:[email protected]]
>> Sent: Tuesday, 21 February 2017 09:30
>> To: '[email protected]'
>> Subject: RE: accumulo.root invalid table reference [SEC=UNOFFICIAL]
>>
>> UNOFFICIAL
>>
>> Does that mean I should have entries for 1vm in the metadata table
>> corresponding to the root table?
>>
>> We are running 1.6.5
>>
>> -----Original Message-----
>> From: Josh Elser [mailto:[email protected]]
>> Sent: Tuesday, 21 February 2017 09:22
>> To: [email protected]
>> Subject: Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]
>>
>> The root table should only reference the tablets in the metadata table.
>> It's a hierarchy: like metadata is for the user tables, root is for the
>> metadata table.
>>
>> What version are ya running, Matt?
>>
>> Dickson, Matt MR wrote:
>> > *UNOFFICIAL*
>> >
>> > I have a situation where all tablet servers are progressively being
>> > declared dead. From the logs the tservers report errors like:
>> >
>> > 2017-02-.... DEBUG: Scan failed, thrift error
>> > org.apache.thrift.transport.TTransportException null
>> > (!0;1vm\\125.323.233.23::2016103<,server.com.org:9997,2342423df12341d)
>> >
>> > 1vm was a table id that was deleted several months ago, so it appears
>> > there is some invalid reference somewhere.
>> >
>> > Scanning the metadata table with "scan -b 1vm" returns no rows for 1vm.
>> >
>> > A scan of the accumulo.root table returns approximately 15 rows that
>> > start with !0;1vm;<ip addr>::2016103 blah ...
>> >
>> > How are the root table entries used, and would it be safe to remove
>> > these entries since they reference a deleted table?
>> >
>> > Thanks in advance,
>> > Matt
>
> --
> Christopher
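The pause suspects Josh lists can be checked from a shell. A minimal sketch, assuming standard Linux procfs/sysfs paths; the ZooKeeper `cons` output below is a fabricated sample standing in for what `echo cons | nc <zookeeper-host> 2181` would return on a live server:

```shell
# Inspect the OS-level pause suspects; guard with -r so the script
# degrades gracefully on non-Linux systems.
if [ -r /proc/sys/vm/swappiness ]; then
  # High swappiness lets the kernel page out idle JVM heap, causing long pauses
  echo "vm.swappiness = $(cat /proc/sys/vm/swappiness)"
fi
if [ -r /sys/kernel/mm/transparent_hugepage/enabled ]; then
  # "[always]" is the setting commonly recommended against for Hadoop-style workloads
  echo "THP: $(cat /sys/kernel/mm/transparent_hugepage/enabled)"
fi

# For maxClientConnections: count open connections per client IP using
# ZooKeeper's "cons" four-letter-word command. This sample is fabricated
# for illustration; live output would come from:
#   echo cons | nc <zookeeper-host> 2181
cons_sample='/10.0.0.5:51234[1](queued=0,recved=100,sent=100)
/10.0.0.5:51240[1](queued=0,recved=50,sent=50)
/10.0.0.7:49152[1](queued=0,recved=10,sent=10)'

# Split on '/' and ':' so field 2 is the client IP, then tally per IP
printf '%s\n' "$cons_sample" \
  | awk -F'[/:]' '{print $2}' \
  | sort | uniq -c | sort -rn
```

If any IP's count is near the server's maxClientConnections limit (60 by default), tablet servers on that host can be refused new sessions, which would show up as exactly the kind of ZooKeeper session expirations seen above.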
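Christopher's point about split points can be made concrete. A rough sketch, using a row adapted from the ones Matt reported; the `!0;<endRow>` layout described in the comments is my reading of the root-table row format, not something confirmed in this thread:

```shell
# Each root-table row is "!0;<endRow>": "!0" is the metadata table's id,
# and <endRow> is a split point of the metadata table. The split point
# itself begins with a user table id (here the deleted 1vm), which is why
# ids of long-deleted tables can legitimately linger in the root table.
row='!0;1vm;125.323.233.23::2016103'

metadata_table_id=${row%%;*}   # everything before the first ';'
end_row=${row#*;}              # everything after the first ';'
user_table_id=${end_row%%;*}   # leading component of the split point

echo "metadata table id: $metadata_table_id"
echo "split point:       $end_row"
echo "user table id:     $user_table_id"
```

Under this reading, removing the rows would discard valid metadata-table split points, which matches Christopher's warning that deleting them is probably a bad idea.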
