It should be safe to merge on the metadata table. That was one of the goals
of moving the root tablet into its own table. I'm pretty sure we have a
build test to ensure it works.
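
For reference, a minimal sketch of the merge from the shell, assuming the
1.6 default table name (test on a non-production instance first, and
optionally bound it with -b/-e to just the stale range of split points):

    merge -t accumulo.metadata

Without -b/-e the merge covers the whole table, which should collapse the
empty tablets left behind by the deleted table.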

On Tue, Feb 21, 2017, 18:22 Dickson, Matt MR <[email protected]>
wrote:

> *UNOFFICIAL*
> Firstly, thank you for your advice, it's been very helpful.
>
> Increasing the tablet server memory has allowed the metadata table to come
> online.  Using rfile-info and looking at the splits for the metadata table,
> it appears that all of the metadata table entries are in one tablet, so
> every tablet server ends up querying the one node hosting that tablet.
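>
> For anyone following along, checking the metadata table's splits from the
> shell looks roughly like this (assuming the default table name):
>
>   getsplits -t accumulo.metadata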
>
> I suspect the cause of this was a poorly designed table for which, at one
> point, the Accumulo GUI reported 1.02T tablets.  We've since deleted that
> table, but it may be that it generated so many entries in the metadata
> table that all of the metadata table's splits came from this massive
> table, which had the table id 1vm.
>
> To rectify this, is it safe to run a merge on the metadata table to force
> it to redistribute?
>
> ------------------------------
> *From:* Michael Wall [mailto:[email protected]]
> *Sent:* Wednesday, 22 February 2017 02:44
>
> *To:* [email protected]
> *Subject:* Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]
> Matt,
>
> If I am reading this correctly, you have a tablet that is being loaded
> onto a tserver.  That tserver dies, so the tablet is then assigned to
> another tserver.  While the tablet is being loaded, that tserver dies, and
> so on.  Is that correct?
>
> Can you identify the tablet that is bouncing around?  If so, try using
> rfile-info -d to inspect the rfiles associated with that tablet.  Also look
> at the rfiles that compose that tablet to see if anything sticks out.
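>
> For example, something along these lines (the path here is hypothetical;
> point it at the file entries listed in that tablet's metadata row):
>
>   accumulo rfile-info -d hdfs:///accumulo/tables/!0/table_info/A0000abc.rf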
>
> Are there any logs that would help explain why the tablet server is dying?
> Can you increase the memory of the tserver?
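>
> If it comes to that, the tserver heap is normally bumped in accumulo-env.sh,
> something along these lines (the 4g figure is only a placeholder, and the
> ${POLICY} prefix matches the stock 1.6 example config):
>
>   export ACCUMULO_TSERVER_OPTS="${POLICY} -Xmx4g -Xms4g"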
>
> Mike
>
> On Tue, Feb 21, 2017 at 10:35 AM Josh Elser <[email protected]> wrote:
>
> ... [zookeeper.ZooCache] WARN: Saw (possibly) transient exception
> communicating with ZooKeeper, will retry
> SessionExpiredException: KeeperErrorCode = Session expired for
> /accumulo/4234234234234234/namespaces/+accumulo/conf/table.scan.max.memory
>
> There can be a number of causes for this, but here are the most likely
> ones.
>
> * JVM gc pauses
> * ZooKeeper max client connections
> * Operating System/Hardware-level pauses
>
> The first of these should be noticeable in the Accumulo log. There is a
> daemon running which watches for pauses and reports them when they happen.
> If this is happening, you might have to give the process some more Java
> heap, tweak your CMS/G1 parameters, etc.
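>
> If you want a second opinion on pauses, standard GC logging on the tserver
> JVM works too, e.g. flags along these lines (Java 7/8 style; the log path
> is just an example):
>
>   -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/accumulo/tserver-gc.log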
>
> For maxClientConnections, see
>
> https://community.hortonworks.com/articles/51191/understanding-apache-zookeeper-connection-rate-lim.html
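>
> The knob itself is maxClientCnxns in zoo.cfg, e.g. (the value below is only
> illustrative; the ZooKeeper default is 60):
>
>   maxClientCnxns=250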
>
> For the last one, swappiness is the most likely candidate (assuming this
> is hopping across different physical nodes), as are "transparent huge
> pages". If it is limited to a single host, things like bad NICs, hard
> drives, and other hardware issues might be a source of slowness.
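>
> Quick checks for those two on each node look something like:
>
>   cat /proc/sys/vm/swappiness
>   cat /sys/kernel/mm/transparent_hugepage/enabled
>
> The Accumulo docs suggest vm.swappiness=0, and THP reporting [always] is
> worth revisiting.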
>
> On Mon, Feb 20, 2017 at 10:18 PM, Dickson, Matt MR
> <[email protected]> wrote:
> > UNOFFICIAL
> >
> > It looks like an issue with one of the metadata table tablets. On
> > startup, the server that hosts a particular metadata tablet gets scanned
> > by all other tablet servers in the cluster.  This then crashes that
> > tablet server with an error in the tserver log:
> >
> > ... [zookeeper.ZooCache] WARN: Saw (possibly) transient exception
> > communicating with ZooKeeper, will retry
> > SessionExpiredException: KeeperErrorCode = Session expired for
> > /accumulo/4234234234234234/namespaces/+accumulo/conf/table.scan.max.memory
> >
> > That metadata table tablet is then transferred to another host which then
> > fails also, and so on.
> >
> > While the server is hosting this metadata tablet, we see the following
> > log statement from all tserver.logs in the cluster:
> >
> > .... [impl.ThriftScanner] DEBUG: Scan failed, thrift error
> > org.apache.thrift.transport.TTransportException  null
> > (!0;1vm\\;125.323.233.23::2016103<,server.com.org:9997,2342423df12341d)
> > Hope that helps complete the picture.
> >
> >
> > ________________________________
> > From: Christopher [mailto:[email protected]]
> > Sent: Tuesday, 21 February 2017 13:17
> >
> > To: [email protected]
> > Subject: Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]
> >
> > Removing them is probably a bad idea. The root table entries correspond
> > to split points in the metadata table. The tables which existed when the
> > metadata table split do not need to still exist for those entries to
> > continue to act as valid split points.
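> >
> > If you want to eyeball them, something like this in the shell shows the
> > root table entries (-np just turns off pagination):
> >
> >   scan -t accumulo.root -np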
> >
> > I would need to see the exception stack trace, or at least an error
> > message, to troubleshoot the shell scanning error you saw.
> >
> >
> > On Mon, Feb 20, 2017, 20:00 Dickson, Matt MR <[email protected]>
> > wrote:
> >>
> >> UNOFFICIAL
> >>
> >> In case it is ok to remove these from the root table, how can I scan the
> >> root table for rows with a rowid starting with !0;1vm?
> >>
> >> Running "scan -b !0;1vm" throws an exception and exits the shell.
> >>
> >>
> >> -----Original Message-----
> >> From: Dickson, Matt MR [mailto:[email protected]]
> >> Sent: Tuesday, 21 February 2017 09:30
> >> To: '[email protected]'
> >> Subject: RE: accumulo.root invalid table reference [SEC=UNOFFICIAL]
> >>
> >> UNOFFICIAL
> >>
> >>
> >> Does that mean I should have entries for 1vm in the metadata table
> >> corresponding to the root table?
> >>
> >> We are running 1.6.5
> >>
> >>
> >> -----Original Message-----
> >> From: Josh Elser [mailto:[email protected]]
> >> Sent: Tuesday, 21 February 2017 09:22
> >> To: [email protected]
> >> Subject: Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]
> >>
> >> The root table should only reference the tablets in the metadata table.
> >> It's a hierarchy: like metadata is for the user tables, root is for the
> >> metadata table.
> >>
> >> What version are ya running, Matt?
> >>
> >> Dickson, Matt MR wrote:
> >> > *UNOFFICIAL*
> >> >
> >> > I have a situation where all tablet servers are progressively being
> >> > declared dead. From the logs the tservers report errors like:
> >> > 2017-02-.... DEBUG: Scan failed thrift error
> >> > org.apache.thrift.transport.TTransportException null
> >> > (!0;1vm\\125.323.233.23::2016103<,server.com.org:9997,2342423df12341d)
> >> > 1vm was a table id that was deleted several months ago, so it appears
> >> > there is some invalid reference somewhere.
> >> > Scanning the metadata table with "scan -b 1vm" returns no rows for
> >> > 1vm.
> >> > A scan of the accumulo.root table returns approximately 15 rows that
> >> > start with: !0;1vm;<i/p addr>::2016103 blah
> >> > How are the root table entries used, and would it be safe to remove
> >> > these entries since they reference a deleted table?
> >> > Thanks in advance,
> >> > Matt
> >
> > --
> > Christopher
>
--
Christopher
