Hi Stack,

We found this in our production cluster, too. I took a look at the code and found one case that will stop major compaction from working:
1. Put some rows.
2. Scan.
3. Delete.
4. The scanner is not closed properly, so it keeps its read point in the region's scannerReadPoints.
5. Major compact. It can't work because we still have a scanner with a smaller read point than the delete.

But if the region is moved to a new RS, scannerReadPoints is rebuilt from the largest memstoreTS of the storefiles, so major compaction works again.

I think the try...catch block in the Scanner's leaseExpired() method needs a finally block so the region scanner is always closed properly (a client-side reproduction and a rough sketch of the change are at the bottom of this mail, below the quoted thread):

public void leaseExpired() {
  RegionScannerHolder rsh = scanners.remove(this.scannerName);
  if (rsh != null) {
    RegionScanner s = rsh.s;
    LOG.info("Scanner " + this.scannerName + " lease expired on region "
        + s.getRegionInfo().getRegionNameAsString());
    try {
      Region region = regionServer.getRegion(s.getRegionInfo().getRegionName());
      if (region != null && region.getCoprocessorHost() != null) {
        region.getCoprocessorHost().preScannerClose(s);
      }
      s.close();
      if (region != null && region.getCoprocessorHost() != null) {
        region.getCoprocessorHost().postScannerClose(s);
      }
    } catch (IOException e) {
      LOG.error("Closing scanner for "
          + s.getRegionInfo().getRegionNameAsString(), e);
    }
  } else {
    LOG.warn("Scanner " + this.scannerName + " lease expired, but no related"
        + " scanner found, hence no chance to close that related scanner!");
  }
}

2016-06-02 2:50 GMT+08:00 Stack <st...@duboce.net>:

> On Wed, Jun 1, 2016 at 10:56 AM, Tianying Chang <tych...@gmail.com> wrote:
>
> > Hi, Stack
> >
> > After moving the region and issue a major compact on that region, its
> > size shrink from 99G down to 24G. So it looks like the region is in a
> > bad state that cannot recover, close/open it fixed the issue. And from
> > the region size metric graph, we can see major compaction stop working
> > since March 31, so some bug that caused region enter into bad state...
> > Unfortunately, we don't have DEBUG enabled and that is the last region
> > that has the issue, it is hard to figure out what is the bug that
> > caused the bad state...
> >
>
> Interesting. So moving it to another RS make it major-compactable? That
> would seem to indicate some state kept in the RS memory is preventing the
> major compaction running. Is moving the region a workaround for you until
> we figure what it is Tian-Ying?
>
> St.
>
> > Thanks
> > Tian-Ying
> >
> > On Tue, May 31, 2016 at 3:43 PM, Tianying Chang <tych...@gmail.com> wrote:
> >
> > > Hi, Stack
> > >
> > > Based on the log, the major compaction was run, and it took 5+ hours.
> > > And I also manually run major_compact from hbase shell explicitly to
> > > verify.
> > >
> > > I just moved the region to a different RS and issued a major_compact
> > > on that region again, let me see if the major compaction can succeed
> > > and will report back.
> > >
> > > Thanks
> > > Tian-Ying
> > >
> > > On Sun, May 29, 2016 at 4:35 PM, Stack <st...@duboce.net> wrote:
> > >
> > >> On Fri, May 27, 2016 at 3:17 PM, Tianying Chang <tych...@gmail.com>
> > >> wrote:
> > >>
> > >> > Yes, it is 94.26. By a quick glance, I didn't see any put that is
> > >> > older than the delete marker's TS, which could go as far as about
> > >> > couple weeks ago since major compaction on it for long time seems.
> > >> > Also it is really strange that if the region is split, then seems
> > >> > everything is working as expected. Also we noticed, the same region
> > >> > replicated at the slave side is totally normal, i.e. at 20+G....
> > >> >
> > >>
> > >> If you move the region to another server, does that work?
> > >>
> > >> Looking in 0.94 codebase, I see this in Compactor#compact
> > >>
> > >> // For major compactions calculate the earliest put timestamp
> > >> // of all involved storefiles. This is used to remove
> > >> // family delete marker during the compaction.
> > >> if (majorCompaction) {
> > >>   tmp = fileInfo.get(StoreFile.EARLIEST_PUT_TS);
> > >>   if (tmp == null) {
> > >>     // There's a file with no information, must be an old one
> > >>     // assume we have very old puts
> > >>     earliestPutTs = HConstants.OLDEST_TIMESTAMP;
> > >>   } else {
> > >>     earliestPutTs = Math.min(earliestPutTs, Bytes.toLong(tmp));
> > >>   }
> > >> }
> > >>
> > >> The above is followed by this log line:
> > >>
> > >> if (LOG.isDebugEnabled()) {
> > >>   LOG.debug("Compacting " + file +
> > >>     ", keycount=" + keyCount +
> > >>     ", bloomtype=" + r.getBloomFilterType().toString() +
> > >>     ", size=" + StringUtils.humanReadableInt(r.length()) +
> > >>     ", encoding=" + r.getHFileReader().getEncodingOnDisk() +
> > >>     (majorCompaction? ", earliestPutTs=" + earliestPutTs: ""));
> > >> }
> > >>
> > >> This prints out earliestPutTs. You see that in the logs? You running
> > >> with DEBUG? Does the earliest put ts preclude our dropping delete
> > >> family?
> > >>
> > >> Looking more in code, we retain deletes in following circumstances:
> > >>
> > >> this.retainDeletesInOutput = scanType == ScanType.MINOR_COMPACT ||
> > >>     scan.isRaw();
> > >>
> > >> So, for sure we are running major compaction?
> > >>
> > >> Otherwise, have to dig in a bit more here.. This stuff is a little
> > >> involved.
> > >> St.Ack
> > >>
> > >> > On Fri, May 27, 2016 at 3:13 PM, Stack <st...@duboce.net> wrote:
> > >> >
> > >> > > On Fri, May 27, 2016 at 2:32 PM, Tianying Chang <tych...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Hi,
> > >> > > >
> > >> > > > We saw a very strange case in one of our production cluster. A
> > >> > > > couple regions cannot get their deleted rows or delete marker
> > >> > > > removed even after major compaction. However when the region
> > >> > > > triggered split (we set 100G for auto split), the deletion
> > >> > > > worked. The 100G region becomes two 10G daughter regions, and
> > >> > > > all the delete marker are gone.
> > >> > > >
> > >> > > > Also, the same region in the slave cluster (through replication)
> > >> > > > have normal size at about 20+G.
> > >> > > >
> > >> > > > BTW, the delete marker in the regions are mostly deleteFamily if
> > >> > > > it matters.
> > >> > > >
> > >> > > > This is really weird. Anyone has any clue for this strange
> > >> > > > behavior?
> > >> > > >
> > >> > > > Thanks
> > >> > > > Tian-Ying
> > >> > > >
> > >> > > These 0.94 Tian-Ying?
> > >> > >
> > >> > > It looks like the DeleteFamily is retained only; do you see
> > >> > > incidence where there may have been versions older than the
> > >> > > DeleteFamily that are also retained post-major-compaction?
> > >> > >
> > >> > > St.Ack
> > >> > >
> > >> > > > A snippet of the HFile generated by the major compaction:
> > >> > > >
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459808114380/DeleteFamily/vlen=0/ts=2292870047
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459808114011/DeleteFamily/vlen=0/ts=2292869794
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459805381104/DeleteFamily/vlen=0/ts=2291072240
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459805380673/DeleteFamily/vlen=0/ts=2291071997
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459802643449/DeleteFamily/vlen=0/ts=2290248886
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459802643246/DeleteFamily/vlen=0/ts=2290248786
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459799913003/DeleteFamily/vlen=0/ts=2289446916
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459797181831/DeleteFamily/vlen=0/ts=2288670451
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459794447388/DeleteFamily/vlen=0/ts=2287911443
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459791708803/DeleteFamily/vlen=0/ts=2287213792
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459788978387/DeleteFamily/vlen=0/ts=2286488738
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459786243642/DeleteFamily/vlen=0/ts=2285778927
> > >> > > > V:
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>
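P.S. To make steps 1-5 above concrete, here is a rough client-side sketch against the 0.94 client API. The table and family names ("t1", "d") are made up for illustration; the point is that the ResultScanner is deliberately never closed, so only the server-side lease-expiry path can release its read point before the major compaction runs:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScannerReadPointRepro {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "t1");   // hypothetical table with family 'd'

    // 1. put some rows
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    table.put(put);

    // 2. scan, but never close the scanner; its read point stays in the
    //    region's scannerReadPoints until the lease-expiry path cleans it up
    Scan scan = new Scan();
    ResultScanner scanner = table.getScanner(scan);
    scanner.next();
    // (no scanner.close() here, on purpose)

    // 3. delete after the scan was opened
    Delete delete = new Delete(Bytes.toBytes("row1"));
    table.delete(delete);

    // 4./5. request a major compaction; the delete cannot be dropped while
    //       a scanner with a smaller read point is still registered
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.majorCompact("t1");
  }
}

And here is the sketch of the leaseExpired() change I have in mind: keep the same structure as the current code, but do the close in a finally so the region scanner's read point is removed from scannerReadPoints even if the pre-close hook or the region lookup throws. Names follow the snippet quoted above; treat this as an illustration, not a tested patch:

public void leaseExpired() {
  RegionScannerHolder rsh = scanners.remove(this.scannerName);
  if (rsh == null) {
    LOG.warn("Scanner " + this.scannerName + " lease expired, but no related"
        + " scanner found, hence no chance to close that related scanner!");
    return;
  }
  RegionScanner s = rsh.s;
  LOG.info("Scanner " + this.scannerName + " lease expired on region "
      + s.getRegionInfo().getRegionNameAsString());
  Region region = null;
  try {
    region = regionServer.getRegion(s.getRegionInfo().getRegionName());
    if (region != null && region.getCoprocessorHost() != null) {
      region.getCoprocessorHost().preScannerClose(s);
    }
  } catch (IOException e) {
    LOG.error("Closing scanner for "
        + s.getRegionInfo().getRegionNameAsString(), e);
  } finally {
    try {
      // Always close the region scanner so its read point is released,
      // even when the pre-close hook or the region lookup failed.
      s.close();
      if (region != null && region.getCoprocessorHost() != null) {
        region.getCoprocessorHost().postScannerClose(s);
      }
    } catch (IOException e) {
      LOG.error("Closing scanner for "
          + s.getRegionInfo().getRegionNameAsString(), e);
    }
  }
}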