[ https://issues.apache.org/jira/browse/ZOOKEEPER-464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855183#action_12855183 ]
Erwin Tam commented on ZOOKEEPER-464:
-------------------------------------

There was an intermittent problem with the ledger delete junit tests prior to the last patch uploaded (which resolved it). I'll document the bug and the fix for that here.

The ledger delete junit tests were failing intermittently, and the failure was related to an issue I saw earlier when running unit tests with a very small entry log size limit (2K). When the entry logs roll over, we create a new one by first writing the 1024-byte "BKLO" header to the beginning of the file. The problem is that this byte buffer object was statically defined. In our junit tests, we have multiple Bookie servers (and thus multiple EntryLogger instances) in the same JVM. If more than one EntryLogger is rolling over its current log and creating the next one, they all access the same entry log file header buffer. This creates problems since the static header isn't accessed in a synchronized way.

The header byte buffer is cleared before it is written to the log file. Since it is static, one thread could clear it, and another thread (from a second Bookie server) could clear it at about the same time. The first thread then writes the header, which leaves the buffer's internal position at the end, and the position is not reset. The second thread then writes from a header buffer that is effectively no longer cleared/reset. The result is that the entry logs in the second Bookie are created without the header. When we read through those files later on to figure out which ledgers make them up, we read incorrect values and try to allocate byte buffers based on an incorrect length field (basically junk bytes). This is what causes the Java heap space error.

The fix is simple: make the log file header a non-static variable and initialize it in the EntryLogger constructor. In practice, we shouldn't be running multiple Bookies within the same JVM, so we wouldn't run into this problem.
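For illustration, here is a minimal sketch of the kind of change described above. It is not the actual patch: the class name `EntryLoggerSketch`, the method `createNewLog`, and the field and constant names are hypothetical, and only the header handling is modeled. The `synchronized` modifier is an extra assumption to guard a single logger against concurrent rollovers.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical stand-in for EntryLogger; only the header handling is modeled. */
class EntryLoggerSketch {
    static final int LOGFILE_HEADER_SIZE = 1024;

    // Instance field (previously the equivalent buffer was static): each
    // logger owns its buffer, so position/limit state is never shared
    // between Bookies running in the same JVM.
    private final ByteBuffer logfileHeader = ByteBuffer.allocate(LOGFILE_HEADER_SIZE);

    EntryLoggerSketch() {
        // Initialize the header in the constructor; the rest stays zero-filled.
        logfileHeader.put("BKLO".getBytes(StandardCharsets.UTF_8));
    }

    /** Creates (or truncates) a log file and writes the full header at offset 0. */
    synchronized void createNewLog(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            // clear() resets position to 0 and limit to capacity;
            // it does not erase the buffer contents.
            logfileHeader.clear();
            // write() advances the buffer position; loop until all bytes are out.
            while (logfileHeader.hasRemaining()) {
                channel.write(logfileHeader);
            }
        }
    }
}
```

With a per-instance buffer, two loggers rolling over at the same time each advance only their own buffer position, so neither can observe the other's already-consumed buffer.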
> Need procedure to garbage collect ledgers
> -----------------------------------------
>
>                 Key: ZOOKEEPER-464
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-464
>             Project: Zookeeper
>          Issue Type: New Feature
>          Components: contrib-bookkeeper
>            Reporter: Flavio Paiva Junqueira
>            Assignee: Erwin Tam
>             Fix For: 3.4.0
>
>         Attachments: zookeeper-464-log.txt, ZOOKEEPER-464.patch, ZOOKEEPER-464.patch, ZOOKEEPER-464.patch
>
>
> An application using BookKeeper is likely to use a large number of ledgers over time. Such an application might not need all ledgers created over time and might want to delete some of these ledgers to free up some space on bookies. The idea of this jira is to implement a procedure that enables an application to garbage-collect unwanted ledgers.
> To garbage-collect a ledger, we need to delete the ledger metadata on ZooKeeper, and delete the ledger data on corresponding bookies.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.