I believe the work is done, or nearly done. I was coordinating with Mike Walch off-list to prepare the code before it's officially submitted as a patch to the Apache project, and I've asked him to give me a chance to review it before it gets submitted.
If you'd like to take a preview, you can see it in this branch:
https://github.com/mikewalch/accumulo/tree/volume-chooser

I'd definitely like it to be a blocker for 2.0.0. I think it's an essential feature.

On Tue, Dec 20, 2016 at 3:00 PM Jeff Kubina <[email protected]> wrote:
> Chris,
>
> Any status on the patch to Accumulo to allow customizing the HDFS volume
> on which the WALs are stored.
>
> --
> Jeff Kubina
> 410-988-4436
>
> On Wed, Nov 2, 2016 at 10:34 PM, Christopher <[email protected]> wrote:
>
> I'm aware of at least one person who has patched Accumulo to allow
> customizing the HDFS volume on which the WALs are stored. This reminds me
> that I need to check on the status of that patch. I'm hoping it'll be
> contributed soon.
>
> I'm also curious if it'd make a difference writing to HDFS with the data
> nodes mounted with sync, instead of doing a separate sync call.
>
> On Wed, Nov 2, 2016 at 9:49 PM <[email protected]> wrote:
>
> Regarding #2 – I think there are two options here:
>
> 1. Modify Accumulo to take advantage of HDFS Heterogeneous Storage
> 2. Modify Accumulo WAL code to support volumes
>
> *From:* Jeff Kubina [mailto:[email protected]]
> *Sent:* Wednesday, November 02, 2016 9:02 PM
> *To:* [email protected]
> *Subject:* Re: New Accumulo Blog Post
>
> Thanks for the blog post, very interesting read. Some questions ...
>
> 1. Are the operations "Writes mutation to tablet servers’ WAL/Sync or
> flush tablet servers’ WAL" and "Adds mutations to sorted in memory map of
> each tablet." performed by threads in parallel?
>
> 2. Could the latency of hsync-ing the WALs be overcome by modifying
> Accumulo to write them to a separate SSD-only HDFS? To maintain data
> locality it would require two datanode processes (one for the HDDs and one
> for the SSD), running on the same node, which is not hard to do.

--
Christopher
