Hi,

I've submitted a proposal for GSoC on Melange, but I'm reposting it here
since it looks easier to get feedback here than on the web interface.

HAMMER2
-------

HAMMER2 is a filesystem developed by Matthew Dillon for DragonFly BSD.
The main purpose of this new filesystem is to support replication
clustering and the core design is heavily influenced by this goal, but
the filesystem does support a lot of features which are useful in a
single node context. The goal of this proposal is to port a *subset* of
HAMMER2 to OpenBSD and stabilize the port in a single node context.
Supporting clustering is out of scope for this project.

Its primary features are copy-on-write writable snapshots and on-the-fly
compression (currently only LZ4 and zero suppression). Data integrity is
checked with a per-block CRC. Offline (safe) and online (subject to hash
collision issues) block deduplication and multiple-disk support are
planned but not yet implemented. One of the strengths of this filesystem
is that, despite the large number of features it implements, the code is
still simple compared to the other existing filesystems in this segment
(the number of lines of code is of the same order as UFS at this time).
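
To make the per-block check and zero suppression concrete, here is a toy
model in Python. It is only a sketch and not HAMMER2 code: the function
names, the block size and the use of zlib.crc32 as the check function
are assumptions made for the illustration.

```python
import zlib

BLOCK_SIZE = 4096  # illustrative value, not HAMMER2's actual block size


def store_block(data: bytes):
    """Return a toy block reference as a (check, payload) pair."""
    if data.count(0) == len(data):
        # Zero suppression: an all-zero block needs no storage at all.
        return (None, None)
    # Keep a check code next to the reference so reads can be verified.
    return (zlib.crc32(data), data)


def load_block(ref, size=BLOCK_SIZE):
    """Return the block contents, verifying the check code on the way."""
    check, payload = ref
    if payload is None:
        # Suppressed block: reconstruct the zeros on read.
        return bytes(size)
    if zlib.crc32(payload) != check:
        raise IOError("block check mismatch")
    return payload
```

The point of storing the check code in the reference rather than in the
block itself is that a corrupted block cannot vouch for its own
integrity.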

About porting
-------------

One might consider that it is too early to port the filesystem now and
it is better to wait for it to be finished. But porting early before the
filesystem is used in production can be a win too because it is still
possible to make architectural changes to improve portability. Moreover,
porting the filesystem to another operating system might help to attract
new developers to work on specific parts of the filesystem like
deduplication or the copy mechanism and speed up its development.

The core filesystem is there and mostly done, so there is a stable base
to work on which is unlikely to change drastically and void the porting
work.

Goals
-----

The goal is to have a working filesystem at the end of the summer, and
to be able to create, mount, read and write without panics on a
single-master filesystem.

Non-goals
---------

* boot loader support
* clustering support. This is a major task on its own, and needs porting
of more subsystems in the kernel. It is also the least advanced part of
the filesystem and is still in development.

Planned work
------------

 * Weeks 1-2

Port the core of the filesystem. The chain API is the main in-memory
data structure of HAMMER2. It is a COW tree of chains representing the
topological structure of the files in the filesystem. A chain can
represent an inode, a directory entry, or an indirect block. This is the
most portable part of the filesystem since it doesn't depend on any
external API. The main task is converting the locking primitives from
DragonFly to OpenBSD and dealing with the slight kernel API differences,
like malloc.
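
As a rough picture of what "COW tree of chains" means, here is a toy
model in Python. It is only a sketch under my own assumptions: the Chain
class, its fields and cow_set() are hypothetical names and do not mirror
the real HAMMER2 structures or locking.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Chain:
    kind: str                      # "inode", "dirent" or "indirect"
    data: bytes = b""
    children: Tuple[Tuple[str, "Chain"], ...] = ()

    def child(self, key):
        """Look up a direct child by key."""
        return dict(self.children)[key]


def cow_set(node, path, data):
    """Return a new tree in which the chain at `path` holds `data`.

    Only the chains on the path from the root to the target are copied.
    Every other subtree is shared with the old tree, which stays valid
    and can serve as a snapshot.
    """
    if not path:
        return Chain(node.kind, data, node.children)
    key, rest = path[0], path[1:]
    kids = dict(node.children)
    kids[key] = cow_set(kids[key], rest, data)
    return Chain(node.kind, node.data, tuple(kids.items()))
```

For example, after new_root = cow_set(root, ["a"], b"new"), the old root
still sees the old contents of "a", and every sibling of "a" is the same
object in both trees.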

 * Week 3

HAMMER2 has an abstraction on top of the kernel block I/O layer.
Alongside the chain API, the block I/O abstraction is another major
part of the filesystem. It is responsible for handling buffer mapping
from the buffer cache and for reading and writing data to disk at the
request of the chain frontend code.

The third week would be dedicated to porting the I/O layer and the
I/O clustering layer.

As with most kernel code, there is no test suite, but spending the time
to write a few tests and getting this code and the chain API to build
and run standalone might be worth it for the next part of the project.
Together they are the most complex code in the filesystem, and if they
work, the remaining bugs will be easier to track down.

 * Week 4

Port the ioctl API and the bulkfree scan. The bulkfree scan is the code
responsible for garbage collecting unreferenced blocks.
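
Conceptually, collecting unreferenced blocks can be pictured as a
mark-and-sweep pass. The Python sketch below is a generic illustration
under that assumption, not a description of how the HAMMER2 bulkfree
code actually works. The bulkfree() signature and the dict/set
representation are made up for the example.

```python
def bulkfree(roots, refs, allocated):
    """Return the set of allocated blocks not reachable from any root.

    roots:     iterable of block ids (e.g. the live root and snapshots)
    refs:      dict mapping a block id to the block ids it references
    allocated: set of all block ids currently marked as allocated
    """
    reachable = set()
    stack = list(roots)
    # Mark phase: walk every reference reachable from the roots.
    while stack:
        block = stack.pop()
        if block in reachable:
            continue
        reachable.add(block)
        stack.extend(refs.get(block, ()))
    # Sweep phase: whatever was allocated but never visited can be freed.
    return allocated - reachable
```

Note that a block shared between the live tree and a snapshot is only
freed once neither root references it any more.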

 * Week 5

Port the userspace tools. At this point, it should be possible to create
a filesystem.

 * Weeks 6-8

Convert the frontend to the OpenBSD VFS API. The DragonFly VFS has
diverged a lot from the earlier BSD VFS, with substantial changes in the
semantics of namecache/vnode locking and interactions. The frontend
code, which contains the vnode operations and the VFS operations, is
about 5 kLOC. This is a major task.

At this point, it should be possible to use the filesystem.

 * Remaining time

I'm not sure I'm able to plan this far ahead. The first thing to do is
to test the filesystem in depth and fix the issues which arise. It's not
possible to plan the time it'll need, but I suspect it will be quite
some work. After that, if I have more time, there are some
possibilities. Probably the best thing is to track down one of the known
bugs of HAMMER2 and try to solve it, for instance the issue with
hardlinks (if Matt Dillon has not solved it by that point). Another
possibility is working on a small feature for HAMMER2, like
deduplication, but I'd like to have a usable filesystem before trying to
add something new.

Thanks for reading,

Joris
