How are you planning on storing this table? It seems to me this really is a general-purpose deduplication facility, and it's worth thinking about the implications... and about how to make it scale.
I think the way to go is a trie with back pointers, and then reference nodes by pointers to the leaves. We'd probably want a heap just for the atom table, partly to keep everything contiguous.

If it'll scale up to millions of entries (and I think it has to, even if it's only used for xattr names; I don't see how you could use it only for names that occur more than once), there really isn't anything preventing you from using it for anything you want. The only reason you wouldn't use it for all your files is that the fragmentation wouldn't be worth it when the expected number of duplicates is small. But for small files (for some value of small), why not?

If you did start throwing all kinds of stuff into it, you'd probably want to keep nodes segregated somewhat by depth, for cache reasons.

Sound completely insane?

_______________________________________________
Tux3 mailing list
http://tux3.org/cgi-bin/mailman/listinfo/tux3
