On Wed, Nov 3, 2010 at 14:04, Weston Ruter <[email protected]> wrote:
> That is a good idea, but the problem with Bible translations in particular
> is the issue of overlapping hierarchies: like chapter and verse don't always
> fall along same divisions as section and paragraph. So the data model I've
> been moving toward is standoff markup, where there is a set of tokens
> (words, punctuation) for the entire book and then a set of structures
> (paragraphs, verses, etc) that refer to the start token and end token, so
> when getting a structure it needs to retrieve all tokens from start to end.
> The use of standoff markup and overlapping hierarchies makes your idea of
> using sorting buckets not feasible, I don't think. Thanks for the idea
> though!

Not sure I agree. My "buckets" are somewhat arbitrary and don't actually
have to be mapped to any real structure. The trick is just that by
prefixing each token index with a bucket index, you don't have to update
all tokens anymore; you only have to update the tokens inside the bucket
(or in the next bucket, if you happen to be moving a token there).

Your standoff thing (I'm not really used to that term, so no clue if I'm
using it correctly) would still work, only you now reference tokens by
bucket and token index, not just token index.

Cheers,

Dirkjan
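
To illustrate the bucket scheme described above, here is a minimal Python sketch. It is an assumption-laden toy, not code from the thread: the names Token, BucketedTokens, insert and span are hypothetical, the bucket size is arbitrary, and an in-memory list stands in for whatever database table would actually hold the tokens. It shows that an insert renumbers only the tokens in one bucket, and that a standoff structure can still be resolved from a (bucket, index) start and end.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    bucket: int  # arbitrary bucket number, not tied to chapters or verses
    index: int   # position inside the bucket


class BucketedTokens:
    """Toy token store; a plain list stands in for a database table."""

    def __init__(self, words, bucket_size):
        self.tokens = [
            Token(word, i // bucket_size, i % bucket_size)
            for i, word in enumerate(words)
        ]

    def insert(self, text, bucket, index):
        """Insert a token; return how many existing tokens were renumbered."""
        shifted = 0
        for tok in self.tokens:
            # Only tokens in the same bucket, at or after the slot, move.
            if tok.bucket == bucket and tok.index >= index:
                tok.index += 1
                shifted += 1
        self.tokens.append(Token(text, bucket, index))
        return shifted

    def span(self, start, end):
        """Return token texts from start to end, both (bucket, index), inclusive."""
        hits = [t for t in self.tokens if start <= (t.bucket, t.index) <= end]
        hits.sort(key=lambda t: (t.bucket, t.index))
        return [t.text for t in hits]


words = "one two three four five six seven eight nine ten".split()
store = BucketedTokens(words, bucket_size=4)

# Inserting into bucket 0 renumbers 2 tokens; the other 8 are untouched.
print(store.insert("new", 0, 2))   # 2

# A structure that starts at (0, 1) and ends at (1, 0) crosses a bucket edge.
print(store.span((0, 1), (1, 0)))  # ['two', 'new', 'three', 'four', 'five']
```

Note that a structure whose start or end token sits after the insertion point in the same bucket would still need its reference bumped; the saving is that this bookkeeping is confined to one bucket rather than the whole book.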
