On 04/07/12 11:26, Brian Leach wrote:
>> All the other groups effectively get 1 added to their number
> Not exactly.
> 
> Sorry to those who already know this, but maybe it's time to go over linear
> hashing in theory ..
> 
> Linear hashing is a scheme devised by Litwin, originally only for
> in-memory lists. In fact there are some good implementations in C# that
> provide better handling of Dictionary types. Applying it to a file system
> adds some complexity, but it's basically the same theory.
> 
> Let's start with a file that has 100 groups initially defined (that's 0
> through 99). That is your minimum starting point: the file should never
> shrink below it, so it doesn't begin its life with loads of splits
> right from the start as you populate it. You would size this the same
> way you size a regular hashed file for its initial content: no point
> making work for yourself (or the database).
> 
> As data gets added, because the content is allocated unevenly, some of that
> load will be in primary and some in overflow: that's just the way of the
> world. No hashing is perfect. Unlike a static file, the overflow can't be
> added to the end of the file as a linked list (why nobody has done managed
> overflow is beyond me); it has to sit in a separate file.
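
To make the split mechanics concrete, here is a minimal sketch of
Litwin-style linear hashing in Python (the names, the fixed capacity and
the 80% threshold are my illustrative assumptions, not UV internals):

    MIN_GROUPS = 4     # minimum modulus; a real file might start at 100
    CAPACITY = 4       # records a group holds before counting as full
    SPLIT_LOAD = 0.8   # split when the overall load passes 80%

    class LinearHashFile:
        def __init__(self):
            self.groups = [[] for _ in range(MIN_GROUPS)]
            self.m = MIN_GROUPS    # current base modulus
            self.next_split = 0    # the one group due to split next

        def _group_for(self, key):
            g = hash(key) % self.m
            if g < self.next_split:            # already-split groups
                g = hash(key) % (2 * self.m)   # hash with doubled modulus
            return g

        def insert(self, key, value):
            self.groups[self._group_for(key)].append((key, value))
            total = sum(len(g) for g in self.groups)
            # Split when the file as a whole is too full. The group that
            # splits is the one at the pointer, not (necessarily) the
            # group that just overflowed.
            if total > SPLIT_LOAD * CAPACITY * len(self.groups):
                self._split()

        def _split(self):
            old = self.groups[self.next_split]
            self.groups[self.next_split] = []
            self.groups.append([])
            for key, value in old:     # redistribute just this group
                self.groups[hash(key) % (2 * self.m)].append((key, value))
            self.next_split += 1
            if self.next_split == self.m:   # full pass: modulus doubles
                self.m *= 2
                self.next_split = 0

Note how this bears out the correction above: when group N splits, each
of its records lands back in group N or moves to group N+m; none of the
other groups gets 1 added to its number.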

I don't know what the definition of "badly overflowed" is, but assuming
that a badly overflowed group has two blocks of overflow, then those
file stats seem perfectly okay. As Brian has explained, the distribution
of records is "lumpy" and as a percentage of the file, there aren't many
badly overflowed groups.

You've got roughly a third of the groups overflowed. With an 80% split
load that doesn't seem at all out of order: on average each group is 80%
full, so a third of them sitting above 100% full is fine.
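
As a back-of-envelope check (a rough model of my own, not anything UV
documents): if records land in groups more or less independently, the
count per group is roughly Poisson, and at an 80% average load a sizeable
fraction of groups will run past capacity:

    from math import exp, factorial

    capacity = 10           # records per group; an assumed figure
    lam = 0.8 * capacity    # mean records per group at an 80% load

    # Poisson probability that a group stays within its capacity
    p_fits = sum(exp(-lam) * lam**k / factorial(k)
                 for k in range(capacity + 1))
    print(f"expected overflowed fraction: {1 - p_fits:.0%}")  # ~18%

And real-world hashing is lumpier than Poisson, so a third of groups
overflowed is entirely plausible.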

You've got about one and a half thousand badly overflowed groups out of
eighty-three thousand. That's less than two percent. That's nothing.

As for why no-one has done managed overflow, I think there are various
reasons. The first successful implementation (Prime INFORMATION) didn't
need it. It used a peculiar type of file called a "Segmented Directory",
and while I don't know for certain what PI did, I strongly suspect each
group had its own normal file, so if a group overflowed it just created
a new block at the end of that file. Same with large records: it
allocated a bunch of overflow blocks. This file structure was far more
evident with PI-Open - at the OS level a dynamic file was an OS directory
with lots of numbered files in it.
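
For illustration, here's a toy of what I imagine that layout to look like
(pure speculation on my part, as I said; the function names and the
tab-delimited record format are invented):

    import os

    # Toy "segmented directory": the dynamic file is an OS directory and
    # each group is an ordinary file of its own, so an overflowing group
    # simply grows at its end. No shared overflow file is needed.
    def group_path(file_dir, key, modulus=100):
        return os.path.join(file_dir, str(hash(key) % modulus))

    def write_record(file_dir, key, record):
        os.makedirs(file_dir, exist_ok=True)
        with open(group_path(file_dir, key), "a") as g:
            g.write(f"{key}\t{record}\n")  # appended block = "overflow"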

The UV implementation of "one file for data, one file for overflow" may
be unique to UV. I don't know. What little I know of UD tells me it's
different, and others like QM could well be different again. I wouldn't
actually be surprised if QM is like PI.

Cheers,
Wol