On 04/07/12 11:26, Brian Leach wrote:
>> All the other groups effectively get 1 added to their number
> Not exactly.
>
> Sorry to those who already know this, but maybe it's time to go over
> linear hashing in theory ...
>
> Linear hashing was a system devised by Litwin, originally only for
> in-memory lists. In fact there are some good implementations in C#
> that provide better handling of Dictionary types. Applying it to a
> file system adds some complexity, but it's basically the same theory.
>
> Let's start with a file that has 100 groups initially defined (that's
> 0 through 99). That is your minimum starting point and should ensure
> the file never shrinks below that, so it doesn't begin its life with
> loads of splits right from the start as you populate it. You would
> size this similarly to the way you size a regular hashed file for
> your initial content: no point making work for yourself (or the
> database).
>
> As data gets added, because the content is allocated unevenly, some
> of that load will be in primary and some in overflow: that's just the
> way of the world. No hashing is perfect. Unlike a static file, the
> overflow can't be added to the end of the file as a linked list
> (* why nobody has done managed overflow is beyond me); it has to sit
> in a separate file.
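To make Brian's description concrete, here's a rough Python sketch of
the split mechanics (my own illustration, not UV's actual code; the
group capacity and split trigger are made-up parameters, and I've left
merging/shrinking out). It shows the point above: a split rehashes only
the one group at the split pointer, and a lookup picks between the old
and doubled modulus by comparing against that pointer. The other groups
don't get "1 added to their number" at all.

class LinearHashFile:
    def __init__(self, minimum_modulus=100, capacity=4, split_load=0.8):
        self.modulus = minimum_modulus   # groups at the start of this round
        self.split_ptr = 0               # next group due to split
        self.capacity = capacity         # records per group before "full"
        self.split_load = split_load     # e.g. SPLIT.LOAD 80 -> 0.8
        self.groups = [[] for _ in range(minimum_modulus)]
        self.count = 0

    def group_for(self, key):
        g = hash(key) % self.modulus
        if g < self.split_ptr:           # that group already split this round
            g = hash(key) % (2 * self.modulus)
        return g

    def insert(self, key, record):
        self.groups[self.group_for(key)].append((key, record))
        self.count += 1
        # A split is triggered by the overall load, NOT by whichever group
        # just overflowed.  The group that splits is the one the split
        # pointer has reached, and it may not even be full itself.
        if self.count > self.split_load * len(self.groups) * self.capacity:
            self.split()

    def split(self):
        victim = self.groups[self.split_ptr]
        self.groups[self.split_ptr] = []
        self.groups.append([])           # one new group on the end of the file
        for key, record in victim:       # rehash ONLY the splitting group
            self.groups[hash(key) % (2 * self.modulus)].append((key, record))
        self.split_ptr += 1
        if self.split_ptr == self.modulus:   # round complete, modulus doubles
            self.modulus *= 2
            self.split_ptr = 0

It also shows, incidentally, why overflow presumably can't just be
tacked onto the end of the primary file: the end of the file is exactly
where the next split's new group has to go.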
I don't know what the definition of "badly overflowed" is, but assuming
that a badly overflowed group has two blocks of overflow, those file
stats look perfectly okay. As Brian has explained, the distribution of
records is "lumpy", and as a percentage of the file there aren't many
badly overflowed groups.

You've got roughly a third of groups overflowed. With an 80% split load
that doesn't seem at all out of order: on average each group is 80%
full, so a third of them being more than 100% full is fine (there's a
back-of-the-envelope check at the end of this mail). And you've got, in
round thousands, one and a half groups badly overflowed out of
eighty-three - less than two percent. That's nothing.

As for why no-one has done managed overflow, I think there are various
reasons. The first successful implementation (Prime INFORMATION) didn't
need it. It used a peculiar type of file called a "Segmented Directory",
and while I don't know for certain what PI did, I strongly suspect each
group had its own normal file, so if a group overflowed it just created
a new block at the end of that file. Same with large records: it
allocated a bunch of overflow blocks. This file structure was far more
evident with PI-Open - at the OS level a dynamic file was an OS
directory with lots of numbered files in it.

The UV implementation of "one file for data, one file for overflow" may
be unique to UV - I don't know. What little I know of UD tells me it's
different, and others like QM could well be different again. I wouldn't
actually be surprised if QM is like PI.

Cheers,
Wol
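PS: the back-of-the-envelope check on that "third of groups overflowed"
figure I promised above. If records hash uniformly, the number landing
in any one group is roughly Poisson with mean (load * capacity). The
capacity of 8 records per primary buffer below is a made-up figure -
the real number depends on your group size and record sizes - but
similar small capacities give much the same answer:

from math import exp, factorial

capacity = 8              # records per primary buffer (assumed figure)
mean = 0.8 * capacity     # average records per group at an 80% load

# P(a group holds more than `capacity` records) = 1 - P(X <= capacity)
p_overflow = 1 - sum(exp(-mean) * mean**k / factorial(k)
                     for k in range(capacity + 1))
print(round(p_overflow, 2))   # ~0.2

So even with perfectly uniform hashing the maths says about a fifth of
groups sit in overflow at an 80% load. Real data is lumpier than
uniform, so your third is in exactly the right ballpark.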