Very interesting. I see why 4 and 5 are not in your favor. In the key samples I tested (sequential integers, integers with alpha suffixes, zero-padded, etc.), type 4 was consistently worse than type 18, though other types would still beat 18. Sometimes 12 and 13 were just downright "evil." Something to ponder.

Thanks for posting your method of testing.

What I do on live data, where running all the combos of HASH.AID isn't feasible (millions of records), is take a random sample of the file and copy it into something manageable. Then, using RESIZE and GROUP.STAT (since the records are usually very "lumpy"), I compare percent standard deviations to gauge how evenly the records are spread across groups.
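
As a rough illustration of the statistic (in Python rather than UniVerse, and with made-up group counts), the "percent std deviation" here is just the standard deviation of records-per-group expressed as a percentage of the mean, i.e. the coefficient of variation. Lower means a more even hash:

from statistics import mean, pstdev

def percent_std_dev(group_counts):
    """Std deviation of records-per-group, as a percent of the mean."""
    avg = mean(group_counts)
    return 100.0 * pstdev(group_counts) / avg

# Hypothetical per-group counts for two candidate resizes of the
# same sampled file:
candidate_a = [48, 52, 50, 49, 51, 50, 47, 53]  # even spread
candidate_b = [5, 120, 3, 140, 2, 110, 4, 16]   # "lumpy" spread

print(f"A: {percent_std_dev(candidate_a):.1f}%")  # small: good hash
print(f"B: {percent_std_dev(candidate_b):.1f}%")  # large: bad hash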

--

Regards,

Clif



On Aug 14, 2004, at 12:16, Rosenberg Ben wrote:

Using a sample of files with no very large records,
or using ID-only test files with a null @RECORD,
for each filename, do
   {
   CLEAR-FILE DATA HASH.AID.FILE
   for a sample of reasonable moduli, do
      {
      PHANTOM HASH.AID filename 2,18 mod sep
      (the 2,18 asks HASH.AID to try file types 2 through 18
      at that modulo and separation)
      }
   SORT HASH.AID.FILE BY-DSND LARGEST.GROUP
   to see the worst file types.
   }
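
For anyone without a UniVerse system handy, here is a rough Python sketch of what that sweep is doing. The two hash functions below are generic stand-ins (the actual type 2 through 18 algorithms are UniVerse internals), and all the names are mine:

# Stand-in hashes; UniVerse's real file-type algorithms differ.
def hash_sum(key, modulo):
    """Naive hash: sum of the key's byte values."""
    return sum(key.encode()) % modulo

def hash_poly(key, modulo):
    """Polynomial rolling hash over the key's bytes."""
    h = 0
    for b in key.encode():
        h = (h * 31 + b) % modulo
    return h

def largest_group(ids, hash_fn, modulo):
    """Bucket every ID and return the biggest group's record count."""
    counts = [0] * modulo
    for key in ids:
        counts[hash_fn(key, modulo)] += 1
    return max(counts)

sample_ids = [str(n) for n in range(1, 2001)]  # sequential integer keys
results = []
for name, fn in (("sum", hash_sum), ("poly", hash_poly)):
    for modulo in (97, 101, 128):              # a sample of moduli
        results.append((largest_group(sample_ids, fn, modulo), name, modulo))

# Like SORT HASH.AID.FILE BY-DSND LARGEST.GROUP: worst combos first.
for worst, name, modulo in sorted(results, reverse=True):
    print(f"{name:4s} mod {modulo:4d}: largest group = {worst}")

The point is the same as with HASH.AID proper: sweep the candidate (type, modulo) combinations over a representative key sample and let the largest-group figure flag the pathological ones.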