I should have said "60% more disk records", to be clear.

-----Original Message-----
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Rick Nuckolls
Sent: Tuesday, July 03, 2012 2:24 PM
To: 'U2 Users List'
Subject: Re: [U2] RESIZE - dynamic files

But the total size of your file is up 60%.  Reading in 60% more records in a 
full select of the file is going to be much slower than a few more overflows.
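
For reference, the 60% figure can be checked against the two file listings
quoted further down. A rough sketch in Python (not something from the thread),
assuming the "Total size" lines are exact byte counts and that a full select
touches every 4096-byte block once:

```python
GROUP_SIZE = 4096  # bytes, from the "Group size" line of both listings

size_before = 522260480  # "Total size" at modulus 105715
size_after = 836235264   # "Total size" at modulus 200003

# Number of 4096-byte blocks on disk before and after the resize
blocks_before = size_before // GROUP_SIZE
blocks_after = size_after // GROUP_SIZE
growth_pct = (size_after / size_before - 1) * 100

print(f"blocks before: {blocks_before}")
print(f"blocks after:  {blocks_after}")
print(f"growth:        {growth_pct:.1f}%")
```

The growth is in disk blocks; the record count (1290469) is the same in both
listings.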


-----Original Message-----
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Tuesday, July 03, 2012 2:15 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


Dan,

I changed the MINIMUM.MODULUS to the value of 200003 as you suggested and my 
Actual Load has really gone down (as well as overflow). See below for the 
results:

File name ..................   GENACCTRN_POSTED
Pathname ...................   GENACCTRN_POSTED
File type ..................   DYNAMIC
File style and revision ....   32BIT Revision 12
Hashing Algorithm ..........   GENERAL
No. of groups (modulus) ....   200003 current ( minimum 200003, 5263 empty,
                                            3957 overflowed, 207 badly )
Number of records ..........   1290469
Large record size ..........   3267 bytes
Number of large records ....   180
Group size .................   4096 bytes
Load factors ...............   90% (split), 50% (merge) and 37% (actual)
Total size .................   836235264 bytes
Total size of record data ..   287394719 bytes
Total size of record IDs ...   21508521 bytes
Unused space ...............   527323832 bytes
Total space for records ....   836227072 bytes

My overflow is now @ 2%
My Load is @ 37% (actual)

Granted, my empty groups are now up to almost 3%, but I hope that won't be a big 
factor. How does this look?
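
The percentages above can be reproduced from the "No. of groups" lines. A small
Python sketch, assuming each figure is simply a count of groups divided by the
current modulus (the helper name pct is made up for illustration); the "before"
numbers come from the modulus 105715 listing quoted below:

```python
def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

# (modulus, empty, overflowed, badly overflowed) from the two listings
before = (105715, 114, 21092, 1452)
after = (200003, 5263, 3957, 207)

for label, (modulus, empty, overflowed, badly) in (("before", before),
                                                   ("after", after)):
    print(f"{label}: empty {pct(empty, modulus):.2f}%, "
          f"overflowed {pct(overflowed, modulus):.2f}%, "
          f"badly {pct(badly, modulus):.2f}%")
```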

Chris


> From: dangf...@hotmail.com
> To: u2-users@listserver.u2ug.org
> Date: Tue, 3 Jul 2012 16:57:34 -0400
> Subject: Re: [U2] RESIZE - dynamic files
>
>
> One rule of thumb is to make sure that you have an average of 10 or fewer 
> items in each group. Going by that, you'd want a minimum mod of 130k or more. 
> I've also noticed that files approach the "sweet spot" for minimizing 
> overflow without having excessive empty groups when the total size is pretty 
> nearly twice the data size.
>
> The goal can vary according to your situation. I'm personally not all that 
> afraid of making the modulus a little too large, as overflow is a pretty bad 
> performance hit. Overflow means at least two disk reads to retrieve your 
> data; "badly" means at least 2 extra disk reads, and I've seen files where 
> that was thousands. This file isn't that bad, but 20% of your data is forcing 
> at least one extra disk read. Empty groups contribute to overhead on a 
> sequential search, so you'd want to consider how often you do a sequential 
> search on a file. Usually, that's a pretty inefficient way to retrieve data, 
> but, again, your mileage may vary.
>
> To me, 20% is too much overflow, and 114 empty groups is trivial; much less 
> than 0.2%. I'd be tempted to go to 200003 as a minimum Mod, just to see what 
> it looks like there. That'll give you an average of 6 records per group, not 
> unreasonably shallow, and it's likely to be a while before you have to resize 
> again.
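
The two rules of thumb above (about 10 or fewer records per group, and total
size near twice the data size) are easy to check numerically. A Python sketch
under those assumptions; the function name and the max_per_group parameter are
illustrative only, and the byte counts are from the modulus 105715 listing
quoted below:

```python
import math

records = 1290469
data_bytes = 287400239    # "Total size of record data"
total_bytes = 522260480   # "Total size"

def min_modulus_for_depth(record_count, max_per_group=10):
    """Smallest modulus that keeps the average at or below max_per_group."""
    return math.ceil(record_count / max_per_group)

print(min_modulus_for_depth(records))  # modulus needed for 10 per group
print(records / 200003)                # average depth at modulus 200003
print(total_bytes / data_bytes)        # total size relative to data size
```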
>
> > From: cjausti...@hotmail.com
> > To: u2-users@listserver.u2ug.org
> > Date: Tue, 3 Jul 2012 15:23:23 -0500
> > Subject: Re: [U2] RESIZE - dynamic files
> >
> >
> > I guess what I need to know is what's an acceptable % of overflow for a 
> > dynamic file? For example, when I change the SPLIT LOAD to 90% (while using 
> > the calculated min modulus)
> > I'm still left with ~ 20% of overflow (see below). Is 20% overflow 
> > considered acceptable on average or should I keep tinkering with it to 
> > reach a lower overflow %?
> >
> > Correct me if I'm wrong but it seems the goal here is to REDUCE the 
> > overflow % while not creating too many groups (too large a modulus).
> >
> > Chris
> >
> >
> > File name ..................   GENACCTRN_POSTED
> > Pathname ...................   GENACCTRN_POSTED
> > File type ..................   DYNAMIC
> > File style and revision ....   32BIT Revision 12
> > Hashing Algorithm ..........   GENERAL
> > No. of groups (modulus) ....   105715 current ( minimum 103889, 114 empty,
> >                                             21092 overflowed, 1452 badly )
> > Number of records ..........   1290469
> > Large record size ..........   3267 bytes
> > Number of large records ....   180
> > Group size .................   4096 bytes
> > Load factors ...............   90% (split), 50% (merge) and 70% (actual)
> > Total size .................   522260480 bytes
> > Total size of record data ..   287400239 bytes
> > Total size of record IDs ...   21508521 bytes
> > Unused space ...............   213343528 bytes
> > Total space for records ....   522252288 bytes
> >
> > > From: r...@lynden.com
> > > To: u2-users@listserver.u2ug.org
> > > Date: Tue, 3 Jul 2012 13:10:43 -0700
> > > Subject: Re: [U2] RESIZE - dynamic files
> > >
> > > The split load is not affecting anything here, since it is more than the 
> > > actual load.  What your overflow suggests is that you lower the 
> > > split.load value to 70% or below.  You could go ahead and set the 
> > > merge.load to an arbitrarily low number ("1"), and it will probably never 
> > > do a merge, which would be the same as specifying a minimum.modulus equal 
> > > to "as large as it ever gets".  The exception to this is during file 
> > > creation & clear.file,  when the minimum.modulus value determines the 
> > > initial disk allocation.  Short of going to an arbitrarily large 
> > > minimum.modulus, and a very low split.load, you are going to have some 
> > > overflow (unless you have sequential keys & like sized records).
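
The point about split.load not mattering while it sits above the actual load
can be pictured with a deliberately simplified model. This Python sketch
assumes a file splits when its overall load exceeds the split load and merges
when it falls below the merge load; it ignores minimum.modulus and is not how
the database itself is implemented:

```python
def load_action(actual_load, split_load, merge_load):
    """Return which threshold, if any, the overall file load has crossed."""
    if actual_load > split_load:
        return "split"
    if actual_load < merge_load:
        return "merge"
    return "none"

# 90% split, 50% merge, 70% actual: neither threshold is crossed
print(load_action(70, 90, 50))
# Lowering the split load below the actual load would trigger splitting
print(load_action(70, 65, 50))
```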
> > >
> > > -Rick
> > >
> > > -----Original Message-----
> > > From: u2-users-boun...@listserver.u2ug.org 
> > > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> > > Sent: Tuesday, July 03, 2012 12:54 PM
> > > To: u2-users@listserver.u2ug.org
> > > Subject: Re: [U2] RESIZE - dynamic files
> > >
> > >
> > > Using the formula below, and changing the split to 90% I get the 
> > > following:
> > >
> > > File name ..................   GENACCTRN_POSTED
> > > Pathname ...................   GENACCTRN_POSTED
> > > File type ..................   DYNAMIC
> > > File style and revision ....   32BIT Revision 12
> > > Hashing Algorithm ..........   GENERAL
> > > No. of groups (modulus) ....   103889 current ( minimum 103889, 114 empty,
> > >                                             22249 overflowed, 1764 badly )
> > > Number of records ..........   1290469
> > > Large record size ..........   3267 bytes
> > > Number of large records ....   180
> > > Group size .................   4096 bytes
> > > Load factors ...............   90% (split), 50% (merge) and 72% (actual)
> > > Total size .................   519921664 bytes
> > > Total size of record data ..   287400591 bytes
> > > Total size of record IDs ...   21508497 bytes
> > > Unused space ...............   211004384 bytes
> > > Total space for records ....   519913472 bytes
> > >
> > > How does this look in terms of performance?
> > >
> > > My actual load went down 8%, as did some of the overflow, but it looks 
> > > like my load is still high at 72%. I'm wondering if I should raise the 
> > > MINIMUM.MODULUS even more, since I still have a decent amount of 
> > > overflow and not many large records.
> > >
> > > Chris
> > >
> > >
> > > > From: r...@lynden.com
> > > > To: u2-users@listserver.u2ug.org
> > > > Date: Tue, 3 Jul 2012 10:21:16 -0700
> > > > Subject: Re: [U2] RESIZE - dynamic files
> > > >
> > > > (record + id / 4096 or 2048)
> > > >
> > > > You need to factor in overhead & the split factor:
> > > >     (records + ids) * 1.1 * 1.25 / 4096    (for 80%)
> > > >
> > > > If you use a 20% merge factor and an 80% split factor, the file will 
> > > > not start merging unless you delete 60 percent of your records.  If you 
> > > > use a 90% split factor, you will have more overflowed groups.  These numbers 
> > > > refer to the total amount of data in the file, not to any individual 
> > > > group.
> > > >
> > > > For records of the size that you have, I do not see any advantage to 
> > > > using a larger, 4096, group size. You will end up with twice the number 
> > > > of records per group vs 2048 (~ 13 vs ~ 7 ), and a little slower keyed 
> > > > access.
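
The formula above can be written out as a short Python sketch. The function
name and the split_load parameter are assumptions for illustration (1.25 is
taken to be 1 / 0.80), as is rounding up; the byte counts are from the modulus
92776 listing quoted below:

```python
import math

def estimate_min_modulus(data_bytes, id_bytes, group_size=4096,
                         split_load=0.80, overhead=1.1):
    """Estimate a minimum modulus from data + ID bytes.

    overhead=1.1 and dividing by split_load (1 / 0.80 = 1.25) follow the
    formula quoted above; rounding up is an assumption of this sketch.
    """
    needed = (data_bytes + id_bytes) * overhead / split_load
    return math.ceil(needed / group_size)

data_bytes = 287035391  # "Total size of record data"
id_bytes = 21508449     # "Total size of record IDs"
records = 1290469

# Compare the two group sizes discussed above
for group_size in (4096, 2048):
    modulus = estimate_min_modulus(data_bytes, id_bytes, group_size)
    print(group_size, modulus, round(records / modulus, 1))
```

The last column is the average number of records per group, which roughly
halves when the group size goes from 4096 to 2048.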
> > > >
> > > > -Rick
> > > >
> > > > -----Original Message-----
> > > > From: u2-users-boun...@listserver.u2ug.org 
> > > > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> > > > Sent: Tuesday, July 03, 2012 9:48 AM
> > > > To: u2-users@listserver.u2ug.org
> > > > Subject: Re: [U2] RESIZE - dynamic files
> > > >
> > > >
> > > > File name ..................   GENACCTRN_POSTED
> > > > Pathname ...................   GENACCTRN_POSTED
> > > > File type ..................   DYNAMIC
> > > > File style and revision ....   32BIT Revision 12
> > > > Hashing Algorithm ..........   GENERAL
> > > > No. of groups (modulus) ....   92776 current ( minimum 31, 89 empty,
> > > >                                             28229 overflowed, 2485 badly )
> > > > Number of records ..........   1290469
> > > > Large record size ..........   3267 bytes
> > > > Number of large records ....   180
> > > > Group size .................   4096 bytes
> > > > Load factors ...............   80% (split), 50% (merge) and 80% (actual)
> > > > Total size .................   500600832 bytes
> > > > Total size of record data ..   287035391 bytes
> > > > Total size of record IDs ...   21508449 bytes
> > > > Unused space ...............   192048800 bytes
> > > > Total space for records ....   500592640 bytes
> > > > ----
> > > > Using the record above, how would I calculate the following?
> > > >
> > > > 1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to 
> > > > the current number)?
> > > > 2) SPLIT - would 90% seem about right?
> > > > 3) MERGE - would 20% seem about right?
> > > > 4) Large Record Size - does 3267 seem right?
> > > > 5) Group Size - should I be using 4096?
> > > >
> > > > I'm just a bit confused as to how to set these. I saw the formula to 
> > > > calculate the MINIMUM.MODULUS, which is (record + id / 4096 or 2048), but 
> > > > I always get a lower number than my current modulus.
> > > >
> > > > I also saw where it said to simply take your current modulus and add 
> > > > 10-20%, and set the MINIMUM.MODULUS based on that.
> > > >
> > > > Based on the table above I'm just trying to get an idea of what these 
> > > > should be set at.
> > > >
> > > > Thanks,
> > > >
> > > > Chris
> > > >
> > > >
> > > > > From: cjausti...@hotmail.com
> > > > > To: u2-users@listserver.u2ug.org
> > > > > Date: Tue, 3 Jul 2012 10:28:17 -0500
> > > > > Subject: Re: [U2] RESIZE - dynamic files
> > > > >
> > > > >
> > > > > Doug,
> > > > >
> > > > > When I do the math I come up with a different # (see below):
> > > > >
> > > > > File name ..................   TEST_FILE
> > > > > Pathname ...................   TEST_FILE
> > > > > File type ..................   DYNAMIC
> > > > > File style and revision ....   32BIT Revision 12
> > > > > Hashing Algorithm ..........   GENERAL
> > > > > No. of groups (modulus) ....   82850 current ( minimum 24, 104 empty,
> > > > >                                             26225 overflowed, 1441 badly )
> > > > > Number of records ..........   1157122
> > > > > Large record size ..........   2036 bytes
> > > > > Number of large records ....   576
> > > > > Group size .................   4096 bytes
> > > > > Load factors ...............   80% (split), 50% (merge) and 80% (actual)
> > > > > Total size .................   449605632 bytes
> > > > > Total size of record data ..   258687736 bytes
> > > > > Total size of record IDs ...   19283300 bytes
> > > > > Unused space ...............   171626404 bytes
> > > > > Total space for records ....   449597440 bytes
> > > > >
> > > > > ------
> > > > > 258,687,736 bytes - Total size of record data
> > > > > 19,283,300 bytes - Total size of record IDs
> > > > > ===========
> > > > > 277,971,036 bytes (record + id's)
> > > > >
> > > > > 277,971,036 / 4,084 = 68,063 groups (minimum modulus)
> > > > > ------
> > > > >
> > > > > 68,063 is less than the current modulus of 82,850. Something with 
> > > > > this formula doesn't seem right, because if I use it I always 
> > > > > calculate a minimum modulus lower than the current modulus.
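
The two calculations differ only in the key-size figure: the listing shows
19283300 bytes of record IDs, while the quoted math below uses 192283300. A
short Python sketch that runs both through the same division (the variable
names are made up for illustration):

```python
USABLE = 4096 - 12  # group size minus the 12-byte overhead in the quoted math

data_bytes = 258687736
ids_in_listing = 19283300       # "Total size of record IDs" in the listing
ids_in_quoted_math = 192283300  # key size as typed in the quoted calculation

for label, ids in (("listing", ids_in_listing),
                   ("quoted math", ids_in_quoted_math)):
    total = data_bytes + ids
    print(f"{label}: total {total:,} -> {total / USABLE:,.1f} groups")
```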
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Chris
> > > > >
> > > > >
> > > > >
> > > > > > Date: Mon, 2 Jul 2012 16:08:16 -0600
> > > > > > From: dave...@gmail.com
> > > > > > To: u2-users@listserver.u2ug.org
> > > > > > Subject: Re: [U2] RESIZE - dynamic files
> > > > > >
> > > > > > Hi Chris:
> > > > > >
> > > > > > > You cannot get away with not resizing dynamic files, in my 
> > > > > > > experience.  The files do not split and merge like we are led to 
> > > > > > > believe.  The separator is not used on dynamic files.  Your 
> > > > > > > UniVerse file is badly sized.  The math below will get you a 
> > > > > > > reasonable file size.
> > > > > >
> > > > > > Let's do the math:
> > > > > >
> > > > > > 258687736 (Record Size)
> > > > > > 192283300 (Key Size)
> > > > > > ========
> > > > > > 450,971,036 (Data and Key Size)
> > > > > >
> > > > > > 4096 (Group Size)
> > > > > > - 12   (32 Bit Overhead)
> > > > > > ====
> > > > > > 4084 Usable Space
> > > > > >
> > > > > > 450971036/4084 = Minimum Modulo 110424 (Prime is 110431)
> > > > > >
> > > > > >
> > > > > > [ad]
> > > > > > I hate doing this math all of the time.  I have a reasonably priced 
> > > > > > resize
> > > > > > program called XLr8Resizer for $99.00 to do this for me.
> > > > > > [/ad]
> > > > > >
> > > > > > Regards,
> > > > > > Doug
> > > > > > www.u2logic.com/tools.html
> > > > > > "XLr8Resizer for the rest of us"
> > > > > > _______________________________________________
> > > > > > U2-Users mailing list
> > > > > > U2-Users@listserver.u2ug.org
> > > > > > http://listserver.u2ug.org/mailman/listinfo/u2-users
> > > > >
> > > > > _______________________________________________
> > > > > U2-Users mailing list
> > > > > U2-Users@listserver.u2ug.org
> > > > > http://listserver.u2ug.org/mailman/listinfo/u2-users
> > > >
> > > > _______________________________________________
> > > > U2-Users mailing list
> > > > U2-Users@listserver.u2ug.org
> > > > http://listserver.u2ug.org/mailman/listinfo/u2-users
> > > > _______________________________________________
> > > > U2-Users mailing list
> > > > U2-Users@listserver.u2ug.org
> > > > http://listserver.u2ug.org/mailman/listinfo/u2-users
> > >
> > > _______________________________________________
> > > U2-Users mailing list
> > > U2-Users@listserver.u2ug.org
> > > http://listserver.u2ug.org/mailman/listinfo/u2-users
> > > _______________________________________________
> > > U2-Users mailing list
> > > U2-Users@listserver.u2ug.org
> > > http://listserver.u2ug.org/mailman/listinfo/u2-users
> >
> > _______________________________________________
> > U2-Users mailing list
> > U2-Users@listserver.u2ug.org
> > http://listserver.u2ug.org/mailman/listinfo/u2-users
>
> _______________________________________________
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users

_______________________________________________
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users
_______________________________________________
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users
_______________________________________________
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users
