Re: [U2] RESIZE - dynamic files

Rick Nuckolls Thu, 05 Jul 2012 09:22:22 -0700

Chris,

I still am wondering what is prompting you to continue using the larger group 
size.


I think that Martin and the UV documentation are correct in this case; you 
would be as well or better off with the defaults.

-Rick

On Jul 5, 2012, at 9:13 AM, "Martin Phillips" <martinphill...@ladybridge.com> 
wrote:
> Hi,
> 
> The various suggestions about setting the minimum modulus to reduce overflow 
> are all very well but effectively you are turning a
> dynamic file into a static one, complete with all the continual maintenance 
> work needed to keep the parameters in step with the
> data.
> 
> In most cases, the only parameter that is worth tuning is the group size to 
> try to pack things nicely. Even this is often fine left
> alone though getting it to match the underlying o/s page size is helpful.
> 
> I missed the start of this thread but, unless you have a performance problem 
> or are seriously short of space, my recommendation
> would be to leave the dynamic files to look after themselves.
> 
> A file without overflow is not necessarily the best solution. Winding the 
> split load down to 70% means that at least 30% of the file
> is dead space. The implication of this is that the file is larger and will 
> take more disk reads to process sequentially from one end
> to the other.
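
To put a rough number on the dead-space point, here is a back-of-envelope sketch. It assumes every group is filled exactly to the split load and ignores overflow, large records and per-group overhead, so it is an idealised model rather than what the file system actually does. The byte counts are the record data and record ID totals from the file listing quoted later in this message; the function name is made up for the example.

```python
import math

GROUP_SIZE = 4096  # bytes per group, as in the quoted file listing


def groups_needed(data_bytes: int, split_load: float,
                  group_size: int = GROUP_SIZE) -> int:
    """Groups required if every group is filled exactly to split_load.

    Assumption: an idealised file with no overflow and no per-group overhead.
    """
    return math.ceil(data_bytes / (group_size * split_load))


# Record data + record IDs, taken from the listing in the quoted message.
data_bytes = 287789178 + 21539538

groups_80 = groups_needed(data_bytes, 0.80)
groups_70 = groups_needed(data_bytes, 0.70)

# A sequential scan reads every group once, so the ratio of group counts
# approximates the extra reads caused by the lower split load.
extra_reads = groups_70 / groups_80 - 1.0

print(f"80% split: {groups_80} groups, 70% split: {groups_70} groups")
print(f"extra sequential reads at 70%: {extra_reads:.1%}")
```

Under these assumptions the 70% file needs about one seventh more groups than the 80% file, which is the extra sequential read cost being described.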
> 
> 
> Martin Phillips
> Ladybridge Systems Ltd
> 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
> +44 (0)1604-709200
> 
> 
> 
> -----Original Message-----
> From: u2-users-boun...@listserver.u2ug.org 
> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> Sent: 05 July 2012 15:19
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> I was able to drop from 30% overflow to 12% by making 2 changes:
> 
> 1) changed the split from 80% to 70% (that alone reduced overflow by 10%)
> 2) changed the MINIMUM.MODULUS to 118,681 (calculated this way -> [ (record 
> data + id) * 1.1 * 1.42857 (70% split load)] / 4096 )
> 
> My disk size only went up 8%..
> 
> My file looks like this now:
> 
> File name ..................   GENACCTRN_POSTED
> Pathname ...................   GENACCTRN_POSTED
> File type ..................   DYNAMIC
> File style and revision ....   32BIT Revision 12
> Hashing Algorithm ..........   GENERAL
> No. of groups (modulus) ....   118681 current ( minimum 118681, 140 empty,
>                                            14431 overflowed, 778 badly )
> Number of records ..........   1292377
> Large record size ..........   3267 bytes
> Number of large records ....   180
> Group size .................   4096 bytes
> Load factors ...............   70% (split), 50% (merge) and 63% (actual)
> Total size .................   546869248 bytes
> Total size of record data ..   287789178 bytes
> Total size of record IDs ...   21539538 bytes
> Unused space ...............   237532340 bytes
> Total space for records ....   546861056 bytes
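
The percentages discussed in this thread can be cross-checked against the listing above. This is only a sketch: whether the "badly" overflowed groups are counted inside the 14431 or in addition to it is not stated here, so both readings are computed, and the load calculation (data plus IDs over primary group space) is an assumption about how the "actual" figure is derived, not something taken from the documentation.

```python
# Figures copied from the listing above.
modulus = 118681
overflowed = 14431
badly_overflowed = 778
group_size = 4096
record_bytes = 287789178
id_bytes = 21539538

# Share of groups in overflow, with and without the "badly" count added.
overflow_pct = 100.0 * overflowed / modulus
overflow_incl_badly_pct = 100.0 * (overflowed + badly_overflowed) / modulus

# Assumption: load is measured against the primary groups only.
primary_load_pct = 100.0 * (record_bytes + id_bytes) / (modulus * group_size)

print(f"overflowed groups: {overflow_pct:.1f}%")
print(f"including badly overflowed: {overflow_incl_badly_pct:.1f}%")
print(f"load against primary groups: {primary_load_pct:.1f}%")
```

Both overflow readings come out near the 12% quoted, and the load figure falls between 63% and 64%, close to the 63% actual load reported.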
> 
> Chris
> 
> 
> 
>> From: keith.john...@datacom.co.nz
>> To: u2-users@listserver.u2ug.org
>> Date: Wed, 4 Jul 2012 14:05:02 +1200
>> Subject: Re: [U2] RESIZE - dynamic files
>> 
>> Doug may have had a key bounce in his input
>> 
>>> Let's do the math:
>>> 
>>> 258687736 (Record Size)
>>> 192283300 (Key Size)
>>> ========
>> 
>> The key size is actually 19283300 in Chris' figures
>> 
>> Regarding 68,063 being less than the current modulus of 82,850.  I think the 
>> answer may lie in the splitting process.
>> 
>> As I understand it, the first time a split occurs group 1 is split and its
>> contents are split between new group 1 and new group 2. All the other
>> groups effectively get 1 added to their number. The next split is group 3
>> (which was 2) into 3 and 4 and so forth. A pointer is kept to say where the
>> next split will take place and also to help sort out how to adjust the
>> algorithm to identify which group matches a given key.
>> 
>> Based on this, if you started with 1000 groups, by the time you have split
>> the 500th time you will have 1500 groups.  The first 1000 will be
>> relatively empty, the last 500 will probably be overflowed, but not
>> terribly badly.  By the time you get to the 1000th split, you will have
>> 2000 groups and they will, one hopes, be quite reasonably spread with very
>> little overflow.
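
For comparison, here is a small sketch of the textbook linear hashing address calculation with a split pointer. It is not confirmed to be what UniVerse does. It numbers groups from 0 and appends each new group at the end instead of renumbering the others as described above, so the group numbers differ from that description even though the 1000-to-1500 count is the same. All names are made up for the example, and the keys are simply the integers 0 to 199999 standing in for evenly spread hash values.

```python
from collections import Counter


def group_for(hash_value: int, base_modulus: int, split_pointer: int) -> int:
    """Textbook linear hashing address, groups numbered from 0.

    Assumption: this is the generic algorithm, not UniVerse's own code.
    """
    group = hash_value % base_modulus
    if group < split_pointer:
        # This group was already split in the current round, so the doubled
        # modulus decides between it and its new partner group.
        group = hash_value % (2 * base_modulus)
    return group


BASE_MODULUS = 1000   # groups at the start of the round
SPLITS_DONE = 500     # splits performed so far in this round

# Count how many of 200,000 evenly spread hash values land in each group.
counts = Counter(group_for(h, BASE_MODULUS, SPLITS_DONE)
                 for h in range(200_000))

total_groups = len(counts)
print(total_groups, counts[0], counts[700], counts[1200])
# prints: 1500 100 200 100
```

In this model the groups that have already been split, and their new partners, each hold half as many keys as the groups still waiting for the pointer to reach them. That uneven load halfway through a round is the sort of thing that would make access times drift up and down in a cycle.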
>> 
>> So I expect the average access times would drift up and down in a cycle.
>> The cycle time would get longer as the file gets bigger but the worst time
>> would be roughly the same each cycle.
>> 
>> Given the power of two introduced into the algorithm by the before/after the
>> split thing, I wonder if there is such a need to start off with a prime?
>> 
>> Regards, Keith
>> 
>> PS I'm getting a bit Tony^H^H^H^Hverbose nowadays.
>> 
>> _______________________________________________
>> U2-Users mailing list
>> U2-Users@listserver.u2ug.org
>> http://listserver.u2ug.org/mailman/listinfo/u2-users
