Chris,

For the type of use that you described earlier (BASIC selects and reads), 
reducing overflow will have a negligible performance benefit, especially compared 
to changing the GROUP.SIZE back to 1 (2048 bytes).  If you purge the file in 
relatively small percentages, it will never merge anyway (you would need to 
delete 20-30% of the file for that to happen with the merge load at 50%), so 
your optimum minimum modulus will probably be "however large it grows".  The 
overhead of a group split is not as bad as it sounds unless your updates/sec 
count is extremely high, such as during a copy.
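The 20-30% figure can be checked with simple arithmetic. The sketch below makes a few assumptions. The modulus stays fixed while records are deleted, so the actual load falls in proportion to the data removed. Merging begins only once the actual load drops below the merge load. The function name is hypothetical. The load values come from this thread.

```python
def fraction_to_delete_before_merge(actual_load: float, merge_load: float) -> float:
    """Fraction of current data that must be deleted before the actual load
    falls to the merge load. Assumes the modulus is unchanged while deleting,
    so the load drops in direct proportion to the data removed."""
    if actual_load <= merge_load:
        return 0.0
    return (actual_load - merge_load) / actual_load


# Loads mentioned in this thread: 63% actual, 70% and 80% split; merge at 50%.
for load in (0.63, 0.70, 0.80):
    pct = fraction_to_delete_before_merge(load, 0.50) * 100
    print(f"actual load {load:.0%}: delete about {pct:.1f}% before any merge")
```

Under these assumptions, a file sitting at 63% to 70% actual load needs roughly 21% to 29% of its data purged before it starts merging.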

If you do regular SELECTs and scans of the entire file, then your goal should be 
to reduce the total disk size of the file, and not worry much about common 
overflow. The important thing is that the file is dynamic, so you will never 
encounter the issues that undersized statically hashed files develop.

We have thousands of dynamically hashed files on our (Solaris) systems, with an 
extremely low problem rate.

Rick

-----Original Message-----
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Thursday, July 05, 2012 11:21 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


Rick,

You are correct; I should be using the smaller size (I just haven't changed it 
yet). Based on the reading I have done, you should only use the larger group 
size when the average record size is greater than 1000 bytes.
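Here is the rule of thumb applied to the totals from the ANALYZE.FILE output quoted further down. The 1000-byte threshold and the GROUP.SIZE mapping come from this thread: 1 means 2048 bytes, and the larger size is 4096 bytes. The helper name is hypothetical. Data plus IDs averages roughly 240 bytes per record, which is well under the threshold.

```python
RECORD_DATA_BYTES = 287_789_178   # "Total size of record data" (stats below)
RECORD_ID_BYTES = 21_539_538      # "Total size of record IDs"
RECORD_COUNT = 1_292_377          # "Number of records"


def suggested_group_size(avg_record_bytes: float, threshold: int = 1000) -> int:
    """Rule of thumb from the thread: use the larger 4096-byte group only when
    the average record exceeds ~1000 bytes, else GROUP.SIZE 1 (2048 bytes)."""
    return 4096 if avg_record_bytes > threshold else 2048


avg = (RECORD_DATA_BYTES + RECORD_ID_BYTES) / RECORD_COUNT
print(f"average record (data + ID): {avg:.1f} bytes")
print(f"suggested group size: {suggested_group_size(avg)} bytes")
```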

As far as being better off with the defaults, that's basically what I'm trying 
to test (as well as learning how linear hashing works). I was able to reduce my 
overflow from 30% to 12% while only slightly increasing my empty groups and 
increasing my file size by only 8%. In theory this should be better for 
reads/writes than what I had before.

To test the performance, I need to write a ton of records, then capture and 
compare the output using timestamps.
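The sketch below is a minimal, language-neutral timing harness for this kind of test. It assumes a hypothetical `write_record` callable that wraps whatever performs the real write. Here that callable is a plain dict, so the sketch runs on its own and says nothing about UniVerse performance. Running the same record set against each file configuration and comparing elapsed times may be easier than comparing raw timestamps.

```python
import time
from typing import Callable


def time_writes(write_record: Callable[[str, str], None], n: int) -> float:
    """Write n synthetic records through write_record; return elapsed seconds."""
    start = time.perf_counter()
    for i in range(n):
        write_record(f"KEY{i}", f"payload for record {i}")
    return time.perf_counter() - start


# Hypothetical stand-in for the real file write; a dict keeps the sketch runnable.
store: dict = {}
elapsed = time_writes(lambda key, value: store.__setitem__(key, value), 100_000)
print(f"{len(store)} records in {elapsed:.3f} s")
```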

Chris


> From: r...@lynden.com
> To: u2-users@listserver.u2ug.org
> Date: Thu, 5 Jul 2012 09:22:02 -0700
> Subject: Re: [U2] RESIZE - dynamic files
> 
> Chris,
> 
> I am still wondering what is prompting you to continue using the larger group 
> size.
> 
> I think that Martin and the UV documentation are correct in this case; you 
> would be as well or better off with the defaults.
> 
> -Rick
> 
> On Jul 5, 2012, at 9:13 AM, "Martin Phillips" <martinphill...@ladybridge.com> 
> wrote:
> > Hi,
> > 
> > The various suggestions about setting the minimum modulus to reduce 
> > overflow are all very well but effectively you are turning a
> > dynamic file into a static one, complete with all the continual maintenance 
> > work needed to keep the parameters in step with the
> > data.
> > 
> > In most cases, the only parameter that is worth tuning is the group size to 
> > try to pack things nicely. Even this is often fine left
> > alone, though getting it to match the underlying o/s page size is helpful.
> > 
> > I missed the start of this thread but, unless you have a performance 
> > problem or are seriously short of space, my recommendation
> > would be to leave the dynamic files to look after themselves.
> > 
> > A file without overflow is not necessarily the best solution. Winding the 
> > split load down to 70% means that at least 30% of the file
> > is dead space. The implication of this is that the file is larger and will 
> > take more disk reads to process sequentially from one end
> > to the other.
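The effect of a lower split load on sequential scans can be made concrete. The sketch assumes the file's overall load stays near the split load, so primary groups are filled to about that percentage. It uses the record data plus ID total from Chris's statistics below, 309,328,716 bytes, with 4096-byte groups. It counts primary groups only and ignores overflow buffers, so real files are larger. The helper name is hypothetical.

```python
DATA_PLUS_IDS = 287_789_178 + 21_539_538   # record data + record IDs (stats below)
GROUP_BYTES = 4096


def primary_groups_needed(data_bytes: int, group_bytes: int, load_pct: int) -> int:
    """Primary groups needed if each group is filled to load_pct percent.
    Integer ceiling division avoids floating-point rounding surprises."""
    return -(-data_bytes * 100 // (group_bytes * load_pct))


for split_pct in (80, 70):
    groups = primary_groups_needed(DATA_PLUS_IDS, GROUP_BYTES, split_pct)
    print(f"split load {split_pct}%: at least {100 - split_pct}% dead space, "
          f"{groups:,} groups ({groups * GROUP_BYTES:,} bytes) to scan")

ratio = (primary_groups_needed(DATA_PLUS_IDS, GROUP_BYTES, 70)
         / primary_groups_needed(DATA_PLUS_IDS, GROUP_BYTES, 80))
print(f"a 70% split load means scanning about {ratio:.0%} of the 80% file's groups")
```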
> > 
> > 
> > Martin Phillips
> > Ladybridge Systems Ltd
> > 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
> > +44 (0)1604-709200
> > 
> > 
> > 
> > -----Original Message-----
> > From: u2-users-boun...@listserver.u2ug.org 
> > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> > Sent: 05 July 2012 15:19
> > To: u2-users@listserver.u2ug.org
> > Subject: Re: [U2] RESIZE - dynamic files
> > 
> > 
> > I was able to drop from 30% overflow to 12% by making 2 changes:
> > 
> > 1) changed the split load from 80% to 70% (that alone reduced overflow by 10%)
> > 2) changed the MINIMUM.MODULUS to 118,681, calculated as
> >    [ (record data + IDs) * 1.1 * 1.42857 ] / 4096,
> >    where 1.42857 = 1 / 0.70 (the 70% split load)
> > 
> > My disk size only went up 8%.
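Chris's minimum-modulus formula is reproduced below using the record data and record ID totals from the statistics that follow. The 1.1 factor is his extra allowance. Dividing by the split load is the same as multiplying by 1.42857. With these post-change totals, the result comes to 118,674, within ten groups of the 118,681 he used. The function name and defaults are illustrative.

```python
import math

RECORD_DATA_BYTES = 287_789_178   # "Total size of record data"
RECORD_ID_BYTES = 21_539_538      # "Total size of record IDs"


def minimum_modulus(data_bytes: int, id_bytes: int, group_bytes: int = 4096,
                    split_load: float = 0.70, padding: float = 1.1) -> int:
    """Groups needed so (data + IDs) * padding fits at the split load.
    Dividing by split_load is the '* 1.42857' step for a 70% split load."""
    return math.ceil((data_bytes + id_bytes) * padding / split_load / group_bytes)


print(minimum_modulus(RECORD_DATA_BYTES, RECORD_ID_BYTES))
```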
> > 
> > My file looks like this now:
> > 
> > File name ..................   GENACCTRN_POSTED
> > Pathname ...................   GENACCTRN_POSTED
> > File type ..................   DYNAMIC
> > File style and revision ....   32BIT Revision 12
> > Hashing Algorithm ..........   GENERAL
> > No. of groups (modulus) ....   118681 current ( minimum 118681, 140 empty,
> >                                            14431 overflowed, 778 badly )
> > Number of records ..........   1292377
> > Large record size ..........   3267 bytes
> > Number of large records ....   180
> > Group size .................   4096 bytes
> > Load factors ...............   70% (split), 50% (merge) and 63% (actual)
> > Total size .................   546869248 bytes
> > Total size of record data ..   287789178 bytes
> > Total size of record IDs ...   21539538 bytes
> > Unused space ...............   237532340 bytes
> > Total space for records ....   546861056 bytes
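The figures above can be cross-checked against the percentages discussed in the thread. Overflowed groups divided by the modulus gives the 12% overflow Chris quotes. The sketch also treats the "actual" load as (data + IDs) / (modulus x group size). That formula is an assumption about how ANALYZE.FILE derives it, but it does reproduce the 63% shown. Unused space is about 43% of the file, above the 30% minimum Martin describes. Record data, IDs, and unused space also sum exactly to "Total space for records".

```python
stats = {
    "modulus": 118_681,
    "overflowed": 14_431,
    "group_size": 4096,
    "total_size": 546_869_248,
    "record_data": 287_789_178,
    "record_ids": 21_539_538,
    "unused": 237_532_340,
    "space_for_records": 546_861_056,
}

overflow_pct = 100 * stats["overflowed"] / stats["modulus"]
# Assumed definition of the "actual" load factor: bytes stored / primary group bytes.
primary_bytes = stats["modulus"] * stats["group_size"]
load_pct = 100 * (stats["record_data"] + stats["record_ids"]) / primary_bytes
unused_pct = 100 * stats["unused"] / stats["total_size"]
accounted = stats["record_data"] + stats["record_ids"] + stats["unused"]

print(f"overflowed groups: {overflow_pct:.1f}%")
print(f"actual load:       {load_pct:.1f}%")
print(f"unused space:      {unused_pct:.1f}% of total size")
print(f"data + IDs + unused = {accounted:,} bytes")
```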
> > 
> > Chris
> > 
> > 
> > 
> >> From: keith.john...@datacom.co.nz
> >> To: u2-users@listserver.u2ug.org
> >> Date: Wed, 4 Jul 2012 14:05:02 +1200
> >> Subject: Re: [U2] RESIZE - dynamic files
> >> 
> >> Doug may have had a key bounce in his input:
> >> 
> >>> Let's do the math:
> >>> 
> >>> 258687736 (Record Size)
> >>> 192283300 (Key Size)
> >>> ========
> >> 
> >> The key size is actually 19283300 in Chris' figures.
> >> 
> >> Regarding 68,063 being less than the current modulus of 82,850, I think 
> >> the answer may lie in the splitting process.
> >> 
> >> As I understand it, the first time a split occurs, group 1 is split and its 
> >> contents are divided between new group 1 and new group 2. All the other 
> >> groups effectively get 1 added to their number. The next split is group 3 
> >> (which was 2) into 3 and 4, and so forth. A pointer is kept to say where the 
> >> next split will take place and also to help the algorithm work out which 
> >> group matches a given key.
> >> 
> >> Based on this, if you started with 1000 groups, by the time you have split 
> >> the 500th time you will have 1500 groups. The first 1000 will be relatively 
> >> empty; the last 500 will probably be overflowed, but not terribly badly. By 
> >> the time you get to the 1000th split, you will have 2000 groups and they 
> >> will, one hopes, be quite reasonably spread with very little overflow.
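The uneven spread Keith describes can be simulated. The model is textbook linear hashing (Litwin-style), not UniVerse's actual implementation, and it differs from Keith's description in one respect. A split group's records are shared between that group and a new group appended at index group + base, rather than renumbering the groups that follow. The uneven loading comes out the same either way. Groups already split in the current cycle hold about half as many records as those not yet split. The parameters are hypothetical. The file starts with 1000 groups and splits whenever it averages more than 20 records per group. After 30,000 records it has made 500 splits.

```python
import random
from statistics import mean


class LinearHashFile:
    """Textbook linear hashing sketch (NOT UniVerse's code)."""

    def __init__(self, base_modulus: int):
        self.base = base_modulus      # modulus at the start of this split cycle
        self.next_split = 0           # pointer: next group to split
        self.groups = [[] for _ in range(base_modulus)]

    def group_for(self, h: int) -> int:
        g = h % self.base
        if g < self.next_split:       # group already split this cycle,
            g = h % (2 * self.base)   # so address with the doubled modulus
        return g

    def insert(self, h: int) -> None:
        self.groups[self.group_for(h)].append(h)

    def split_one(self) -> None:
        s = self.next_split
        moving, self.groups[s] = self.groups[s], []
        self.groups.append([])        # new group gets index s + base
        self.next_split += 1
        for h in moving:              # re-address: each record stays in s or moves to s + base
            self.insert(h)
        if self.next_split == self.base:   # cycle complete: modulus has doubled
            self.base *= 2
            self.next_split = 0


def simulate(base=1000, records=30_000, target_load=20, seed=1):
    f = LinearHashFile(base)
    rng = random.Random(seed)
    for n in range(1, records + 1):
        f.insert(rng.getrandbits(32))
        while n / len(f.groups) > target_load:
            f.split_one()
    return f


f = simulate()
split = ([len(f.groups[g]) for g in range(f.next_split)]
         + [len(f.groups[g]) for g in range(f.base, len(f.groups))])
unsplit = [len(f.groups[g]) for g in range(f.next_split, f.base)]
print(f"{len(f.groups)} groups, next split at group {f.next_split}")
print(f"split groups:   {mean(split):.1f} records on average")
print(f"unsplit groups: {mean(unsplit):.1f} records on average")
```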
> >> 
> >> So I expect the average access times would drift up and down in a cycle. 
> >> The cycle time would get longer as the file gets bigger, but the worst time 
> >> would be roughly the same each cycle.
> >> 
> >> Given the power of two introduced into the algorithm by the before/after 
> >> the split thing, I wonder if there is such a need to start off with 
> >> a prime?
> >> 
> >> Regards, Keith
> >> 
> >> PS I'm getting a bit Tony^H^H^H^Hverbose nowadays.
> >> 
> >> _______________________________________________
> >> U2-Users mailing list
> >> U2-Users@listserver.u2ug.org
> >> http://listserver.u2ug.org/mailman/listinfo/u2-users