Aha!  I see where you're going.  I'll give that a go!

Thanks to all for your input!

Perry 

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Anthony Youngman
Sent: Monday, December 03, 2007 8:21 AM
To: 'u2-users@listserver.u2ug.org'
Subject: RE: [U2][UV] Dynamic File MINIMUM.MODULUS Calculation

A second, far bigger adjustment (which will account for a lot of the
underestimate), is to divide by 1600, not 2000 or 2048.

A split factor of 80 means that your groups will, at maximum, be 80%
full. 80% of 2048 is 1640 (near enough).

That will give a second-cut (and rather more accurate) estimate 25%
larger than the original "divide by 2048" estimate. And 20-25%
(depending which way you're going) is quite a big error.
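To put numbers on that, here is a rough sketch in Python (not UniVerse code) of the second-cut estimate. The function name and defaults are my own for illustration; the 2048-byte group size, the 80% split load and the 22071102-byte total are the figures quoted further down this thread.

```python
import math

def modulus_estimate(total_bytes, group_size=2048, split_load=0.80):
    """Estimate the number of groups, assuming each group fills only to the split load."""
    usable_bytes = group_size * split_load   # about 1638 bytes for 2048 @ 80%
    return math.ceil(total_bytes / usable_bytes)

total = 22071102                         # ID + record bytes from the SUM query in this thread
first_cut = math.ceil(total / 2048)      # the original "divide by 2048" estimate
second_cut = modulus_estimate(total)     # the same data, allowing for the 80% split load

print(first_cut, second_cut, round(second_cut / first_cut, 2))
```

This only models the split load. It ignores header overhead and any special handling of large records, so treat the result as a lower bound on the modulus rather than a final answer.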

Cheers,
Wol

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Mats Carlid
Sent: 03 December 2007 10:08
To: u2-users@listserver.u2ug.org
Subject: Re: [U2][UV] Dynamic File MINIMUM.MODULUS Calculation

A first adjustment is to allow for the overhead used by page and record
headers and other administrative information,

e.g. divide by 2000 instead of 2048.

Each page has a page header and each record has a record header. There
will be an 'end of item' character at the end of each key and each
record (if not, there will be pointers and/or lengths instead), and
there will be some unused characters at the end of the page where no
record fits.
I don't know the sizes of these things, but I'd feel lucky if I had a
file where they totalled less than 48 bytes per page.
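As a rough illustration of this first adjustment, here is a small Python sketch (not UniVerse code). The 48-byte overhead is only the guess made above, not a documented figure, and the function and constant names are made up for the example.

```python
import math

GROUP_SIZE = 2048        # bytes per group, as reported by ANALYZE.FILE in this thread
ASSUMED_OVERHEAD = 48    # bytes per group lost to headers etc.; a guess, not a measured value

def groups_needed(total_bytes, group_size=GROUP_SIZE, overhead=ASSUMED_OVERHEAD):
    """Number of groups needed if each group loses `overhead` bytes to bookkeeping."""
    usable_bytes = group_size - overhead
    return math.ceil(total_bytes / usable_bytes)

query_total = 22071102   # ID + record bytes from the SUM query quoted below

print(groups_needed(query_total, overhead=0))   # plain "divide by 2048"
print(groups_needed(query_total))               # "divide by 2000"
```

The difference between the two results is only a few percent, so on its own this adjustment does not explain a shortfall as large as the one reported.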

-- mats





Perry Taylor wrote:
> That's approximately what I did when calculating how big I needed this
> file, although I used the average record size rather than the total
> bytes since the majority of the records are less than 75 bytes with a
> few large records.
>
> Let's assume for argument's sake that I took the sum of the record and
> ID sizes and divided by 2048.  The problem is I came up way short.  If I
> query that data from the existing table here's what I get...
>
>
>> SUM CLM.RUN.STRIP.PROVDATA EVAL "LEN(@ID)" AS "ID BYTES" FMT "10R" EVAL "LEN(@RECORD)" AS "REC BYTES" FMT "10R" EVAL "LEN(@ID) + LEN(@RECORD)" AS "BYTES" FMT "10R"
>
> CLM.RUN.STRIP.PROVDATA ID BYTES.. REC BYTES. BYTES.....
>
>                        ========== ========== ==========
> TOTALS                    5035139   17035963   22071102
>
> 719305 records summed.
>
>> DIVD 22071102 2048
> Decimal 10776 remainder 1854
>
> See, this is where I get into trouble.  10776 is still a far cry from
> 19618.  What is also confusing is why ANALYZE.FILE reports much larger
> totals for the record and ID bytes than I get from my query.  See the
> confusion?
>
> Thanks.
>
> Perry
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Dan Fitzgerald
> Sent: Thursday, November 29, 2007 12:19 PM
> To: u2-users@listserver.u2ug.org
> Subject: RE: [U2][UV] Dynamic File MINIMUM.MODULUS Calculation
>
> This is a very rough cut, but if you divide data+id bytes by 2048,
> you'll be in the ballpark.
>
> so...
>
> 46Mb + 5Mb = 51Mb/2048=24902
>
> I'd pad that a bit (you don't need tightly packed files), possibly as
> much as 50% & maybe go up to a MIN MOD of a prime near 36K. I'm
> assuming that disk space isn't a constraint as we're dealing with a
> few Mb.
>
> If you don't know what the data size will be, my admittedly subjective
> formula would be to divide the anticipated number of records by 10 & go
> with a number in that range, assuming no other issues (like the # of
> id's is going to be 40 Billion & it's a 32-bit file). In this case,
> you'd end up with an oversized file (MIN MOD around 72K), but you can
> always adjust that if it's a problem.
> I don't mind oversizing nearly as much as undersizing.
>
> > Subject: [U2][UV] Dynamic File MINIMUM.MODULUS Calculation
> > Date: Thu, 29 Nov 2007 10:13:53 -0500
> > From: [EMAIL PROTECTED]
> > To: u2-users@listserver.u2ug.org
> >
> > I continue to struggle with accurate calculation of MINIMUM.MODULUS
> > values for dynamic files. Static hashed files do not seem to be such a
> > challenge for me.
> >
> > Here is an example where, once again, I was woefully shy...
> >
> > -----------------------------------------------------------------------
> > File name .................. CLM.RUN.STRIP.PROVDATA
> > Pathname ................... CLM.RUN.STRIP.PROVDATA
> > File type .................. DYNAMIC
> > Hashing Algorithm .......... GENERAL
> > No. of groups (modulus) .... 19618 current ( minimum 7867, 0 empty,
> >                                             8869 overflowed, 4612 badly )
> > Number of records .......... 719305
> > Large record size .......... 75 bytes
> > Number of large records .... 6614
> > Group size ................. 2048 bytes
> > Load factors ............... 80% (split), 50% (merge) and 80% (actual)
> > Total size ................. 69978112 bytes
> > Total size of record data .. 46331730 bytes
> > Total size of record IDs ... 5114218 bytes
> > Unused space ............... 18528068 bytes
> > Total space for records .... 69974016 bytes
> >
> > File name .................. CLM.RUN.STRIP.PROVDATA
> > Number per group ( total of 19618 groups )
> >                                 Average    Minimum    Maximum     StdDev
> > Group buffers ..............       1.74          1         15       1.19
> > Records ....................      36.67         10         68      12.15
> > Large records ..............       0.34          1          4       0.58
> > Data bytes .................    2361.69        350      30359    2270.12
> > Record ID bytes ............     260.69         70        491      86.71
> > Unused bytes ...............     944.44         24       2068     650.69
> > Total bytes ................    3566.83       2048      30720       0.00
> >
> > Number per record ( total of 719305 records )
> >                                 Average    Minimum    Maximum     StdDev
> > Data bytes .................      64.41         34      28700     353.25
> > Record ID bytes ............       7.11          2         20       1.14
> > Total bytes ................      71.52         36      28720     354.13
> > -----------------------------------------------------------------------
> >
> > I cannot seem to figure out the relationship of the current modulus to
> > the record sizes/counts/group size factors. When I do the math I always
> > come up short.
> >
> > Can someone offer some suggestions for accurate calculation of
> > MINIMUM.MODULUS?
> >
> > Thanks.
> >
> > Perry Taylor
> > ZirMed, Inc.
> >
> > CONFIDENTIALITY NOTICE: This e-mail message, including any
> > attachments, is for the sole use of the intended recipient(s) and may
> > contain confidential and privileged information. Any unauthorized
> > review, use, disclosure or distribution is prohibited. ZirMed, Inc.
> > has strict policies regarding the content of e-mail communications,
> > specifically Protected Health Information, any communications
> > containing such material will be returned to the originating party
> > with such advisement noted. If you are not the intended recipient,
> > please contact the sender by reply e-mail and destroy all copies of
> > the original message.
> > -------
> > u2-users mailing list
> > u2-users@listserver.u2ug.org
> > To unsubscribe please visit http://listserver.u2ug.org/