We too do not use memresize to rebuild dynamic files. We have a process
that selects the original file, breaks the resulting list into parts,
and has phantoms simultaneously copy records into the new file, each
phantom working from the portion of the original list it is given. We
have found this to be almost as quick as the memresize verb, and we
don't get all the extra overxxx parts of the file that memresize
creates.
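
To illustrate the idea only, here is a small Python sketch of the
partitioning step; it is not UniBasic, the function names are made up,
and in practice the copying is done by phantoms working from a saved
select list.

  # Conceptual sketch only: read_record/write_record stand in for the
  # READ/WRITE a phantom would do against the old and new files.
  def split_list(record_ids, workers):
      """Break the selected IDs into roughly equal portions, one per phantom."""
      chunk = (len(record_ids) + workers - 1) // workers
      return [record_ids[i:i + chunk] for i in range(0, len(record_ids), chunk)]

  def copy_portion(portion, read_record, write_record):
      """Each worker copies only the records in the portion it was given."""
      for rec_id in portion:
          write_record(rec_id, read_record(rec_id))

  # Example: four "phantoms", each handed its own slice of the list.
  ids = ["%08d" % n for n in range(100)]        # pretend select-list output
  for portion in split_list(ids, 4):
      copy_portion(portion,
                   read_record=lambda k: "record body for " + k,
                   write_record=lambda k, rec: None)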

I am used to looking at the 'guide' results to figure out what block
size to use. At the bottom of the guide output you get the IDs of some
of the largest records and their sizes, and you also get a view of what
percentage of the file has records <= 512, <= 1024, <= 2048 bytes, and
so on. Your standard deviation is high, so I have a feeling that a
block size of 8 or 16 (8192 or 16384 bytes) might work better with your
file; there is some rough arithmetic on that after the listing below.
Here is a sample of the 'guide' output that I think is helpful in
deciding block size. This file has the same problem you are running
into: huge records next to tiny records. That is the downside of
multi-values accumulating over a period of years, and there is not much
you can do about it. I just recently had to make this file dynamic; the
'guide' output below was taken while the file was still static, and the
directory listing after it shows how the file looks after being made
dynamic.


  Basic statistics:
    File type............................... Static Hashing
    File size............................... 1909768192
    File modulo............................. 89399
    File hash type.......................... 0
    File block size......................... 16384
  Group count:
    Number of level 1 overflow groups....... 27163
    Primary groups in level 1 overflow...... 10555
  Record count:
    Total number of records................. 288052
    Average number of records per group..... 3.22
    Standard deviation from average......... 1.79
    Minimum number of records per group..... 0
    Maximum number of records per group..... 13
    Median number of records per group...... 6.50
  Record length:
    Average record length................... 2647.85
    Standard deviation from average......... 2795.27
    Minimum record length................... 99
    Maximum record length................... 267363
    Median record length.................... 133731.00
  Key length:
    Average key length...................... 14.08
    Standard deviation from average......... 0.29
    Minimum key length...................... 14
    Maximum key length...................... 16
    Median key length....................... 15.00
  Data size:
    Average data size....................... 2671.93
    Standard deviation from average......... 2768.48
    Total data size......................... 769654711
    Minimum data size....................... 123
    Maximum data size....................... 267387
    Median data size........................ 133755.00
    Data in 1 - 512 bytes range............. 73149      (25.39%)
    Data in 513 - 1024 bytes range.......... 96984      (33.67%)
    Data in 1025 - 2048 bytes range......... 36724      (12.75%)
    Data in 2049 - 3072 bytes range......... 30213      (10.49%)
    Data in 3073 - 4096 bytes range......... 16530      (5.74%)
    Data in 4097 - 5120 bytes range......... 9881       (3.43%)
    Data in 5121 - 6144 bytes range......... 5760       (2.00%)
    Data in 6145 - 7168 bytes range......... 3848       (1.34%)
    Data in 7169 - 8192 bytes range......... 1940       (0.67%)
    Data in 8193 - 9216 bytes range......... 1492       (0.52%)
    Data in 9217 - 10240 bytes range........ 1234       (0.43%)
    Data in 10241 - 11264 bytes range....... 818        (0.28%)
    Data in 11265 - 12288 bytes range....... 809        (0.28%)
    Data in 12289 - 13312 bytes range....... 567        (0.20%)
    Data in 13313 - 14336 bytes range....... 501        (0.17%)
    Data in 14337 - 15360 bytes range....... 412        (0.14%)
    Data in 15361 - 16384 bytes range....... 320        (0.11%)
    Data greater than 16384 bytes........... 6870       (2.38%)
  Largest data size:
    267387 bytes of data for this key....... "02325Z5049*1:3"  
    266943 bytes of data for this key....... "02262Z6138*1:3"  
    214466 bytes of data for this key....... "0315320746*4:4"  
  Smallest data size:
    123 bytes of data for this key.......... "08339Z6287*1:2"  
    131 bytes of data for this key.......... "0730910506A*1:3"  
    131 bytes of data for this key.......... "07341Z8969A*1:1"  
  Predicted optimal size:
    Records per block....................... 10
    Percentage of near term growth.......... 10
    Scalar applied to calculation........... 0.80
    Block size.............................. 16384
    Modulo.................................. 85597


total 5571664
-rw-rw-r--   1 root     mcc      1999994880 May 28 15:35 dat001
-rw-rw-r--   1 epops    mcc      288489472 May 28 14:40 dat002
-rw-rw-rw-   1 root     mcc      49135616 May 28 15:35 idx001
-rw-rw-r--   1 root     mcc      515047424 May 28 14:40 over001
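
To put rough numbers on the block-size suggestion, here is a quick
calculation using the averages from the sample 'guide' output above.
The 0.80 headroom factor mirrors the scalar that GUIDE reports, and the
9-byte per-record overhead comes from the sizing formula in the quoted
message below, so treat both as assumptions rather than exact UniData
internals.

  # Rough sizing arithmetic from the sample 'guide' stats above.
  avg_rec = 2647.85     # Average record length
  std_dev = 2795.27     # Standard deviation from average
  avg_key = 14.08       # Average key length
  per_record = avg_rec + std_dev + avg_key + 9   # assumed per-record overhead

  for block in (4096, 8192, 16384):
      fit = (block * 0.80) / per_record          # 0.80 = assumed headroom scalar
      print("%5d byte block -> about %.1f records of (average + 1 std dev) size"
            % (block, fit))

  # A 4096-byte block cannot hold even one record that is a standard
  # deviation above average, which is why a block size of 8 or 16 helps here.
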
-----Original Message-----
From: owner-u2-us...@listserver.u2ug.org
[mailto:owner-u2-us...@listserver.u2ug.org] On Behalf Of Andrew Nicholls
Sent: Thursday, May 28, 2009 5:17 PM
To: u2-users@listserver.u2ug.org
Subject: [U2] Resize of large dynamic file

Hi All

I am trying to resize a large dynamic UniData file for a customer but
am struggling to determine the best modulo/separation figures to use.
I was under the impression that I should try and minimise the number of
overflow files, but my latest attempt just seems to have made the file
worse. The records are historical, going back a number of years, and
vary widely in size.

Because we have had issues in the past with trying to resize large
files, our standard practice now is to create a new file (including
indexes) with the new modulo/separation and copy all the records into
it.

The original file had 26,521,431 records and was 62 GB in size, with a
file structure of 25/2/33 (dat/idx/overflow parts). The new file was
created with a modulo/separation of 1300021/4, and having copied
roughly half the data I now have a file with 15,442,816 records, 28 GB,
and 5/1/22 (dat/idx/overflow parts).

The formula I used to get the new figures was

Records per Block = (file block size - pointer array) / (Average record
length + Standard deviation from average + Average key length + 9)

modulo = Total number of records / records per block
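
For clarity, the same formula written out as a small Python calculation.
The size of the pointer array varies, so the 32-byte value below is only
a placeholder assumption, and the stats plugged in come from the
GUIDE.STATS.LIS further down, purely for illustration.

  # The sizing formula above, written out as code.
  POINTER_ARRAY = 32    # placeholder assumption; actual size is release dependent

  def records_per_block(block_size, avg_rec_len, std_dev, avg_key_len):
      return (block_size - POINTER_ARRAY) / \
             (avg_rec_len + std_dev + avg_key_len + 9)

  def modulo(total_records, rec_per_block):
      return int(total_records / rec_per_block)

  # Illustration only, using the new file's stats (4096-byte blocks); note
  # how the large standard deviation dominates the denominator.
  rpb = records_per_block(4096, 55.46, 1511.47, 13.88)
  print(round(rpb, 2), modulo(15442816, rpb))    # about 2.6 records per block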

I have paused the copy for now because I would like to know whether I
should start again or just continue. Any thoughts/assistance would be
appreciated. The GUIDE.STATS.LIS from the new file is below:

Regards
Andrew

  Basic statistics:
    File type............................... Dynamic Hashing
    File size
      [dat001].............................. 1073737728
      ...
      [dat005].............................. 1029955584
      [over001]............................. 1073737728
      ..
      [over022]............................. 861589504
    File modulo............................. 1300021
    File minimum modulo..................... 1300021
    File split factor....................... 60
    File merge factor....................... 40
    File hash type.......................... 0
    File block size......................... 4096
    Free blocks in overflow file(s)......... 13
  Group count:
    Number of level 1 overflow groups....... 5715317
    Primary groups in level 1 overflow...... 1298667
  Record count:
    Total number of records................. 15442816
    Average number of records per group..... 11.88
    Standard deviation from average......... 3.55
  Record length:
    Average record length................... 55.46
    Standard deviation from average......... 1511.47
  Key length:
    Average key length...................... 13.88
    Standard deviation from average......... 0.76
  Data size:
    Average data size....................... 79.34
    Standard deviation from average......... 1533.38
    Total data size......................... 1225203127
-------
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/