On Dec 17, 12:01 pm, Ivan Krasilnikov wrote:
> Hello,
>
> Recently I was using vim to edit large, multi-gigabyte text files and
> I noticed that writing a file back to disk takes significantly longer
> than reading it.
>
> Here's the time in seconds it took vim (v7.3.081) on my machine (24G RAM)
> to open a file with N 20-byte lines and execute the :wq command with
> default options and swap files disabled via the -n flag. Also shown is
> the number of entries inserted into the mf_hash table (explained below).
>
>         N             real     user    sys  items in mf_hash at peak
>   1000000 (19MB)      0.58     0.33   0.16  5967
>   2000000 (38MB)      1.02     0.57   0.19  11931
>   4000000 (76MB)      2.33     1.48   0.39  23860
>   8000000 (153MB)     5.48     3.31   1.23  47716
>  16000000 (305MB)    13.79     9.77   1.40  95429
>  32000000 (610MB)    66.10    59.55   2.64  190855
>  64000000 (1.2GB)   329.30   311.52   7.45  381707
> 128000000 (2.4GB)  1366.13  1224.87  17.96  763410
>
> Just reading a large file without writing it back is much faster:
>
>         N   real   user   sys
>   1000000   0.23   0.17  0.05
>   2000000   0.44   0.31  0.12
>   4000000   0.88   0.70  0.17
>   8000000   1.72   1.35  0.36
>  16000000   4.13   2.69  1.41
>  32000000   7.22   5.58  1.60
>  64000000  14.62  11.11  3.44
> 128000000  27.49  22.13  5.24
>
> The profiler shows that much of the time during a write is spent in
> mf_find_hash(), which does lookups in mf_hash, a fixed 64-bucket chained
> hash table, and which is called once for every line written. 64 buckets
> are inadequate for the number of items inserted into this hash table for
> large files.
>
> Attached is a patch which makes mf_hash a dynamically growing hashtable
> with a fixed maximum load factor. Here's the runtime of my write test
> case with this patch:
>
>              max load factor = 1   max load factor = 64
>         N    real   user    sys     real   user    sys
>   1000000    0.46   0.25   0.11     0.52   0.34   0.08
>   2000000    0.98   0.50   0.24     0.93   0.56   0.18
>   4000000    1.98   1.06   0.39     1.81   1.08   0.42
>   8000000    4.08   2.17   0.71     3.97   2.25   0.81
>  16000000    8.24   4.29   1.47     7.83   4.74   1.53
>  32000000   16.85   8.60   3.05    16.78   9.45   3.26
>  64000000   32.62  17.16   5.55    32.98  19.08   5.94
> 128000000   66.71  34.38  11.27    66.51  38.09  12.40
>
> Other load factor values up to ~256 also seem fine to me.
>
> --
> Ivan Krasilnikov
>
>  mf_hash.patch (17K)

Why not use the hashtable from hashtab.c for data storage?
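
For anyone following along: with 64 fixed buckets and 763410 items at peak
(the 128000000-line case above), an average chain holds about 11900 entries,
which would explain why each lookup gets so slow. Below is a rough sketch of
the general idea of a chained table that doubles its bucket array once a
maximum load factor is exceeded. It is not the attached patch and not the
hashtab.c API. All names (table_T, item_T, table_add, MAX_LOAD and so on) are
made up for illustration, and the key is assumed to be a plain integer such
as a block number.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is NOT Vim's mf_hash code. */
#define INIT_BUCKETS 64   /* starting size, same as the old fixed size */
#define MAX_LOAD     64   /* grow once items reach MAX_LOAD per bucket */

typedef struct item {
    struct item *next;    /* next item in the same bucket chain */
    long         key;     /* assumed integer key, e.g. a block number */
} item_T;

typedef struct {
    item_T **buckets;
    size_t   nbuckets;    /* always a power of two, so masking works */
    size_t   nitems;
} table_T;

static int table_init(table_T *t)
{
    t->buckets = calloc(INIT_BUCKETS, sizeof(item_T *));
    t->nbuckets = INIT_BUCKETS;
    t->nitems = 0;
    return t->buckets != NULL;
}

/* Walk one chain; cost is proportional to the chain length. */
static item_T *table_find(const table_T *t, long key)
{
    item_T *p = t->buckets[(size_t)key & (t->nbuckets - 1)];

    while (p != NULL && p->key != key)
        p = p->next;
    return p;
}

/* Double the bucket array and relink every item into its new chain. */
static int table_grow(table_T *t)
{
    size_t   newsize = t->nbuckets * 2;
    item_T **nb = calloc(newsize, sizeof(item_T *));

    if (nb == NULL)
        return 0;         /* keep the old array; lookups just stay slower */
    for (size_t i = 0; i < t->nbuckets; i++) {
        item_T *p = t->buckets[i];

        while (p != NULL) {
            item_T *next = p->next;
            size_t  idx = (size_t)p->key & (newsize - 1);

            p->next = nb[idx];
            nb[idx] = p;
            p = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->nbuckets = newsize;
    return 1;
}

/* Add an item owned by the caller; the key is assumed not to be present. */
static void table_add(table_T *t, item_T *it)
{
    size_t idx;

    assert(t->buckets != NULL);
    if (t->nitems >= t->nbuckets * MAX_LOAD)
        (void)table_grow(t);
    idx = (size_t)it->key & (t->nbuckets - 1);
    it->next = t->buckets[idx];
    t->buckets[idx] = it;
    t->nitems++;
}
```

Since each grow doubles the array, the total relinking work stays
proportional to the number of items inserted, and the average chain length
never exceeds MAX_LOAD as long as the allocations succeed.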

-- 
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
