On Dec 17, 12:01 pm, Ivan Krasilnikov <[email protected]> wrote:
> Hello,
>
> Recently I was using vim to edit large, multi-gigabyte text files and
> I noticed that writing a file back to disk takes significantly longer
> time than reading it.
>
> Here's the amount of time it took vim (v7.3.081) on my machine (24G RAM)
> to open a file with N 20-byte long lines and execute :wq command with
> default options and disabled swap files via -n flag. Also shown is
> the number of entries inserted into mf_hash table (explained below).
>
>          N             real     user    sys  items in mf_hash at peak
>    1000000 (19Mb)      0.58     0.33   0.16     5967
>    2000000 (38Mb)      1.02     0.57   0.19    11931
>    4000000 (76Mb)      2.33     1.48   0.39    23860
>    8000000 (153Mb)     5.48     3.31   1.23    47716
>   16000000 (305Mb)    13.79     9.77   1.40    95429
>   32000000 (610Mb)    66.10    59.55   2.64   190855
>   64000000 (1.2Gb)   329.30   311.52   7.45   381707
>  128000000 (2.4Gb)  1366.13  1224.87  17.96   763410
>
> Just reading a large file without writing it back is much faster:
>
>          N    real    user    sys
>    1000000    0.23    0.17   0.05
>    2000000    0.44    0.31   0.12
>    4000000    0.88    0.70   0.17
>    8000000    1.72    1.35   0.36
>   16000000    4.13    2.69   1.41
>   32000000    7.22    5.58   1.60
>   64000000   14.62   11.11   3.44
>  128000000   27.49   22.13   5.24
>
> Profiler shows that much of the time during write is spent in
> mf_find_hash(), which implements a fixed 64-bucket chained hash table
> mf_hash, and which is called once for every line written. This number
> of buckets is inadequate for the number of items inserted into this
> hash table for large files.
>
> Attached is a patch which makes mf_hash a dynamically growing hashtable
> with a fixed maximum load factor. Here's the runtime of my write test
> case with this patch:
>
>               max load factor = 1       max load factor = 64
>          N    real    user    sys       real    user    sys
>    1000000    0.46    0.25   0.11       0.52    0.34   0.08
>    2000000    0.98    0.50   0.24       0.93    0.56   0.18
>    4000000    1.98    1.06   0.39       1.81    1.08   0.42
>    8000000    4.08    2.17   0.71       3.97    2.25   0.81
>   16000000    8.24    4.29   1.47       7.83    4.74   1.53
>   32000000   16.85    8.60   3.05      16.78    9.45   3.26
>   64000000   32.62   17.16   5.55      32.98   19.08   5.94
>  128000000   66.71   34.38  11.27      66.51   38.09  12.40
>
> Other values of load factor up to ~ 256 seem also good to me.
>
> --
> Ivan Krasilnikov
>
> mf_hash.patch (17K)
Why not use hashtable in hashtab.c for data storage?
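For anyone following along, here is a rough sketch of the scheme the
quoted message describes: a chained hash table that doubles its bucket
count whenever the average chain length exceeds a maximum load factor.
It is not the code from mf_hash.patch. All names (table_T, item_T,
table_add and so on) are made up for illustration, and keying on a long
block number is an assumption. For scale, with the fixed 64 buckets the
763410 items from the largest test give about 763410 / 64 = 11928 items
per chain on average, and every lookup walks such a chain.

```c
#include <stdlib.h>

/* Hypothetical names throughout; this is NOT the code from mf_hash.patch. */
#define INIT_BUCKETS    64  /* start at the size of the old fixed table */
#define MAX_LOAD_FACTOR 64  /* assumed limit on the average chain length */

typedef struct item {
    struct item *next;      /* next item in the same bucket chain */
    long         key;       /* assumed key type, e.g. a block number */
} item_T;

typedef struct {
    item_T **buckets;
    size_t   nbuckets;      /* always a power of two, so masking works */
    size_t   count;         /* number of items currently stored */
} table_T;

/* Returns 1 on success, 0 if the bucket array could not be allocated. */
static int table_init(table_T *t)
{
    t->buckets = calloc(INIT_BUCKETS, sizeof(item_T *));
    t->nbuckets = INIT_BUCKETS;
    t->count = 0;
    return t->buckets != NULL;
}

/* Walk a single chain; its expected length is at most MAX_LOAD_FACTOR. */
static item_T *table_find(const table_T *t, long key)
{
    item_T *p = t->buckets[(size_t)key & (t->nbuckets - 1)];

    while (p != NULL && p->key != key)
        p = p->next;
    return p;
}

/* Double the bucket count and relink every item into its new chain. */
static int table_grow(table_T *t)
{
    size_t   n = t->nbuckets * 2;
    item_T **nb = calloc(n, sizeof(item_T *));

    if (nb == NULL)
        return 0;           /* keep the old table: still correct, only slower */
    for (size_t i = 0; i < t->nbuckets; i++) {
        item_T *p = t->buckets[i];

        while (p != NULL) {
            item_T *next = p->next;
            size_t  idx = (size_t)p->key & (n - 1);

            p->next = nb[idx];
            nb[idx] = p;
            p = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->nbuckets = n;
    return 1;
}

/* Insert at the head of the chain, then grow if the table is too full. */
static void table_add(table_T *t, item_T *it)
{
    size_t idx = (size_t)it->key & (t->nbuckets - 1);

    it->next = t->buckets[idx];
    t->buckets[idx] = it;
    t->count++;
    if (t->count > t->nbuckets * MAX_LOAD_FACTOR)
        (void)table_grow(t);
}
```

Because the table doubles each time it grows, the total relinking work
stays proportional to the number of items inserted, while each lookup
stays bounded by the load factor instead of growing with the file size.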
