Thank you, Keith, for your elaborate explanation of the inner workings of 
the cache and the options involved. It has made me realize once again not to 
draw conclusions too quickly when one is not familiar in depth with the 
background of a problem.

Regards,
Rob


-----Original Message-----
From: sqlite-users-bounces at mailinglists.sqlite.org 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Keith Medcalf
Sent: Thursday, 26 March 2015 2:29
To: General Discussion of SQLite Database
Subject: Re: [sqlite] FW: Very poor SQLite performance when using Win8.1 + 
Intel RAID1

On Wednesday, 25 March, 2015 07:47, Rob van der Stel <RvanderStel at 
benelux.tokheim.com> said:

>The naming in WinXP and Win81 is such that a mistake is easily made, the
>corresponding options are namely:
>
>a1)     WinXP:  'Optimize for performance' -- this corresponds to
>a2)     Win81:  'Enable write cache on the device'
>
>b1)     WinXP:  'Enable write cache on the disk' -- this corresponds to
>b2)     Win81:  'Turn off Windows write-cache buffer flushing on the
>device'
>
>I wrongfully thought that b1 and a2 represented the same option. In my
>System 1 WinXP both options were 'checked' (a1 & b1).
>In my System 2 Win81 only option 'a2' was checked, assuming that it
>resembled 'b1' and that option 'b2' was a new feature in Win81.

Actually, a1 does not correspond with a2.  In Windows XP, A1 (Optimize for 
performance) enables the Windows Volume Write Cache in system RAM.  This is 
because Windows XP is seeing your SSD as a "Flash" drive (ie, as if it were 
nothing more than a removable flash drive of the same type as you would plug 
into a USB port and can yank out at any old time).  By default, such devices 
have all write caching disabled in the OS so that the luser can "yank" the 
device out of the computer at any time without destroying the filesystem.  If 
you are not the sort of person who engages in this behaviour and uses "safe 
remove" for such devices then this option will enable the OS Cache and enable 
putting not-so-primitive filesystems (such as NTFS instead of FAT) on the 
device.  Advanced filesystem performance is absolutely dreadful without OS 
Write Caching, Scatter Gather, and Write Coalescing because the amount of I/O 
without a cache is so huge.

A1:  'Optimize for performance' means to turn a yankable device into a 
user-won't-yank-without-notice device and enables the OS cache.
A2 and B1 are the same:  they set the device-level cache on the device to 
write-back (checked) or write-through (unchecked).
B2 is effectively always checked in XP (you cannot enable forced device 
buffer flushing).

More recent versions of Windows recognize that an SSD is not a user-yankable 
device and therefore do not disable the OS (System) cache to protect the 
luser from hurting themselves.  They see the SSD as a traditional fixed disk 
device (which it really is).  If XP saw the SSD as a fixed disk rather than 
luser-yankable, it would have enabled the OS Cache by default and the option 
you would have seen would have been whether or not to enable the device level 
write cache (ie, write-back rather than write-through).  Thus ended the options 
available under XP.  With OS Caching enabled on XP/Vista/2003 (or earlier) 
there was no way for an application to guarantee that a write *EVER* made it to 
permanent storage.  In fact, there is a special "oooopsie" message for this 
"Cache Data Lost before it could be written to disk" that is quite common on 
any Windows client/server OS system (XP/Vista/2003 and before) that has even a 
moderate I/O load.
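As a concrete illustration (my own sketch, not something from this thread), SQLite exposes how hard it asks the OS to flush via PRAGMA synchronous; with FULL it calls fsync() -- FlushFileBuffers() on Windows -- before reporting a commit durable, though as described above an XP-era OS cache could still defeat it.  The database path below is a throwaway invented for the example:

```python
import os
import sqlite3
import tempfile

# Throwaway database purely for illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)

# PRAGMA synchronous controls how hard SQLite asks the OS to flush:
# OFF (0) never flushes, NORMAL (1) flushes at critical moments, and
# FULL (2) flushes before a commit is reported durable.  None of this
# helps if the OS or device cache acknowledges writes it has not
# actually persisted.
con.execute("PRAGMA synchronous=FULL")
con.execute("CREATE TABLE t(x)")
con.execute("INSERT INTO t VALUES (1)")
con.commit()

print(con.execute("PRAGMA synchronous").fetchone()[0])  # 2 == FULL
con.close()
```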

On later versions of Windows (in this case Win 8.1, though it applies to 
anything based on the 2008 or later kernel base, which is basically anything 
later than 2003 Server/Windows Vista) the cache management is improved.  
Primarily this was done to support the "transactional filesystem".  This added 
the "Turn off Windows Write-Cache buffer flushing" option to Fixed Disks 
(actually, the default was always there and it was always checked -- you just 
now have the option), which is only effective if "Enable Write Cache" (ie, 
write-back cache at the device level) is turned on.

Now we get into the two different kinds of cache that can be available -- block 
cache, and filesystem cache.  A block cache is how one increases I/O 
performance to a block device and is based on common physics and the 
construction of spinning disk media:  You can read an entire track of data into 
a buffer in the same time as you can read a sector span of less than a full 
track, and that you can write an entire track from a buffer to the media in 
less time than you can write a sector span of less than a full track (you can 
prove this mathematically based on rotational speed, settling time, and the 
average third-of-a-rotation delay).  Therefore, all I/O to the physical 
media should always be done as full-track reads or writes in a single stroke 
from wherever the head lands after settling.  (The same theory applies to 
access to RAM and MLC Flash -- it is faster to read or write an entire "line" 
than it is to read/write a single byte -- though in the case of modern RAM/MLC 
Flash, it is practically impossible to access a subset of a "line" without a 
block cache).  In the case of RAM and MLC Flash, the controller logic and 
L1/L2/L3 cache enforce this.
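To put rough numbers on the full-track argument, here is a back-of-envelope sketch; the RPM and sectors-per-track figures are invented, typical-looking assumptions, not measurements:

```python
# Back-of-envelope timing for a spinning disk -- all figures here are
# illustrative assumptions, not measurements of any real drive.
RPM = 7200
SECTORS_PER_TRACK = 500

rotation_ms = 60_000 / RPM          # one full rotation: ~8.33 ms
avg_latency_ms = rotation_ms / 3    # the "average third of a rotation" delay

# Reading a whole track: pay the average latency once, then one rotation.
full_track_ms = avg_latency_ms + rotation_ms

# Reading 10 scattered single sectors: each pays the latency again plus a
# sliver of transfer time, so the total quickly dwarfs one full-track read.
sector_ms = rotation_ms / SECTORS_PER_TRACK
ten_scattered_ms = 10 * (avg_latency_ms + sector_ms)

print(f"full track: {full_track_ms:.2f} ms, "
      f"10 scattered sectors: {ten_scattered_ms:.2f} ms")
```

The point is not the exact numbers but the ratio: scattered sector I/O pays the rotational latency over and over, while a track-at-a-time cache pays it once.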

So, the cache located "on the drive" in modern hard disks (and other similar 
devices) is a block cache.  It works in entire "blocks" up and down between the 
actual storage media and the cache on the device.  So if the OS sends a command 
to "read sector 485938" the drive might actually read sectors 485900 through 
485999 into the block cache, then return just the wee bit requested to the OS.  
Similarly, if the OS asks to "write sector 485938" and the I/O block spanning 
sectors 485900 through 485999 is not in the device cache, then the device must 
read the larger "line" block first, edit in the data from the OS, and then 
re-write the entire block back to the media.  This is why bitty-box Operating 
Systems have such pitiful I/O characteristics.
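A toy model of that read-modify-write cycle, with an invented 100-sector line size (the class and counters are mine, purely for illustration):

```python
# Toy block cache: the device reads/writes in 100-sector "lines", so a
# single-sector write from the OS can trigger a whole-line media read first.
LINE = 100  # sectors per cache line -- an invented figure


class ToyBlockCache:
    def __init__(self, media):
        self.media = media   # dict: line number -> list of sector contents
        self.cache = {}      # lines currently held in device RAM
        self.reads = 0       # count of expensive whole-line media reads

    def _load(self, line):
        if line not in self.cache:
            # Device must fetch the entire line before it can edit one sector.
            self.cache[line] = list(self.media.get(line, [None] * LINE))
            self.reads += 1
        return self.cache[line]

    def write_sector(self, sector, value):
        line, offset = divmod(sector, LINE)
        self._load(line)[offset] = value  # edit in the OS's data
        # A real device would later rewrite the whole line to the media.


cache = ToyBlockCache(media={})
cache.write_sector(485938, b"x")  # first write pulls in the whole line
cache.write_sector(485939, b"y")  # same line: no second media read
print(cache.reads)                # 1
```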

Bitty-box OS's generally instead have what is called a FileSystem cache rather 
than a block cache in the hopes this will compensate for the inability to 
comprehend physics (and the difficulty of implementing efficient I/O in 
non-coprocessed resource scarce environments).  This means that the cache 
contains chained file data and not the I/O blocks, and why all I/O from Windows 
(and other bitty-box OS's) is devolved into scatter/gather of sectors (or 
clusters) rather than whole blocks, and why I/O performance generally sucks.

Because of these issues it became necessary to have the ability in bitty-box 
filesystem caches to "flush" a file (since they were inherently very 
inefficient) -- a concept which does not exist and is not needed for High 
Performance I/O systems which operate with "I/O Blocks".  This forces the OS to 
write-down the data from the filesystem cache to the device, but the device 
must still read/write/edit by line, so waiting for this process to occur is 
slow.  When the "Turn off Windows write-cache buffer flushing" option is 
unchecked (ie, flushing is left on), the OS requires that writes from the cache 
to the device are "written to the media" before being acknowledged by the 
device (and thus marked clean in the cache), preventing the aforementioned 
"ooopsie" errors.  When the option is checked, it means that there may be data 
sitting in the RAM buffers on the device that has not yet been written to the 
actual persistent storage, but will be as fast as the drive can manage (and 
will be marked as clean in the filesystem cache).  Thus there is a small window 
where a power failure may cause loss of data.  Devices intended for a higher 
level of performance will have the write-queue either independently 
battery-backed (if in static RAM, for example) or will have NVRAM and 
capacitors sufficient to "flush" the RAM write-queue into non-volatile storage 
on a power failure.
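The application-visible half of that flush looks like this in a minimal Python sketch (os.fsync() wraps FlushFileBuffers() on Windows and fsync() on POSIX; the temp file is just for illustration):

```python
import os
import tempfile

# Sketch of the "write-down" an application can force: push data through
# the filesystem cache, then ask the OS to hand its dirty buffers to the
# device.  Whether the *device* then really persists the data is exactly
# the write-back vs write-through question discussed above.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"commit record")
    os.fsync(fd)  # returns only once the OS has handed the data down
finally:
    os.close(fd)
    os.unlink(path)
```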

There are companies which make proper "block level" caches that run on Windows 
and sit between the OS filesystem cache and the device, whose sole purpose is 
to collect up the bitty writes from the OS and send them to the actual storage 
device efficiently.  Of course, because these use system RAM they can also 
cause data loss; however, they can provide huge I/O performance increases even 
when not operating in "write-back" mode.  Examples range from products like 
SuperCache to the way the I/O subsystem on hypervisors such as ESX does 
large I/O block operations to the media while dealing with bitty writes from 
the VMs (hypervisors can emulate high-performance co-processed I/O systems 
because they are able to dedicate storage and CPU to proper I/O management 
independently of the workload/VMs).

So, the answer to your question really is that it depends on your device.  If 
you check the "Turn off Windows write-cache buffer flushing on the device", and 
the device does not have the capability to preserve the pending chain of 
write-edits in Non-volatile storage if the power fails, then in the event of a 
power failure you will lose the data that is in the edit-chain waiting to be 
written.  On the other hand, for SSDs it would be possible to provide 
sufficient power to enable the cache to be written out simply by using on-board 
capacitors since the time to flush the write-chain properly is very small 
compared to a spinning disk.

---
Theory is when you know everything but nothing works.  Practice is when 
everything works but no one knows why.  Sometimes theory and practice are 
combined:  nothing works and no one knows why.



_______________________________________________
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
