>>The next factor is the internal write multiplication factor. Lets
>>say you have a device which is divided into 2 MB blocks. And you update 1
>>sector (512 bytes) somewhere in this block. The device must (a) read out
>>the entire 2MB block (b) update the data within the block then (c)
>>re-write a new 2 MB block to replace the old.

>>That I don't get. Are you sure about that? My understanding from what
>>I've been reading was that the technology behind SSDs would force you to
>>*erase* 2 MB blocks but also allow you to *write* e.g. 4KB pages.

This is true, more or less.

As you probably know, NV storage (no matter what type, be it flash,
NVRAM, PROM, EPROM, EEPROM, whatever) all basically works the same way.
You can "erase" it, which leaves all the bits in the same "off" state.
Let's call that state 0 (though the logic states may be reversed, so
that "off" or clear is 1 and "on" or programmed is 0). You can then
"program" it by switching some of the bits from "clear" to "programmed"
(0 -> 1). You cannot, however, ever return a "programmed" (1) bit to
the "cleared" (0) state, except by erasing the whole block.

Depending on the particular device, the "erase" size may be the same as
the "program" size, or it may be bigger, up to the entire device. UV
erasable PROM is an example of the latter: you can only "erase" the
device as a whole. There are others.

So yes, there are in fact TWO block sizes: the ERASE block size and the
PROGRAM block size. The ERASE size is often bigger than the PROGRAM
size. A PROGRAM operation programs an entire PROGRAM block, and an
ERASE operation erases an entire ERASE block, which encompasses
multiple PROGRAM blocks.

For most SSD/flash storage devices the PROGRAM block is the minimum I/O
block size, usually 4 KB or so. The ERASE size is usually much bigger,
at say 2 MB.

For simple management processing, the device has a fixed total number
of "program blocks", each addressed by its "physical block number".
Each physical block resides within an "erase block", which is usually
larger than the "program block", and the "erase block" number can be
derived from the "physical block number" (usually by a simple binary
shift).

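To make the erase/program asymmetry and the "simple binary shift"
concrete, here is a minimal Python sketch. It uses the same convention
as above (erased = 0, programmed = 1) and the illustrative sizes from
this thread (4 KB program blocks, 2 MB erase blocks), not the geometry
of any particular device. The names erase_block_of and program are made
up for the example.

```python
PROGRAM_BLOCK_SIZE = 4 * 1024            # assumed program block size (4 KB)
ERASE_BLOCK_SIZE = 2 * 1024 * 1024       # assumed erase block size (2 MB)
BLOCKS_PER_ERASE_BLOCK = ERASE_BLOCK_SIZE // PROGRAM_BLOCK_SIZE
SHIFT = BLOCKS_PER_ERASE_BLOCK.bit_length() - 1   # log2 of a power of two


def erase_block_of(physical_block_number):
    """Derive the erase block number with a simple binary shift."""
    return physical_block_number >> SHIFT


def program(cell, value):
    """Program a cell: bits may go 0 -> 1, but never 1 -> 0."""
    if cell & ~value:
        # Some bit is already programmed in `cell` but clear in `value`.
        raise ValueError("cannot clear a programmed bit without an erase")
    return cell | value


cell = 0b0000                      # freshly erased
cell = program(cell, 0b0101)       # sets two bits
cell = program(cell, 0b0111)       # allowed: only sets one more bit
try:
    program(cell, 0b0011)          # would need bit 2 to go 1 -> 0
    rejected = False
except ValueError:
    rejected = True
```

With these assumed sizes there are 512 program blocks per erase block,
so physical blocks 0..511 fall in erase block 0 and 512..1023 in erase
block 1.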
For storage management at the most basic level, the hardware storage
controller maintains a mapping between "Logical Block Number" and
"Physical Block Number", and a "Physical Block Allocation Table" (BAT)
containing information about the usage of physical program blocks.
There will also be a list of free physical blocks, and a table of
statistics about block erase/program operations.

At least the mapping (logical->physical) and the BAT must be
persistent. The statistics are usually persistent as well. The lists
and any other tables are only needed during operation and are usually
rebuilt entirely in device RAM when the device is powered on, their
contents being derived from the persistent data.

All access is by "Logical Block Number" (which may reside at any
"physical block number"). There are basically three operations that
take place on logical blocks: Read, Write, and Delete.

Read simply translates logical->physical, reads the physical block, and
returns it to the requestor.

Write marks the physical block that currently holds the logical block
as "deleted", finds a free physical block and writes the data to it,
then updates the logical->physical mapping table to map the logical
block to its new physical location.

Delete is the same as Write except that no physical data block is
written: the logical block is marked as "unallocated" in the
logical->physical mapping table, and the physical data block is marked
as "deleted" in the BAT.

This process depends on there being a "pool" of "ready to program"
physical blocks. The pool is managed by a separate process running at
the hardware level. If the free pool is depleted, the equivalent of an
interrupt must be raised to get the pool manager to put some blocks on
the free list, and the write has to wait until there is a block in the
free pool that can be used.

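The Read/Write/Delete bookkeeping described above can be sketched in a
few lines of Python. This is only a toy model under stated assumptions:
the class name TinyFTL, its fields, and the three state labels are
hypothetical, there is no persistence and no erase handling, and a real
controller would stall a write on an empty pool rather than raise an
error.

```python
from collections import deque

FREE, USED, DELETED = "free", "used", "deleted"


class TinyFTL:
    """Toy logical->physical mapper (hypothetical, for illustration)."""

    def __init__(self, physical_blocks):
        self.l2p = {}                                   # logical -> physical
        self.bat = [FREE] * physical_blocks             # block allocation table
        self.free_list = deque(range(physical_blocks))  # ready-to-program pool
        self.media = {}                                 # physical -> data

    def read(self, lbn):
        """Translate logical->physical and return the stored data."""
        pbn = self.l2p.get(lbn)
        return None if pbn is None else self.media[pbn]

    def write(self, lbn, data):
        """Retire the old copy, program a free block, update the map."""
        if not self.free_list:
            # A real device would wait for the pool manager here.
            raise RuntimeError("free pool depleted")
        self._retire(lbn)
        pbn = self.free_list.popleft()
        self.media[pbn] = data
        self.bat[pbn] = USED
        self.l2p[lbn] = pbn

    def delete(self, lbn):
        """Same as write, minus the write: unmap and mark as deleted."""
        self._retire(lbn)

    def _retire(self, lbn):
        old = self.l2p.pop(lbn, None)
        if old is not None:
            self.bat[old] = DELETED   # stale data stays until an erase


ftl = TinyFTL(physical_blocks=4)
ftl.write(7, b"v1")
ftl.write(7, b"v2")   # lands in a new physical block; the old one is "deleted"
```

After the two writes, logical block 7 maps to physical block 1, and
physical block 0 is marked "deleted" but has not been erased.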
Sometimes a BAT update will generate an interrupt to the block
management process (for example, when all the physical blocks in an
erase block are now "deleted", so the entire erase block can be erased
and all the physical blocks it contains put on the free list).

The high-level TRIM operation is really nothing more than "delete"
against a logical block.

>In other words, I was expecting the SSD controller and/or the
>filesystem to be smart enough to cleverly allocate and move pages around
>within the available blocks.
>So if a 2MB-block is made of 512 4KB-pages, just overwriting the same
>4KB page 512 times will only cause one block erasure (or something in
>that order of magnitude), not 512. If that is correct, my conclusion
>would be that you should always write in multiples of the page size
>(e.g. 4KB), assuming you somehow get to know that value.
>Perhaps you're actually saying the same thing in the following
>paragraph?

More or less. Basically, the I/O size presented by the OS driver is
usually equal to the program block size (but does not have to be). If
it is not, then "data editing" is carried out in RAM on the device, the
same as for spinning disks (retrieve the block, edit the data, write
the new block, all done at the hardware level).

The efficiency of writing to SSDs and minimizing "erase" operations
depends on having a pool of blocks available on the free list. There is
never any need to "erase" unless this pool is depleted. However, the
background manager erases anyway in order to manage the layout of
blocks, coalesce free blocks, and try to optimize the ordering of
logical blocks. If you can optimize the layout of logical blocks within
physical blocks, you can optimize access speed, especially since it
takes time to "open" a "line" (whose size is again usually somewhere
between a program block and an erase block) for access.

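As a rough check on the "512 overwrites" question quoted above, here is
a toy Python count of erase operations. It assumes a single logical
block being overwritten, strictly sequential allocation of fresh
program blocks, and an erase as soon as every program block in an erase
block is "deleted". Real controllers schedule this differently, and
both function names are made up for the example.

```python
BLOCKS_PER_ERASE_BLOCK = 512   # 2 MB / 4 KB, the figures used in this thread


def erases_with_remapping(overwrites, erase_blocks=4):
    """Count erases when every overwrite lands on a fresh program block."""
    written = [0] * erase_blocks   # program blocks used per erase block
    deleted = [0] * erase_blocks   # program blocks marked "deleted"
    current = 0                    # erase block currently being filled
    live_in = None                 # erase block holding the live copy
    erases = 0
    for _ in range(overwrites):
        if written[current] == BLOCKS_PER_ERASE_BLOCK:
            current = (current + 1) % erase_blocks
            assert written[current] == 0, "free pool depleted"
        written[current] += 1                  # program a fresh block
        if live_in is not None:
            deleted[live_in] += 1              # old copy becomes "deleted"
            if deleted[live_in] == BLOCKS_PER_ERASE_BLOCK:
                written[live_in] = deleted[live_in] = 0
                erases += 1                    # whole erase block reclaimed
        live_in = current
    return erases


def erases_with_read_modify_write(overwrites):
    """Naive device: every update rewrites (and erases) a whole erase block."""
    return overwrites


remapped = erases_with_remapping(512 * 10)
naive = erases_with_read_modify_write(512 * 10)
```

In this model 5120 overwrites of the same logical block cost 9 erases
with remapping against 5120 for the naive read-modify-write scheme, so
roughly one erase per 512 overwrites, which matches the order of
magnitude suggested in the question.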
So yes, updating or deleting a logical block will eventually result in
an erase operation, but the urgency (now, in a minute, next week, etc.)
depends on the size of the free list.

So really the secret is to have lots of free blocks available at all
times and to keep the thermal limits in mind. Once you get close to the
edge (either by running short of available blocks or by pushing the
thermal envelope), performance will suffer and the device will degrade
faster.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven
says a lot about anticipated traffic volume.