[StorageSystem] SSD

Write Amplification in SSD

We need a map from LBA to PBA, that is, from logical block address to physical block address.

In DRAM, an overwrite or update is relatively easy: find the address and replace the old value with the new value directly. An SSD works quite differently: it writes the new data to a fresh page at the end of the used storage space and leaves the old page where it is, now dirty (stale). The LBA-to-PBA map is then updated to point to the new location. (The update is not in place physically, only in place logically.)
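To make the out-of-place update concrete, here is a minimal sketch in Python. The class `TinyFTL`, its fields, and the page-granular map are all assumptions made for illustration; a real drive's firmware is far more involved.

```python
class TinyFTL:
    """Toy translation layer: LBA -> PBA map with append-only writes (illustrative only)."""

    def __init__(self, num_pages):
        self.flash = [None] * num_pages      # physical pages
        self.state = ["free"] * num_pages    # "free", "valid", or "dirty"
        self.l2p = {}                        # logical address -> physical address
        self.next_free = 0                   # write pointer: end of the used space

    def write(self, lba, data):
        if self.next_free >= len(self.flash):
            raise RuntimeError("no free pages left; cleaning would be needed")
        old = self.l2p.get(lba)
        if old is not None:
            self.state[old] = "dirty"        # old copy stays on flash, but is stale
        pba = self.next_free
        self.flash[pba] = data
        self.state[pba] = "valid"
        self.l2p[lba] = pba                  # in-place logically, not physically
        self.next_free += 1
        return pba

    def read(self, lba):
        return self.flash[self.l2p[lba]]


ftl = TinyFTL(num_pages=8)
ftl.write(100, "v1")
ftl.write(100, "v2")                         # overwrite of the same LBA
print(ftl.read(100), ftl.l2p, ftl.state[:2])
```

After the second write, LBA 100 points at physical page 1, while physical page 0 still holds the old value and is marked dirty.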

As writes accumulate, the space eventually fills up, much of it with dirty pages. At that point we need to clean. Assume a block has 8 pages, of which 5 are dirty and 3 are still valid. We copy the 3 valid pages into free pages elsewhere; the block then holds no live data and can be erased. These extra copies are writes the host never asked for, which is where write amplification comes from.
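The 8-page example can be worked through in a few lines of Python. This is a rough sketch: `clean_block` and the list-of-states block model are made up for illustration, and the write amplification figure assumes a simplified steady state in which every cleaned block looks like this one.

```python
PAGES_PER_BLOCK = 8

def clean_block(victim, spare):
    """Copy valid pages from victim into spare, then erase victim.

    Both blocks are lists of page states: "valid", "dirty", or "free".
    Returns the number of pages that had to be copied.
    """
    copied = 0
    for state in victim:
        if state == "valid":
            slot = spare.index("free")       # ValueError if spare has no free page
            spare[slot] = "valid"
            copied += 1
    victim[:] = ["free"] * len(victim)       # erase works on the whole block
    return copied


victim = ["dirty"] * 5 + ["valid"] * 3
spare = ["free"] * PAGES_PER_BLOCK

copied = clean_block(victim, spare)
reclaimed = PAGES_PER_BLOCK - copied         # net pages gained for new host writes
write_amp = (reclaimed + copied) / reclaimed # flash writes per host write
print(copied, reclaimed, write_amp)
```

Cleaning frees 8 pages but uses up 3 of the spare ones, so only 5 pages are gained for new data. Under the assumption above, every 5 pages written by the host cost 8 page writes on flash, a write amplification of 1.6.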

Again, there is no in-place overwriting: each block needs to be erased before it can be written again. Erasing is quite time-consuming, so it is important to choose which blocks to clean so as to minimize page traffic (the number of valid pages that have to be moved). Throughput drops considerably once the SSD begins to clean and erase, which leaves quite a large performance gap.
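One simple way to minimize page traffic is a greedy policy: clean the block with the fewest valid pages. The sketch below assumes that policy and a hypothetical helper `pick_victim`; the text above does not say which policy a given drive uses.

```python
def pick_victim(blocks):
    """Return the index of the fully written block with the fewest valid pages.

    Each block is a list of page states: "valid", "dirty", or "free".
    Blocks that still have free pages are skipped (assumed to be in use for writing).
    Raises ValueError if no block qualifies.
    """
    candidates = [i for i, block in enumerate(blocks) if "free" not in block]
    return min(candidates, key=lambda i: blocks[i].count("valid"))


blocks = [
    ["valid"] * 6 + ["dirty"] * 2,   # cleaning would move 6 pages
    ["valid"] * 3 + ["dirty"] * 5,   # cleaning would move 3 pages
    ["valid"] * 2 + ["free"] * 6,    # still being filled, not a candidate
]
print(pick_victim(blocks))
```

Here block 1 is chosen, because cleaning it moves only 3 pages and reclaims 5.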

Some optimizations are possible here. With over-provisioning, the drive exposes less storage space than it physically has. For example, the drive has 120M pages but tells the OS that there are only 100M. The 20M over-provisioned pages are used for cleaning. You can regard them as a buffer reserved for cleaning and erasing.
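The numbers in this example can be checked quickly. Expressing the spare space relative to the exposed capacity is one common convention, and is an assumption here rather than something the example fixes.

```python
physical_pages = 120_000_000   # what the drive really has
exposed_pages = 100_000_000    # what the OS is told

spare_pages = physical_pages - exposed_pages
op_ratio = spare_pages / exposed_pages              # spare space relative to exposed space
max_valid_fraction = exposed_pages / physical_pages # upper bound on live data on flash

print(f"{spare_pages:,} spare pages, {op_ratio:.0%} over-provisioning")
print(f"at most {max_valid_fraction:.1%} of physical pages can hold valid data")
```

Since the OS can never have more than 100M valid pages, at least one sixth of the physical pages are always free or dirty, so the cleaner always has some room to work with.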

In addition, when the workload is bursty, it is useful to run garbage collection in the background, during the idle periods between bursts.

TRIM is a command used only for SSDs; it is part of the drive's command set. TRIM provides the “delete” notification. When a user deletes a file on an SSD, the file content is not actually erased; only the mapping table from LBA to PBA is modified. When a later user tries to access the deleted file, the system can return a negative result, although the raw data still exists on flash.

LBA 100 -> PBA 10
---> TRIM LBA 100
LBA 100 -> null (no mapping)
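The same three steps as a small Python sketch. The dictionaries `l2p`, `flash` and `page_state` and the functions `trim` and `read` are hypothetical names for illustration; what a real drive returns when a trimmed address is read depends on the device.

```python
l2p = {100: 10}                    # LBA -> PBA
flash = {10: "file contents"}      # PBA -> raw data
page_state = {10: "valid"}

def trim(lba):
    """Drop the mapping for lba; the physical page is only marked dirty, not erased."""
    pba = l2p.pop(lba, None)
    if pba is not None:
        page_state[pba] = "dirty"  # the cleaner can now reclaim it without copying

def read(lba):
    """Return the data for lba, or None if the address has no mapping."""
    pba = l2p.get(lba)
    return None if pba is None else flash[pba]


before = read(100)
trim(100)
after = read(100)
print(before, after, flash[10], page_state[10])
```

After the TRIM, a read of LBA 100 finds no mapping, yet physical page 10 still holds the old bytes until its block is erased.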

If the user always writes to the same physical block, that block will wear out quickly. Think of it as a load-balancing problem: writes and erases need to be spread across the different blocks in the SSD, which is called wear leveling.

A simple allocation algorithm

Assume a block can be erased 100K times, the migrate threshold is 80K, and the throttle threshold is 95K. EraseTime(A) below means the number of times block A has been erased.
1. When EraseTime(A) < 80K, clean A; the mapping relation (LBA -> PBA) is not changed.
2. When EraseTime(A) > 80K, clean A and move cold data into A.
Some less frequently used logical block (LBA) is remapped from physical block B to physical block A. After that, A will be used less in the future, while B will be used more.
3. When EraseTime(A) > 95K, reduce the probability of cleaning A.
Use other physical blocks instead of A until it has to be used. The system can use the over-provisioned blocks to replace block A.
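The three rules can be sketched as a small decision function plus the remapping step from rule 2. The names `action_after_clean` and `migrate_cold` are made up for this sketch, the map is block-granular for simplicity, and how to treat a count of exactly 80K or 95K is an assumption, since the rules above leave it open.

```python
MIGRATE_THRESHOLD = 80_000
THROTTLE_THRESHOLD = 95_000

def action_after_clean(erase_count):
    """Decide what to do with a block that is about to be cleaned."""
    if erase_count > THROTTLE_THRESHOLD:
        return "throttle"        # rule 3: avoid cleaning this block if possible
    if erase_count > MIGRATE_THRESHOLD:
        return "migrate_cold"    # rule 2: park cold data here
    return "reuse"               # rule 1 (assumption: exactly 80K also lands here)

def migrate_cold(l2p, cold_lba, block_a):
    """Remap a rarely written LBA onto worn block A; return the block it left (B)."""
    block_b = l2p[cold_lba]
    l2p[cold_lba] = block_a      # A now holds data that seldom changes
    return block_b               # B is freed up to take the hot writes


l2p = {7: "B"}                   # LBA 7 is cold and currently lives in block B
if action_after_clean(85_000) == "migrate_cold":
    freed = migrate_cold(l2p, cold_lba=7, block_a="A")
print(l2p, freed)
```

Because cold data is rarely rewritten, block A will rarely need cleaning again, while the less worn block B absorbs the frequent writes.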


