Flash Architecture

Flash drives are like little storage systems

  • Memory buffer
    • Buffer holds the index of all locations
    • Buffers incoming writes
    • Buffer resiliency
      • Power capacitors maintain power to the buffer in the event of system power failure
      • Contents are then written to the persistent store if power fails
  • Pages
    • Cells are addressed by pages
    • 73GB and 200GB drives use 4KB pages
    • 400GB drives use 16KB pages
    • Page contents are contiguous address space, like SP cache pages
    • Two 2KB I/Os can share a 4KB flash page, but they must be contiguous with respect to LBA
  • Blocks
    • NAND storage is mapped like a filesystem
    • Pages are grouped together into blocks
      • Not to be confused with SCSI or filesystem blocks
      • Multiple pages in a block, jumbled together
      • Addresses of pages in a block do not have to be contiguous
    • Writes to NAND are done at block level
    • Block images are held in the buffer until the block is full, then written to a previously erased block on the drive
    • There must be an erased block available for the write
  • Channels
    • Paths to physical devices (chips)
    • Flash drives have multiple channels, so discrete devices can be read from or written to simultaneously
    • Large I/O is striped across the channels
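
To make the channel striping concrete, here is a minimal sketch of how a large I/O might be split round-robin across channels. The stripe unit (4KB) and channel count (4) are illustrative assumptions, not figures from any drive specification, and the function name is hypothetical.

```python
def stripe_across_channels(offset, length, stripe_unit=4096, channels=4):
    """Split a byte range into per-channel chunks, assigned round-robin.

    stripe_unit and channels are illustrative assumptions, not drive specs.
    Returns a list of (channel, offset, length) tuples.
    """
    chunks = []
    end = offset + length
    while offset < end:
        channel = (offset // stripe_unit) % channels
        # Stop at the next stripe boundary or at the end of the I/O.
        chunk_len = min(stripe_unit - offset % stripe_unit, end - offset)
        chunks.append((channel, offset, chunk_len))
        offset += chunk_len
    return chunks


# A 64KB I/O touches every channel, so the chips can work in parallel.
large = stripe_across_channels(0, 64 * 1024)
# A 2KB I/O lands on a single channel.
small = stripe_across_channels(0, 2048)
```

In this model only I/O larger than one stripe unit gains parallelism from the channels, which matches the note that it is large I/O that gets striped.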

Page States

  • Flash as Mapped Device
    • Workload can affect page state
    • Page state can affect availability of blocks
    • Availability of free (erased) blocks determines write performance
  • Valid state: contains good data (referenced by host and flash)
  • Invalid state: contains stale data
  • Erased state: block is not in use
  • The distribution of page states becomes randomized due to random writes
  • A page is valid or invalid if it is referenced by flash metadata
  • For example
    • A file that occupied two blocks on the chip gets written to
    • The first block gets written to the buffer and the block in the NAND gets marked as invalid
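
The state transitions can be sketched as a toy mapping table. This is a simplified model under stated assumptions: the class name FlashMap, the two-block geometry, and the first-fit allocation are all hypothetical, and it tracks state per page only (no buffer, no wear leveling).

```python
from enum import Enum


class PageState(Enum):
    ERASED = "erased"
    VALID = "valid"
    INVALID = "invalid"


class FlashMap:
    """Toy model: an overwrite never updates in place, it remaps."""

    def __init__(self, blocks, pages_per_block):
        self.state = [[PageState.ERASED] * pages_per_block for _ in range(blocks)]
        self.location = {}  # logical page number -> (block, page)

    def _allocate(self):
        # First-fit search for an erased page (an assumption for brevity).
        for b, pages in enumerate(self.state):
            for p, s in enumerate(pages):
                if s is PageState.ERASED:
                    return b, p
        raise RuntimeError("no erased page available")

    def write(self, logical_page):
        old = self.location.get(logical_page)
        if old is not None:
            # The previous copy now holds stale data.
            self.state[old[0]][old[1]] = PageState.INVALID
        b, p = self._allocate()
        self.state[b][p] = PageState.VALID
        self.location[logical_page] = (b, p)


flash = FlashMap(blocks=2, pages_per_block=4)
for lpn in range(4):
    flash.write(lpn)   # fills block 0 with valid pages
flash.write(1)         # overwrite: old copy goes invalid, new copy lands in block 1
```

After the overwrite, block 0 holds three valid pages and one invalid page, so it can no longer be erased without first moving its valid pages elsewhere.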

Reserve Capacity

  • Some percentage of capacity is reserved and not included as user addressable capacity
  • The capacity will be used to provide ready blocks for incoming writes
  • Sustained heavy writes can saturate a Flash drive
  • Once saturated, the drive needs to perform erase operations in idle cycles

Erasing Blocks

  • The drive will erase blocks during idle periods
  • To be erased, a block must have all invalid pages
    • Every valid page in a block must first be written to another block
    • That requires additional activity
      • Read in pages to buffered block
      • Erase old locations in NAND
      • Write out consolidated block to NAND
      • Basically this is defragmentation (housekeeping)
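
The three housekeeping steps above can be sketched as follows. This is an illustration only: blocks are modeled as lists of (state, data) tuples, the function name consolidate is hypothetical, and real drive firmware is certainly more involved.

```python
ERASED, VALID, INVALID = "erased", "valid", "invalid"


def consolidate(victims, pages_per_block):
    """Pack the valid pages of several victim blocks into one block image.

    Returns (consolidated_block, erased_victims, pages_moved).
    """
    # Step 1: read the valid pages into a buffered block image.
    buffered = [(VALID, data)
                for block in victims
                for state, data in block
                if state == VALID]
    moved = len(buffered)
    if moved > pages_per_block:
        raise ValueError("valid pages do not fit in one block")
    buffered += [(ERASED, None)] * (pages_per_block - moved)

    # Step 2: erase the old locations; every victim is now reusable.
    erased_victims = [[(ERASED, None)] * pages_per_block for _ in victims]

    # Step 3: the buffered image is what gets written out to NAND.
    return buffered, erased_victims, moved


block_a = [(VALID, "a0"), (INVALID, None), (INVALID, None), (INVALID, None)]
block_b = [(INVALID, None), (VALID, "b1"), (VALID, "b2"), (INVALID, None)]
image, freed, moved = consolidate([block_a, block_b], pages_per_block=4)
```

Here two partly stale blocks collapse into one, for a net gain of one erased block at the cost of moving three pages.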

Consolidation

  • Do flash drives slow down over time?
  • Free space is a factor, but so is time, because the drive becomes more fragmented the longer it is in use
  • Total capacity utilization can affect the response time of sustained writes
    • Higher capacity utilization results in more valid pages in each block
    • Over time, distribution of valid pages becomes more random and capacity utilization increases
    • If blocks have a high percentage of valid pages, it is more difficult to consolidate and erase a block
    • The drive therefore needs more time to do housekeeping
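
A little arithmetic shows why a high share of valid pages makes housekeeping more expensive. The sketch below assumes 64 pages per block, which is an illustrative figure and not taken from the notes; the function name is hypothetical.

```python
def relocation_cost(pages_per_block, valid_fraction):
    """Return (pages_to_move, pages_freed, moves_per_freed_page) for one block."""
    valid = round(pages_per_block * valid_fraction)
    freed = pages_per_block - valid
    if freed == 0:
        # A fully valid block yields no free space no matter how much is moved.
        return valid, 0, float("inf")
    return valid, freed, valid / freed


# Assumed geometry: 64 pages per block.
light = relocation_cost(64, 0.25)   # mostly stale block
heavy = relocation_cost(64, 0.75)   # mostly valid block
```

Under these assumptions, a block that is 25% valid costs 16 page moves to free 48 pages, while a block that is 75% valid costs 48 page moves to free only 16, so each freed page is nine times as expensive.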

Issues

  • A random write workload above 20% can have a significant effect on flash drive performance

Backfill

  • Small writes and backfill, aka write amplification
  • Writing an I/O smaller than the page requires read-modify-write
  • This therefore doubles the workload on the drive
  • The 73GB and 200GB flash drives fare better here: with 4KB pages they suffer less from this penalty than the 400GB drives do with 16KB pages
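
The page-size difference can be put in numbers with a simplified model. It assumes the I/O starts on a page boundary and that every touched page is rewritten in full; the function name backfill is hypothetical.

```python
import math


def backfill(io_bytes, page_bytes):
    """Return (backfill_bytes, bytes_written_per_host_byte) for one write.

    Simplified model: the I/O is page-aligned and each touched page is
    read, merged with the new data, and rewritten whole.
    """
    pages = math.ceil(io_bytes / page_bytes)
    written = pages * page_bytes
    return written - io_bytes, written / io_bytes


small_page = backfill(2048, 4096)    # 2KB write on a 4KB-page drive
large_page = backfill(2048, 16384)   # 2KB write on a 16KB-page drive
full_page = backfill(4096, 4096)     # page-sized write needs no backfill
```

In this model a 2KB write backfills 2KB on a 4KB page but 14KB on a 16KB page, which is why small writes hurt more on the larger page size.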

Flash and Write Cache

  • Original guidance: flash does not need SP cache
  • New guidance: flash can help SP cache in many cases
  • Experience: many uses of flash + SP cache in the field
  • It is now OK to use SP cache with flash drives, and it is a benefit in many cases

Best Practices

  • Best use
    • High random read rates
    • Smaller I/O
    • I/O patterns that are not optimal for cached FC implementations
  • Databases: 4-15 flash drives typical
    • Indexes and busy tables
      • Biggest disk-for-disk increase in read-heavy tables (10-20x)
    • Temp space
      • But turn on SP write cache because of the write/re-read/write nature of temp databases
    • Some clients using Flash for write-heavy loads
      • Use SP cache for better response time
      • Flash flushes cache faster
  • Really big databases are a little different
    • Up to 30 flash drives
    • These bypass SP write cache to maximize write throughput
  • Oracle ASM 11gR2
    • Users can differentiate groups as FAST, AVERAGE, SLOW
  • Messaging (Exchange, Notes)
    • Database to flash and all users benefit
    • Use R5 for Exchange on flash
      • Turn on SP write cache
      • Writes flush to R5 on flash faster than R10 on FC
      • Reads are likely better distributed than from R10 on flash
      • Flash rebuilds faster than FC and impact is less
  • OK to use
    • Databases
      • Oracle Flash Recovery: SATA does fine here and is more economical
      • Redo logs: FC is sufficient and costs less
      • Archive logs: FC or even SATA does fine
    • Media
      • Editing configurations are the best fit for flash in media
      • Some advantage to multi-stream access
      • FC will give more predictable write performance at a micro level due to flash’s internal structure
    • Any time power/cooling is an issue

 
