Best Practices for Symmetrix Configuration

Considerations

  • Configure enough resources for your workload
  • Use resources evenly for best overall performance
    • Spread across all available components
    • Includes FE, BE and disks
    • Path management can help FE
    • FAST/Optimizer can help BE

Commonly asked questions

  • What size system do I need?
    • Each resource has a limit of I/Os per second and MBs per second
      • Disks
      • Back-end controllers (DAs)
      • Front-end controllers (Fibre, FICON, GigE)
      • SRDF controllers
      • Slices (CPU complexes)
    • Configure enough components to support workload peaks
    • Use those resources as uniformly as possible
    • CPU utilization
      • As a rule of thumb, keep utilization at or below 50-70% if response time is critical (see the sizing sketch at the end of this question)
      • A higher utilization can be tolerated if only IOPS or total throughput matters
    • Memory considerations
      • Ideally, use the same size memory boards and the same amount of memory in each engine
      • Imbalance will make little or no difference with OLTP-type workloads
      • Imbalance drives more accesses to the boards or engines with the larger amount of memory, skewing the load across the hardware resources
    • Front-end connections
      • Go wide before you go deep
        • Use the 0 ports on each director first, then the 1 ports
        • Spread across directors first, then across ports on the same director
        • Two active ports on one FA slice generally do not deliver more I/Os than one
      • Ratios (random read hit normalized at 1)
        • Random read hit 1
        • Random read miss 1/2
        • Random overwrite 1/2
        • Random new write 1/4
      • Worst layout for a host with 8 connections:
        • All on one director
        • Instead, use one connection per director
    • Disks
      • Performance will scale linearly as you add drives
        • You can see up to 510 IOPS per drive when benchmarking at 8KB, but 150 IOPS is a reasonable design number for real world situations
      • Note that higher IOPS also brings higher response times, as queues grow
      • Scaling continues until some back-end director limit is reached
      • With smaller I/O sizes (<32KB), the limit reached is the CPU limit
      • With larger I/O sizes (>32KB), a throughput limit in the plumbing is reached instead
    • Engine Scaling
      • Scales nearly linearly, though not quite
      • From 1 to 8 engines, scaling is 6.8x to 7.8x with respect to IOPS (8KB I/O)
      • From 1 to 8 engines, scaling is 4.2x to 7.1x with respect to bandwidth (64KB I/O)
      • Scaling from 1 to 8 engines shows the worst ratios; 4 to 8 engines scales better
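To make the sizing arithmetic concrete, here is a minimal sketch (not an EMC tool) that works backwards from a peak workload to component counts. The peak IOPS figure and the DA/FA ceilings are hypothetical placeholders; only the 150 IOPS-per-drive design number comes from the notes above, and FA capability should additionally be derated using the read/write ratios listed under front-end connections.

```python
import math

# Hypothetical sizing inputs -- the DA and FA ceilings are placeholders, not
# published Symmetrix limits; only the 150 IOPS/drive design number is taken
# from the notes above.
PEAK_HOST_IOPS = 120_000       # peak workload to be absorbed (example value)
TARGET_UTILIZATION = 0.6       # keep components at 50-70% busy when response time matters

component_iops_limit = {
    "disk (design number)":        150,
    "back-end DA (assumed)":       25_000,
    "front-end FA port (assumed)": 15_000,   # derate further by the read/write ratios above
}

for name, limit in component_iops_limit.items():
    # Enough units to carry the peak while staying at the target utilization.
    needed = math.ceil(PEAK_HOST_IOPS / (limit * TARGET_UTILIZATION))
    busy_at_peak = PEAK_HOST_IOPS / (needed * limit)
    print(f"{name:30s}: {needed:5d} needed, each ~{busy_at_peak:.0%} busy at peak")
```

The point is the headroom step, not the placeholder numbers: dividing the peak by the raw ceiling tells you what survives the peak, dividing by the ceiling times the target utilization tells you what survives it with response time intact.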
  • What’s the optimum size of a hyper or number per disk?
    • General rule of thumb, fewer larger hypers will give better overall system performance.
      • There is a system overhead to manage a logical volume so it makes sense that more logical volumes could lead to more overhead.
    • Frequently legacy hyper size is carried forward because of migration
    • Virtual Provisioning decouples the size of the hyper on the physical disk from the LUN size presented to the host
      • You can create very large hypers for the TDATs and still present small LUNs to the host
    • There can be a case of having too few hypers per drive
      • Because it could limit concurrency
      • Aim for a minimum of 4 to 8 hypers per drive (see the sketch below)
      • Not an issue with large drives or protections other than R1
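A small sketch of the hyper trade-off, assuming a hypothetical usable drive capacity: fewer hypers mean less logical-volume overhead, but below roughly 4 to 8 hypers per drive there are too few logical volumes queuing I/O to the spindle to keep it busy.

```python
# Hyper count vs. hyper size on one drive.  The 546 GB usable capacity is an
# example value, not a recommendation.
usable_gb = 546

for hypers_per_drive in (2, 4, 8, 16, 32):
    hyper_gb = usable_gb / hypers_per_drive
    note = "below the 4-8 minimum" if hypers_per_drive < 4 else "ok for concurrency"
    print(f"{hypers_per_drive:2d} hypers/drive -> {hyper_gb:6.1f} GB each ({note})")
```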
  • What is the optimum queue depth?
    • Single threaded (or 1 I/O at a time), the I/O rate is simply the inverse of the service time.
      • For a 5.8ms service time your maximum IOPS is 172.
      • Same drive with 128 I/Os queued can get nearly 500 IOPS
    • We need 1-4 I/Os queued to the disk to achieve the maximum throughput with reasonable latencies
      • Lower queue lengths if response time is CRITICAL
    • Higher if total IOPS is more important than response time
    • With VP, the LUN could be spread over 1000s of drives
      • Queue depth of 32 per VP LUN is probably a reasonable start
    • As IOPS go up, response time degrades rapidly (see the sketch below)
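A minimal sketch of the queue-depth arithmetic, using the inverse-service-time relation and Little's law (IOPS = outstanding I/Os ÷ response time). The 5.8 ms service time and the "nearly 500 IOPS at 128 queued" figure come from the notes above; the intermediate IOPS values are illustrative, not measured.

```python
# Single threaded, the I/O rate is simply the inverse of the service time.
service_time_s = 0.0058                                     # 5.8 ms per I/O
print(f"  1 outstanding I/O : {1 / service_time_s:4.0f} IOPS")   # ~172 IOPS

# Little's law: outstanding I/Os = IOPS x response time, so the average
# response time each I/O sees is outstanding / IOPS.  Deeper queues buy more
# IOPS (the drive can reorder work) at the cost of latency.
for outstanding, iops in ((4, 300), (32, 430), (128, 500)):  # illustrative IOPS values
    response_ms = outstanding / iops * 1000
    print(f"{outstanding:3d} outstanding I/Os : {iops} IOPS at ~{response_ms:5.1f} ms average response")
```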
  • What is the optimum number of members in a meta volume?
    • 255 maximum supported
    • Reasonable sizes for meta member counts are something like 4, 8, 16, 32
    • Even numbers are preferred
      • Powers of 2 fit nicely into back-end configurations
      • Powers of 2 not important for VP thin metas
    • Getting enough I/O into a very large meta can be a problem
      • A 32-way meta on RAID 5 (7+1) would need at least 256 I/Os queued just to keep 1 I/O per physical disk (worked through in the sketch below)
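A quick check of the arithmetic in that last bullet, assuming each meta member lands on its own RAID group: the queue depth needed just to put one I/O on every spindle is the meta width multiplied by the RAID group width.

```python
# Minimum I/Os that must be queued to a meta volume to keep one I/O in flight
# per physical disk, assuming each member sits on a separate RAID group.
def min_queue_for_one_io_per_disk(meta_members: int, disks_per_raid_group: int) -> int:
    return meta_members * disks_per_raid_group

print(min_queue_for_one_io_per_disk(32, 8))   # 32-way meta on R5 7+1 -> 256
print(min_queue_for_one_io_per_disk(8, 8))    # 8-way meta on R5 7+1  -> 64
```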
  • Should I use meta volumes or host-based striping? Or both?
    • Avoid too many levels of striping (plaid)
    • One large meta volume may outperform several smaller meta volumes that are grouped in a host stripe
    • In many cases, host-based striping is preferred over meta volumes
      • One reason is that the host gets more per-LUN queues to work with, so it can keep more I/Os in flight before they ever reach the array (see the sketch below)
    • However, meta volumes can reduce complexity at the host level
    • So it all depends
    • In one comparison, a 24-way meta versus 6 host-striped 4-way metas, average read response time was better with the host-based stripe
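A small sketch of the host-queue argument, assuming a hypothetical per-LUN queue depth of 32 at the host: presenting the same capacity as several smaller metas multiplies the I/Os the host can keep outstanding before any of them reach the array.

```python
# Outstanding I/Os the host can keep in flight, assuming each LUN gets a
# per-LUN queue depth of 32 (a hypothetical HBA/OS setting).
PER_LUN_QUEUE_DEPTH = 32

layouts = {
    "one 24-way meta":                   1,   # a single host LUN
    "host stripe over six 4-way metas":  6,   # six host LUNs
}
for name, lun_count in layouts.items():
    print(f"{name:35s}: {lun_count * PER_LUN_QUEUE_DEPTH} outstanding I/Os possible")
```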
  • Striped or Concatenated Metas?
    • In most cases, striped meta volumes will give you better performance than concatenated
      • Because they reside on more spindles
      • Some exceptions exist where concatenated may be better
        • If you don’t have enough drives for all the meta members to be on separate drives (wrapping)
        • If you plan to re-stripe many meta volumes again at the host-level
        • If you are making a very large R5/R6 meta and your workload is largely sequential
      • Concatenated meta volumes can be placed on the same RAID group
      • Don’t place striped meta volumes on the same RAID group (wrapped)
    • Virtual Provisioning
      • The back end is already striped over the Virtual Provisioning pool, so why re-stripe the thin volume (TDEV)?
      • There may still be performance reasons to have a striped meta on VP (see the sketch after this question)
      • Device write-pending (WP) "disconnect" between the front end and the back end
        • Fixed in the 5874 Q2 2010 SR; a future 5773 SR will also fix it
      • Number of random read requests we can send to a single device
        • A single device (TDEV) can have 8 outstanding reads per FA slice
      • Number of outstanding SRDF/S writes per device
        • A single device can have 1 outstanding SRDF/S write per path
      • If it is important to be able to expand a meta, choose concatenated
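A short sketch of why a striped thin meta can still help on VP, using the per-device limits quoted above (8 outstanding reads per FA slice per device, 1 outstanding SRDF/S write per path per device): the limits apply per meta member, so a wider meta allows more I/Os in flight.

```python
# Per-device concurrency limits apply to each meta member (each is its own TDEV).
READS_PER_FA_SLICE_PER_DEVICE = 8       # outstanding random reads
SRDF_S_WRITES_PER_PATH_PER_DEVICE = 1   # outstanding synchronous SRDF writes

for members in (1, 4, 8, 16):
    print(f"{members:2d}-member thin meta: "
          f"{members * READS_PER_FA_SLICE_PER_DEVICE:3d} outstanding reads per FA slice, "
          f"{members * SRDF_S_WRITES_PER_PATH_PER_DEVICE:2d} SRDF/S writes per path")
```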
  • What stripe and I/O size should I choose?
    • For most host-based striping, 128KB or 256KB is good
    • You may want a smaller stripe size for database logs; 64KB or smaller may be advised by a Symmetrix performance guru
    • I/O sizes above 64KB or 128KB show little to no performance boost (the curve flattens out), and 256KB may actually decrease throughput, because everything is managed internally in 64KB chunks
  • Segregation
    • For the best overall system performance, do not segregate applications/BCVs/clones onto separate physical disks, DAs, or engines
    • For the most predictable performance, do segregate them
    • Tiers should share DA resources so that one tier will not consume resources for another tier
  • What disk drive class should I choose?
    • EFDs provide the best response time and the highest IOPS of all drive types
    • 15k drives are about 30% faster than 10k (random read miss)
    • 15k drives are about 56% faster than SATA, and 10k about 39% faster than SATA (random read miss)
    • SATA still does well on sequential reads with a single thread and larger block sizes (good for a single stream, poor with multiple threads because of the resulting disk seeks)
  • What RAID protection should I choose?
    • Read performance is similar across all protection types (the number of drives is what matters)
    • The major difference is random write performance (turned into back-end IOPS in the sketch after this question)
      • Mirrored: 1 host write = 2 writes
      • R5: 1 host write = 2 reads + 2 writes
      • R6: 1 host write = 3 reads + 3 writes
    • Cost is also a factor
      • R5 and R6 are the most cost-efficient, at 12.5% and 25% protection overhead respectively
      • R1 has 50% protection overhead
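A minimal sketch that turns the write penalties above into back-end disk load. The 10,000 host IOPS and 70/30 read/write mix are example values, it assumes every I/O reaches disk (no cache hits), and the 150 IOPS-per-drive figure is the design number quoted earlier.

```python
# Back-end disk I/Os per host I/O.  A read miss costs 1 back-end read for any
# protection type; writes carry the penalties listed above.
WRITE_PENALTY = {          # (extra reads, writes) per host write
    "RAID-1 (mirrored)": (0, 2),
    "RAID-5":            (2, 2),
    "RAID-6":            (3, 3),
}

host_iops, read_fraction = 10_000, 0.70      # example workload: 70% reads
host_reads  = host_iops * read_fraction
host_writes = host_iops * (1 - read_fraction)

for raid, (extra_reads, writes) in WRITE_PENALTY.items():
    backend_iops = host_reads + host_writes * (extra_reads + writes)
    drives = backend_iops / 150              # 150 IOPS/drive design number
    print(f"{raid:18s}: {backend_iops:7,.0f} back-end IOPS (~{drives:3.0f} drives)")
```

The same loop makes the cost trade-off visible: RAID 1 generates the fewest back-end I/Os but carries the 50% capacity overhead noted above.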
  • How much cache do I need?
    • The easiest method is to use the Dynamic Cache Partitioning What-If (DCPwi) tool
    • Put like devices together in cache partitions
    • Start analysis mode and collect DCP stats
  • How do I know when I’m getting close to limits?
    • Watch for growth trends in your workload with SPA (a simple trend projection is sketched below)
    • Look out for increasing response time (host-based tools like iostat, sar, RMF)
    • Monitor utilization metrics in WLA/STP
    • Better to be proactive than to wait until you hit the wall
    • Any utilization well over 50% should be considered a possible source of future issues as the workload grows
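As a small illustration of the "watch the trend" advice, the sketch below fits a straight line to periodic utilization samples and estimates when the component would cross 70% busy. The weekly samples are made up; in practice you would feed it figures exported from SPA or WLA/STP.

```python
# Project when a component crosses a utilization threshold from periodic samples.
samples = [0.38, 0.41, 0.43, 0.47, 0.50, 0.52]   # fraction busy, one sample per week (made up)
THRESHOLD = 0.70

n = len(samples)
x_mean = (n - 1) / 2
y_mean = sum(samples) / n
slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples)) / \
        sum((x - x_mean) ** 2 for x in range(n))
intercept = y_mean - slope * x_mean

if slope > 0:
    weeks_left = (THRESHOLD - intercept) / slope - (n - 1)
    print(f"Growing ~{slope:.1%} per week; roughly {weeks_left:.0f} weeks until {THRESHOLD:.0%} busy")
else:
    print("No upward trend in these samples")
```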