Performance as a Function of Utilization on CLARiiON

Measurements

  • Utilization = 100% * busy time in period / (idle + busy) time in period
  • Throughput = total number of visitors in the period / period length in seconds
  • Average Busy Queue Length (ABQL) = sum of the queue length upon arrival of each visitor / total number of visitors
  • Queue length = ABQL * utilization/100%
  • Response time = queue length / throughput (Little’s Law)
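A minimal Python sketch of these derivations, assuming you already have the raw counters for one poll period (busy/idle time, visitor count, and the running sum of queue lengths observed at each arrival); the parameter names are illustrative, not Analyzer's own.

```python
def derived_metrics(busy_s, idle_s, visitors, queue_len_sum, period_s):
    """Derive the Analyzer-style metrics above from raw counters for one period.

    busy_s / idle_s : seconds the object was busy / idle in the period
    visitors        : number of I/Os (arrivals) seen in the period
    queue_len_sum   : sum of the queue length observed at each arrival
    period_s        : length of the poll period in seconds
    """
    utilization = 100.0 * busy_s / (busy_s + idle_s)      # percent
    throughput  = visitors / period_s                     # IOPS
    abql        = queue_len_sum / visitors                # average busy queue length
    queue_len   = abql * utilization / 100.0              # time-averaged queue length
    resp_time   = queue_len / throughput                  # seconds (Little's Law)
    return utilization, throughput, abql, queue_len, resp_time

# Example: 45 s busy in a 60 s period, 3000 I/Os, queue lengths summing to 12000
print(derived_metrics(45, 15, 3000, 12000, 60))
# -> (75.0, 50.0, 4.0, 3.0, 0.06)  i.e. 75% busy, 50 IOPS, 60 ms response time
```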

For low LUN throughput (<32 IOPS), response time might be inaccurate

  • Response time here is calculated; lazy writes will skew the LUN busy counter
  • RBA actually measures the response time

Dual SP ownership of a disk

  • Can also impact response time
  • Each SP only knows about its own ABQL, throughput and utilization for the disk
  • At poll time, they exchange views. The utilization is max(SPA,SPB)
  • ABQL is computed from the sum of the two SPs' sums (combined queue-length totals over combined visitor counts)
  • And SP throughput is the sum of SPA and SPB throughput
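A hedged sketch of how the two SPs' views of a shared disk might be merged at poll time, following the bullets above (utilization = max of the two, throughput = sum, ABQL recomputed from the combined sums and visitor counts); the per-SP record layout is an assumption for illustration, not a FLARE structure.

```python
from dataclasses import dataclass

@dataclass
class SPView:
    utilization: float    # percent busy for the disk, as seen by this SP
    throughput: float     # IOPS driven by this SP
    queue_len_sum: float  # sum of queue lengths at each arrival on this SP
    visitors: int         # number of arrivals counted by this SP

def merge(spa: SPView, spb: SPView):
    utilization = max(spa.utilization, spb.utilization)   # max(SPA, SPB)
    throughput  = spa.throughput + spb.throughput          # sum of SPA and SPB
    # "sum of the sums": combined queue-length totals over combined arrivals
    abql = (spa.queue_len_sum + spb.queue_len_sum) / (spa.visitors + spb.visitors)
    return utilization, throughput, abql

print(merge(SPView(60.0, 400.0, 9000.0, 3000),
            SPView(35.0, 150.0, 2000.0, 1000)))
# -> (60.0, 550.0, 2.75)
```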

Be wary of confusing SP response time in Analyzer with the average response time of all LUNs on that SP

  • Response time is calculated and based on utilization
  • A LUN is busy (not resting) as long as something is queued to it
  • An SP is busy (not resting) as long as it is not in the OS idle loop
  • While a disk is busy getting a LUN request, the LUN is still busy
  • While a disk is busy getting a LUN request, the SP might be idle
  • The SP response time is generally smaller than the average response time of all the LUNs on that SP
  • Host response time is approximated by LUN response time

Recall from last year:

  • Rules of Thumb
  • Multiplier (CPUM)
  • CX4-960 – 1.00
  • CX4-480 – 0.65
  • CX4-240 – 0.55
  • CX4-120 – 0.30
  • CX3-80 – 0.50
  • A – CPUM x 50k reads/s standard LUN
  • B – CPUM x 16k writes/s R5
  • C – CPUM x 20k writes/s R10
  • D – CPUM x 40k reads/s, Snaps, MV/s, clone source
  • E – CPUM x 7.5k writes/s MV/s
  • F – CPUM x 6k writes/s, clone-in-sync
  • G – CPUM x 2.5k writes/s, Snap COFW
  • H – CPUM x 6k writes/s, Snap non-COFW
  • Data logging % = Number of LUNs / Max LUNs * 10%
  • One SP’s utilization will be the sum of the proportional contributions of each I/O type
  • Use 4KB for IOPS and 512KB for Bandwidth
  • I = CPUM x 1500MB/s read
  • J = CPUM x 600MB/s write (cache on)
  • Note: ASAP rebuilds, background verify, and mirror syncs count against this number
  • Example: CX4-960, RAID 5, 9000 IOPS, 2:1 R:W, 8KB –> 38% utilization
  • 6000 read IOPS, 3000 write IOPS, 48MB/s read, 24MB/s write, RAID 5, CX4-960
  • 6000/50000 + 3000/16000 + 48/1500 + 24/600 = 12% + 19% + 3.2% + 4.0% = 38.2% SP utilization
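A small Python sketch of the rule-of-thumb estimate, reproducing the CX4-960 example above; the CPUM table and per-I/O-type ceilings come straight from the bullets, and the function names are just for illustration.

```python
# Rule-of-thumb SP utilization estimate, using the ceilings listed above.
CPUM = {"CX4-960": 1.00, "CX4-480": 0.65, "CX4-240": 0.55,
        "CX4-120": 0.30, "CX3-80": 0.50}

def sp_utilization(model, read_iops, write_iops, read_mbs, write_mbs,
                   write_ceiling=16_000):
    """Each I/O type contributes its proportional share of the model's ceiling.

    write_ceiling: 16k writes/s for RAID 5 (B); use 20k for RAID 10 (C), etc.
    The IOPS ceilings assume 4KB I/Os; the bandwidth ceilings assume 512KB.
    """
    m = CPUM[model]
    return 100.0 * (read_iops  / (m * 50_000) +        # A: standard-LUN reads
                    write_iops / (m * write_ceiling) + # B/C: RAID-type writes
                    read_mbs   / (m * 1_500) +         # I: MB/s read
                    write_mbs  / (m * 600))            # J: MB/s write, cache on

def data_logging_pct(n_luns, max_luns):
    """Data logging % = Number of LUNs / Max LUNs * 10%, per the bullet above."""
    return n_luns / max_luns * 10.0

# CX4-960, RAID 5, 9000 IOPS at 2:1 R:W and 8KB (48 MB/s read, 24 MB/s write):
print(sp_utilization("CX4-960", 6000, 3000, 48, 24))
# -> about 38 (the text rounds the 3000/16000 term to 19% and reports 38.2%)
```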

His formula is too low (the estimate understates actual SP utilization)

  • Configuration polling
    • Pre-FLARE 26.31 configuration polling is another low priority internal function that affects utilization
    • Go to http://ipaddress/setup
    • Choose Set Update Parameters in the Setup menu and set the Update Interval to 300s
    • The Performance Interval (for statistics logging) is fine at 60s; it is negligible compared to configuration polling and data logging
    • Also include the -np (no poll) option whenever possible in CLI scripts
  • Data logging
    • A 7-10% differential comes from the default data logging settings in older FLARE revisions with a lot of LUNs
    • Throughput was still unaffected because Analyzer threads run at a lower priority than I/O threads
    • Navisphere commands could be sluggish because they run at the same priority level
    • Fix it by changing from 60/60 or 60/120 to 300/300.
    • Data logging poll rate is the lower of the two.
    • This will significantly reduce pre-FLARE 29 utilization
  • Navisphere operations, especially without -np (no poll)
  • Background verify, rebuild, LUN migration, zeroing operations
  • Snap, Clone, Mirror, SAN Copy overhead
  • Disk or bus bottlenecks
  • Heavy flushing

His formula is too high (the estimate overstates actual SP utilization)

  • Coalesced backend writes
  • Pre-fetch
  • Nature of the load

In FLARE 26.31, FLARE 28, FLARE 29, FLARE 30

  • Delta polling was introduced in FLARE 28 and back-revved to FLARE 26.31
  • Significantly reduces Navisphere overhead
  • In FLARE 30, CLI commands without -np are given more processor time
  • In FLARE 29, data logging utilization has been reduced by 80%
  • FLARE 30 introduces fully provisioned virtual LUNs in pools of storage (thick LUNs)
  • H6099 document
  • NDU now uses % Privileged Time rather than the % Processor Time shown by Analyzer; 65% is safe (instead of 50%)

What will happen with SP utilization in the presence of EMC Flash Cache?

  • 64KB is the base element for analysis for migration into Flash Cache
  • There are a considerable number of promotions (HDD -> EFD) that will cost SP utilization. After the bulk of those initial promotions occur, expect roughly 8-10% additional SP utilization for Flash Cache after warmup.