Clariion: Performance and Design Implications of Mixed Workloads

RBA – Ring Buffer Analyzer, internal analysis tool

TCD – target command device (front-end ports of the array)

LBA distribution – locality of reference

Battle of the RAID Group

Default segment/pre-fetch multiplier is 4.

LUN migration should be run with a write-aside of 511 on the target to avoid cache overrun. About 40 MB/s will run through the SP, and SP utilization will be moderate. Release 29 lifts this restriction by defaulting LUN migration to write-aside.

The duration of read starvation depends on the amount of cache to de-stage to the low watermark and the state of the LRU queue.

In the larger CX4 arrays with more write cache, you may want to bring the high and low watermarks closer together (for example, 60-65%) to reduce the effect of read starvation.
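To get a feel for why closer watermarks shorten a flush, here is a rough sketch that sizes the amount of dirty data between the high and low watermarks. The write cache size and de-stage rate are hypothetical numbers picked for illustration, not figures from the session, and the model ignores the state of the LRU queue entirely.

```python
def flush_volume_mb(write_cache_mb, low_pct, high_pct):
    """Dirty data (MB) to de-stage when falling from the high to the low watermark."""
    if not 0 <= low_pct <= high_pct <= 100:
        raise ValueError("expected 0 <= low <= high <= 100")
    return write_cache_mb * (high_pct - low_pct) / 100.0


def flush_seconds(volume_mb, destage_mb_per_s):
    """Time to de-stage the given volume at a constant (assumed) de-stage rate."""
    return volume_mb / destage_mb_per_s


WRITE_CACHE_MB = 4000.0  # hypothetical write cache size
DESTAGE_MB_PER_S = 200.0  # hypothetical sustained de-stage rate

for low, high in ((60, 80), (60, 65)):
    volume = flush_volume_mb(WRITE_CACHE_MB, low, high)
    seconds = flush_seconds(volume, DESTAGE_MB_PER_S)
    print(f"watermarks {low}-{high}%: {volume:.0f} MB to flush, ~{seconds:.1f} s")
```

With these made-up inputs, the 60-80% pair leaves four times as much data to flush per watermark event as the 60-65% pair.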

This read starvation could have a significant effect on Jetstress testing.

The default watermarks (60-80%) are not necessarily needed. There is nothing wrong with low and close watermarks. Having write cache available to absorb a large, bursty write is a more important consideration than the cost of de-staging cache more frequently. However, if you’re getting high hit rates, then you may want to keep more data in cache.

The idle delay timer (2 seconds) will kick off a de-stage activity. Avoid “lazy writers” by setting the timer to 200 (20 seconds) with:

naviseccli chglun -l <lun> -t 200

Release 29 will default to 20 seconds.

  1. A write to a LUN arrives and is written to cache.
  2. The “clean-dirty” bit must be set to dirty on the drive to flag the presence of vault content after an SP failure.
  3. After the state bit is set, the write can be ack’ed.
  4. After 2 seconds of idle time, the idle flushers clear this LUN’s cache entries.
  5. Clean-dirty is set to clean.
  6. A new write comes in, so “dirty” must be set again.
  7. If the disk is really busy, the ack will be very slow to return.
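The steps above can be sketched as a toy model that counts how many writes have to wait on a clean-dirty update to disk. It assumes the idle flushers finish instantly once the idle delay expires and that the LUN starts clean; the function name and arrival times are made up for illustration.

```python
def count_dirty_bit_updates(write_times_s, idle_delay_s):
    """Count writes that must first set the on-disk clean-dirty bit to dirty."""
    updates = 0
    last_write = None
    for t in sorted(write_times_s):
        # The LUN is clean if it was never written, or if it sat idle long
        # enough for the idle flushers to run and reset the bit to clean.
        if last_write is None or t - last_write >= idle_delay_s:
            updates += 1
        last_write = t
    return updates


# A "lazy writer" that issues one write every 3 seconds.
lazy_writes = [0, 3, 6, 9, 12]

print(count_dirty_bit_updates(lazy_writes, idle_delay_s=2))   # every write pays
print(count_dirty_bit_updates(lazy_writes, idle_delay_s=20))  # only the first pays
```

With a 2 second idle delay, every one of the five writes lands on a clean LUN and waits on the disk. With 20 seconds, only the first one does.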

This activity is seen more commonly now because of VMware and clusters. You can see it if you run perfmon.

Battle of the SP

You are not supposed to mix OLTP and DSS because they have completely different I/O patterns. However, on larger CX4 arrays, you can dedicate buses to different patterns.

Rules of Thumb

Multiplier (CPUM)

CX4-960 – 1.00

CX4-480 – 0.65

CX4-240 – 0.55

CX4-120 – 0.30

A – CPUM x 50k reads/s, standard LUN

B – CPUM x 16k writes/s, R5

C – CPUM x 20k writes/s, R10

D – CPUM x 40k reads/s, Snaps, MV/s, clone source

E – CPUM x 7.5k writes/s, MV/s

F – CPUM x 6k writes/s, clone-in-sync

G – CPUM x 2.5k writes/s, Snap COFW

H – CPUM x 6k writes/s, Snap non-COFW

Data logging % = Number of LUNs / Max LUNs * 10%

One SP’s utilization will be the sum of the proportional contributions of each I/O type.
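As a rough sketch of how these rules of thumb combine, the snippet below treats each line A through H as a ceiling for that I/O type and sums the fraction of each ceiling in use. Adding the data logging percentage on top of the I/O shares is my reading of the notes, not something the session stated outright, and the max LUN count in the test values is a placeholder.

```python
# CPU multipliers (CPUM) and per-type ceilings copied from the rules of thumb.
CPUM = {"CX4-960": 1.00, "CX4-480": 0.65, "CX4-240": 0.55, "CX4-120": 0.30}

CEILING_PER_S = {
    "A": 50_000,  # reads/s, standard LUN
    "B": 16_000,  # writes/s, R5
    "C": 20_000,  # writes/s, R10
    "D": 40_000,  # reads/s, Snaps, MV/s, clone source
    "E": 7_500,   # writes/s, MV/s
    "F": 6_000,   # writes/s, clone-in-sync
    "G": 2_500,   # writes/s, Snap COFW
    "H": 6_000,   # writes/s, Snap non-COFW
}


def sp_utilization(model, io_per_s, luns=0, max_luns=1):
    """Estimate one SP's utilization as a fraction (1.0 means 100%)."""
    cpum = CPUM[model]
    # Each I/O type contributes its rate divided by its scaled ceiling.
    io_share = sum(
        rate / (cpum * CEILING_PER_S[kind]) for kind, rate in io_per_s.items()
    )
    # Data logging % = Number of LUNs / Max LUNs * 10% (assumed to be additive).
    logging_share = (luns / max_luns) * 0.10
    return io_share + logging_share


# One SP of a CX4-480 doing 6,500 standard reads/s and 2,080 R5 writes/s.
estimate = sp_utilization("CX4-480", {"A": 6_500, "B": 2_080})
print(f"estimated SP utilization: {estimate:.0%}")
```

Each workload in the example uses about a fifth of its scaled ceiling, so the estimate comes out around 40% before any data logging overhead.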

The memory subsystem is doing 3000 MB/s of work for 600 MB/s of host I/O.

Cycles of execution are tied not only to the I/O but also memory subsystem operations.

  • Host IO (600 MB/s)
  • De-stage to disk (600 MB/s)
  • CPU xor (600 MB/s)
  • Sending data to peer SP (600 MB/s)
  • Receive data from peer SP (600 MB/s)

So now for 600 MB/s of activity, we see 3000 MB/s.
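The arithmetic is just one pass through memory per item in the list above. A minimal sketch, assuming each pass moves the full host data rate:

```python
# Each entry is one trip through the SP memory subsystem for the same data.
MEMORY_PASSES = [
    "Host IO",
    "De-stage to disk",
    "CPU xor",
    "Sending data to peer SP",
    "Receive data from peer SP",
]


def memory_traffic_mb_s(host_mb_s, passes=MEMORY_PASSES):
    """Total memory subsystem traffic, assuming every pass runs at the host rate."""
    return host_mb_s * len(passes)


host_rate = 600
total = memory_traffic_mb_s(host_rate)
print(f"{host_rate} MB/s of host I/O -> {total} MB/s through memory "
      f"({total // host_rate}x amplification)")
```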

Note that in an NDU failover, polling and data logging do not fail over.

Battle of Large Block and Small Block

Large block I/O on the disk can be very disruptive to small block I/O.

8KB random read at 5.5ms for about 180 IOPS

512KB random read at 15ms for about 66 IOPS

With a single thread of each going to the same drive, the two I/Os alternate, so each stream gets 1000/(5.5+15) = 48.78 IOPS. Both activities are now getting about 48 IOPS, which is a big decrease for the small block I/O.
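The same calculation as a small sketch, assuming strict alternation between the two streams and fixed service times (the function names are mine):

```python
def solo_iops(service_ms):
    """IOPS for one single-threaded stream that has the drive to itself."""
    return 1000.0 / service_ms


def interleaved_iops(*service_ms):
    """Per-stream IOPS when single-threaded streams take turns on one drive."""
    # One full round services one I/O from every stream, so each stream
    # completes one I/O per sum of all the service times.
    return 1000.0 / sum(service_ms)


small_ms, large_ms = 5.5, 15.0

print(f"8KB alone:   {solo_iops(small_ms):.1f} IOPS")
print(f"512KB alone: {solo_iops(large_ms):.1f} IOPS")
print(f"mixed:       {interleaved_iops(small_ms, large_ms):.2f} IOPS each")
```

The small block stream loses roughly three quarters of its throughput, while the large block stream loses about a quarter.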

You know you have this mixture if you have high write response times but the dirty cache pages are not at 100%.

Check your HBA queue depth settings and path distribution.

