CX vs DMX Failure Scenarios

Whenever customers are weighing a CX against a DMX, we try to give them a holistic view of the pros and cons of each array and, ultimately, to match the price/performance/availability configuration to the customer’s profile. One of the less-discussed areas of differentiation is what happens in failure scenarios. Below are some comparisons I’ve found in my research.

Complete Power Loss
– CX: Standby power supplies (SPS; two units, rated 2.2 kW on the CX3-80 and 1 kW on the CX3-40 and below) provide 90 seconds of backup power while write cache is destaged. Keep in mind that a CX has at most 3 GB of write cache to destage. Flushing 3 GB in 90 seconds requires only about 34 MB/s, so that window is sufficient.
– DMX: Battery backup modules (BBU, up to eight 2.2 kW per storage bay) provide backup power for up to two 5-minute intervals while global memory is destaged to vault disks. The larger BBU complement reflects the fact that there may be up to 256 GB of usable global memory to destage.
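The destage arithmetic above can be reproduced with a short Python sketch. The function name is hypothetical, and the DMX figure rests on an assumption of mine: that all 256 GB must reach the vault disks within a single 5-minute interval. The source material does not say how the vault is spread across the two intervals, so treat that number as a rough upper bound on the aggregate rate rather than a specification.

```python
def required_destage_rate_mb_s(cache_gb: float, holdup_seconds: float) -> float:
    """Minimum sustained rate (MB/s) needed to flush cache_gb within holdup_seconds.

    Uses 1 GB = 1024 MB, which matches the ~34 MB/s figure quoted for the CX.
    """
    if holdup_seconds <= 0:
        raise ValueError("holdup_seconds must be positive")
    return cache_gb * 1024 / holdup_seconds


# CX: at most 3 GB of write cache, 90 seconds of SPS power (figures from the text).
cx_rate = required_destage_rate_mb_s(3, 90)

# DMX: up to 256 GB of global memory; ASSUMPTION: one 5-minute (300 s) interval.
dmx_rate = required_destage_rate_mb_s(256, 5 * 60)

print(f"CX needs about {cx_rate:.1f} MB/s")
print(f"DMX needs about {dmx_rate:.1f} MB/s in aggregate (under the assumption above)")
```

On the DMX the load is spread across multiple vault disks, so the aggregate rate does not translate into a per-disk requirement.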

Power Supply Fault
– CX: The CX can tolerate a single power supply/blower module fault in either storage processor (SP) and still maintain high availability (HA). This applies to the CX3-40 and below. However, once two power supplies fail, whether both in a single SP or one in each SP, write cache is disabled and the system operates in write-through mode.
– DMX: The DMX is configured with two power zones. If one power zone loses power, the DMX will continue to operate normally for a 20-hour period. Within that period the user has three options: 1) repair the fault, 2) reset the 20-hour timer, 3) allow the system to vault and power down.

SPS/BBU Fault
– CX: As long as there is one fully charged SPS, write cache will continue to operate normally. So in configurations where there are two SPSs, the CX array can tolerate a fault in one SPS before write cache is disabled. All CX3 arrays except the CX3-10 come with two SPSs by default.
– DMX: Two BBU modules are required for every four drive enclosures, which represent one row in a storage bay, with up to eight BBU modules per storage bay. The documentation I reviewed makes no mention of how global memory destaging is handled when a BBU fails.

Other Power Events
– CX: Low power input will cause cache to be disabled and will put the array in write-through mode until power returns to normal levels. Excess power input will cause no change unless it causes a power supply fault, in which case the array behaves as described under Power Supply Fault above.
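Taken together, the CX write cache rules from the power-related sections above can be summarized in a small decision function. This is only a sketch of the behavior as described in this post, not EMC logic: the function and parameter names are hypothetical, and the power supply rule is stated in the sources only for the CX3-40 and below.

```python
def cx_write_cache_enabled(
    failed_ps_spa: int,
    failed_ps_spb: int,
    charged_sps_count: int,
    input_power_low: bool,
) -> bool:
    """Return True if CX write cache stays enabled, per the rules in this post."""
    # Two failed power supplies, whether both in one SP or one in each SP,
    # disable write cache (stated for the CX3-40 and below).
    if failed_ps_spa + failed_ps_spb >= 2:
        return False
    # At least one fully charged SPS is required.
    if charged_sps_count < 1:
        return False
    # Low input power disables cache until power returns to normal levels.
    if input_power_low:
        return False
    return True


# One power supply fault, both SPSs healthy: cache stays on.
print(cx_write_cache_enabled(1, 0, 2, False))
# One power supply fault in each SP: array drops to write-through.
print(cx_write_cache_enabled(1, 1, 2, False))
```

Excess input power is deliberately left out as a separate input, since the text says it matters only when it leads to a power supply fault.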

Overheating
– CX: If the system exceeds maximum ambient temperature by 10 degrees Celsius (18 degrees Fahrenheit), the array will destage cache and begin an orderly shutdown of the SPs.
– DMX: The Environment Control Module (XCM) will send an “over-temperature condition exists” alert. I’ll need to keep digging to see what actions, beyond alerting, the DMX takes in this scenario.
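A note on the CX figures: 10 degrees Celsius equals 18 degrees Fahrenheit only as a temperature difference (scale by 9/5, with no +32 offset). The sketch below shows that conversion and the shutdown rule as described. The function names are hypothetical, and the 40 °C maximum ambient value in the example is a placeholder of mine, not a number from the hardware references.

```python
SHUTDOWN_MARGIN_C = 10  # margin over maximum ambient, from the text


def delta_c_to_delta_f(delta_c: float) -> float:
    """Convert a temperature difference from Celsius to Fahrenheit degrees."""
    return delta_c * 9 / 5


def cx_should_shut_down(ambient_c: float, max_ambient_c: float) -> bool:
    """True once ambient exceeds the rated maximum by the shutdown margin."""
    return ambient_c - max_ambient_c >= SHUTDOWN_MARGIN_C


print(delta_c_to_delta_f(SHUTDOWN_MARGIN_C))   # margin expressed in Fahrenheit degrees
print(cx_should_shut_down(50, 40))             # 40 is a placeholder maximum
```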

Reference Material
– CX700 SPE Hardware Reference
– Introduction to CX3 UltraScale Series: Applied Technology
– EMC Primus ID: emc156197 (How long is write cache disabled in an SPS test?)
– EMC Primus ID: emc153035 (How many power supplies must fail on a CX3-20/40 for write cache to be disabled?)
– DMX3/4 Product Guide
