Most performance concerns can be summarized by 4 questions:
- What am I getting?
- Is that what I should be getting?
- If not, why not?
- What, if anything, can I do about it?
Characterize workload in terms of
- Size (KB/IO)
- Direction (read/write)
Performance triage domains
- IP network
- Data mover
- Fibre channel
- Storage processors
- More fibre channel
- Disk drives
Celerra Volume Stack (top to bottom): filesystem → meta volume → slice/stripe volumes → basic volumes (dvols)
Identify the protocol(s) in use
server_stats server_5 -summary cifs,nfs -interval 10 -count 6
server_stats server_5 -summary nfs -interval 10 -count 6
server_stats server_5 -table nfs -interval 10 -count 6
- Operations break down by type: v3Write, v3Create, etc.
server_stats server_5 -table fsvol -interval 10 -count 6
- Correlates the filesystem with the meta-volumes
- The percentage contribution of write requests for each meta-volume is shown (“FS Write Reqs %”)
server_stats server_5 -table dvol -interval 10 -count 6
- Shows the write distribution across all volumes
- AVM will work hard to prevent disk overlap for a filesystem
- Slice your stripes, don’t stripe your slices (create the stripe across all volumes first, then slice the stripe up as needed)
- root_ldisk – the log disk; high activity on this disk means lots of log activity in the server_log. Here the server_log reported “ufslog hit high threshold”. But is that a problem?
- Data mover memory includes inodes and data blocks
- Data mover cache is write-through, meaning data must be destaged from cache before the write is acknowledged to the host. This is because the cache is not protected from power loss.
- When writes are coming in, data blocks are updated, and inodes need to be updated.
- Inode updates are writing to the ufslog staging buffer.
- The staging buffer contains uxfs log transactions and then destages to disk.
- “Ufslog hit high threshold” means the in-memory copies of uxfs log transactions that have already been written to disk could not be retired, because the dirty metadata they point to has not yet been flushed to the filesystem metavolume
- This message indicates contention at the filesystem metavolume, not the ufslog volume.
- If the ufslog itself is an issue, the error message will be “staging buffer full, using next one”. One occasionally is not an issue – it’s actually good that the buffer is being used. It only becomes a problem if you see many of these per second.
nas_disk -l | grep root_ldisk
navicli -h spa getlun 1 -rwr -brw -wch (read write rate, blocks read/written, write cache hit)
For IOPS, increasing to 8 concurrent I/O threads yields the greatest gains; going from 8 to 64 threads yields only nominal improvement.
nas_fs -I fs1
nas_disk -I d38
- Look at stor_dev (hex) and convert to decimal to get the LUN number for navicli (e.g. 0x1b → 27)
navicli -h spa getlun 27 -rwr -brw -wch
- Blocks written / write requests = blocks per write (multiply by 512 bytes to get the average write size)
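As a worked sketch of that arithmetic (the counter values below are made up for illustration, not taken from a real array):

```shell
# Hypothetical getlun counters (assumed values, for illustration only)
blocks_written=1600000
write_requests=100000

# Blocks per write, then bytes per write (1 block = 512 bytes)
blocks_per_write=$(( blocks_written / write_requests ))
write_size=$(( blocks_per_write * 512 ))

echo "${blocks_per_write} blocks/write, ${write_size} bytes/write"
```

With these sample numbers the LUN is seeing 8KB writes, matching the size seen at the NFS and dvol layers.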
navicli -h spa getlun 27 -rwr -brw -wch -disk
- Shows the disks associated with the lun
navicli -h spa getdisk 2_0_2 -rds -wrts -bytrd -bytwrt (read reqs, write reqs, KB read, KB written)
- KB written / write requests = KB per write at the disk
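Same idea at the disk level, again with assumed counter values:

```shell
# Hypothetical getdisk counters (assumed values, for illustration only)
kbytes_written=3200000
write_requests=100000

disk_write_kb=$(( kbytes_written / write_requests ))
echo "${disk_write_kb} KB per write at the disk"
```

A larger write size at the disk than at the LUN can indicate that writes are being coalesced in the storage processor’s write cache before destaging.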
Putting it all together, we saw:
NFS write size: 8KB
dvol write size: 8KB
LUN write size: 8KB
Disk write size: 32KB
Go to the host, and check the filesystem:
grep fs1 /etc/mtab
Check the rsize=8192,wsize=8192 settings. These are the NFS transfer size limits: even if the application wants to write 32KB, each request is capped at 8KB. Update those settings to 32768. You’d need to unmount and remount with the new settings, so the change needs to be coordinated – it is disruptive.
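A minimal remount sketch (the export path, server name, and mount point here are assumptions, not from the source):

```shell
# Assumed NFS export and mount point -- substitute your own.
umount /mnt/fs1
mount -t nfs -o rsize=32768,wsize=32768 server_5:/fs1 /mnt/fs1

# Confirm the new transfer sizes took effect
grep fs1 /etc/mtab
```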
server_ifconfig server_5 -all
server_stats server_5 -table net -interval 10 -count 6
- Network In (KiB/s) / Network In (Pkts/s) gives the average packet size
- Do this for both In and Out to see whether a standard or jumbo MTU is in effect
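A quick way to do that division (the rates below are assumed sample values, not real server_stats output):

```shell
# Assumed rates from the net table (hypothetical values)
net_in_kib_s=11776
net_in_pkts_s=8192

# Average inbound packet size in bytes
pkt_size=$(( net_in_kib_s * 1024 / net_in_pkts_s ))
echo "average inbound packet size: ${pkt_size} bytes"
```

A result near 1500 bytes suggests a standard MTU; a result near 9000 would indicate jumbo frames.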
server_netstat server_2 -s -p tcp
- Look for transmission errors (retransmissions)
- A node is aware only of its own retransmissions so be sure to check both ends of the connection
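To turn the raw counters into a rate (the counter values here are assumptions, not real netstat output):

```shell
# Hypothetical TCP counters from netstat -s (assumed values)
segments_sent=5000000
segments_retransmitted=2500

retrans_pct=$(awk -v s="$segments_sent" -v r="$segments_retransmitted" \
  'BEGIN { printf "%.2f", r / s * 100 }')
echo "retransmission rate: ${retrans_pct}%"
```

As a rough rule of thumb, a sustained retransmission rate approaching 1% is worth investigating.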
Navisphere Analyzer Command Line Interface is a good reference for looking at data.
You can extract specifically what you want as a CSV file.
naviseccli analyzer -archivedump -data spa.nar -stime “…” -ftime “…” -object l -format pt,on,rio,rs,wio,ws | grep _d38