11 June 2017

Easy storage benchmark script based on Diskspd


Most of the projects I do for work involve some storage and virtualisation. To get a good feeling for what a certain storage platform can deliver, I try to run at least one benchmark. In the past, it has been a pain to get the benchmarks right so that the results can be compared: I sometimes forget which tool I used last, the storage is not always easily accessible to the tool, and some tools end up overloading the CPU.

Diskspd to the rescue

I've been following the development of Diskspd (https://github.com/Microsoft/diskspd) with interest ever since I saw a demo in a Storage Spaces talk at a Microsoft conference, where it was described as an internal load-test tool meant to replace SQLIO. Diskspd is easy to use, gives consistent results and is customisable for the type of workload you're trying to mimic.
It's command-line based, so it runs on almost any version of Windows (even Hyper-V Server and Nano Server). Being CLI based also means it's easy to script, and others have built great scripts to run a benchmark based on all sorts of settings.
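
To give an idea of what a single scripted run looks like, here is a minimal sketch that builds the argument list for one Diskspd invocation. The function name New-DiskspdArgument, its parameter names and the default values are made up for this illustration; the switches themselves are the ones the script further down passes to diskspd.exe.

```powershell
# Hypothetical helper (not part of Diskspd): builds the argument list for one run.
function New-DiskspdArgument {
    param(
        [string]$TargetFile,          # full path of the test file
        [int]$FileSizeGB   = 10,      # -c: size of the test file to create
        [int]$DurationSec  = 60,      # -d: duration of the run in seconds
        [int]$Threads      = 4,       # -t: number of threads
        [int]$QueueDepth   = 8,       # -o: outstanding IOs per thread
        [int]$BlockSizeKB  = 8,       # -b: block size
        [int]$WritePercent = 30,      # -w: percentage of writes
        [switch]$Sequential           # use -si (sequential) instead of -r (random)
    )
    $access = if ($Sequential) { '-si' } else { '-r' }
    @(
        "-c${FileSizeGB}G", "-d$DurationSec", $access, "-w$WritePercent",
        "-t$Threads", "-o$QueueDepth", "-b${BlockSizeKB}K",
        '-h',   # disable software caching and hardware write caching
        '-L',   # collect latency statistics
        $TargetFile
    )
}

# The path below is only an example
$diskspdArgs = New-DiskspdArgument -TargetFile 'S:\TestDiskSpd\testfile.dat'
$diskspdArgs -join ' '

# To actually run it (assumes diskspd.exe sits in the current directory):
# & .\diskspd.exe @diskspdArgs
```

Keeping the arguments in an array and splatting them onto the executable makes it easy to log the exact command line next to the results.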

Putting it all together

The blog post by Jose Barreto (https://blogs.technet.microsoft.com/josebda/2015/07/03/drive-performance-report-generator-powershell-script-using-diskspd-by-arnaud-torres/) really inspired me to try a more diverse approach to benchmarking, with lots of different settings, in order to generate a "fingerprint" of sorts for any given storage system. This script is not meant to give an in-depth view of a storage system's performance for your particular workload, but it makes it possible to compare different systems and their strong and weak points with a single worker and a limited set of workloads.
In short: a great way to get a ballpark figure for a storage system.

Description of the script

The script works by asking for a few parameters:
- location to store the test file
- size of the test file (at least twice the cache size)
- duration of each iteration (I use 60 seconds for a standard test run)

A number of parameters for the iterations are hardcoded into the script:
- threadcount = the number of cores that are available to the VM/host where the benchmark is running
- queue depth = we will run all tests with a queue depth of 1, 8, 16 and 32 outstanding IOs
- blocksize = we will run all tests with a blocksize of 4k, 8k, 64k and 512k
- read/write ratio = we will run all tests with a read/write ratio of 100/0, 70/30 and 0/100
- random/sequential = we will run all tests with both random and sequential IO
- repeat = to make sure the test iterations are somewhat representative, we will run four iterations with the same parameters in a row

As you can see, this list adds up to quite a number of iterations: 4 x 4 x 3 x 2 x 4 = 384 of them. With each iteration running for 60 seconds, that is 384 minutes (almost six and a half hours) of test time, so it's not something you run during your lunch break.
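
If you change any of the hardcoded lists, it helps to recalculate the number of iterations and the total test time before you start. A minimal sketch of that arithmetic; the variable names are made up for this example and do not appear in the script itself:

```powershell
# Parameter sets as listed above
$queueDepths = 1, 8, 16, 32
$blockSizes  = 4, 8, 64, 512      # in KB
$writeRatios = 0, 30, 100         # percentage of writes
$accessTypes = 'r', 'si'          # random and sequential
$repeats     = 4                  # runs per combination
$durationSec = 60                 # seconds per iteration

# Every combination of the lists is run $repeats times
$iterations = $queueDepths.Count * $blockSizes.Count * $writeRatios.Count * $accessTypes.Count * $repeats
$runtime    = [TimeSpan]::FromSeconds($iterations * $durationSec)

"{0} iterations, about {1:N1} hours of pure test time" -f $iterations, $runtime.TotalHours
```

The estimate covers only the measured duration; creating the test file and any warm-up come on top of that.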

The last part of the script handles some formatting to get all the relevant numbers on one line (so it's easy to store as a CSV file later on) and outputs to console and file.
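
The formatting step boils down to splitting one line of the Diskspd report on the "|" character. Here is a minimal sketch of that idea. The function name ConvertFrom-DiskspdTotalLine is hypothetical, the sample line is made up purely to show the layout (it is not a measurement), and the column positions are the ones the script below assumes for output generated with -L.

```powershell
# Hypothetical helper: turns the "total:" line of a Diskspd report into an object.
function ConvertFrom-DiskspdTotalLine {
    param([string]$Line)
    # Assumed column order: bytes | I/Os | MB/s | IOPS | average latency | ...
    $fields = $Line.Split('|') | ForEach-Object { $_.Trim() }
    [pscustomobject]@{
        MBps      = $fields[2]
        IOPS      = $fields[3]
        LatencyMs = $fields[4]
    }
}

# Made-up sample line, only meant to illustrate the layout
$sample = 'total:      1000000000 |       100000 |     100.00 |   10000.00 |    0.500 |     0.100'
$parsed = ConvertFrom-DiskspdTotalLine -Line $sample

# One comma-separated line, ready to append to a CSV file
'{0},{1},{2}' -f $parsed.IOPS, $parsed.MBps, $parsed.LatencyMs
```

The values are kept as strings on purpose, so they end up in the CSV exactly as Diskspd printed them.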

Related automated benchmark: VMFleet

The other tool that Microsoft released on the same GitHub page is VMFleet. This script launches a number of VMs and kicks off a Diskspd worker in each of them. Since most hyperconverged or active-active storage solutions are able to handle multiple IO streams at once, this is a great way to (synthetically) load-test a storage system that can handle a large number of simultaneous workloads.

The code itself
# Drive performance Report Generator
# Original by Arnaud TORRES, Edited by Hans Lenze Kaper on 25 - sep - 2015
# Clear screen
Clear-Host
write-host "DRIVE PERFORMANCE REPORT GENERATOR" -foregroundcolor green
write-host "Script will stress your computer CPU and storage layer (including network if applicable!), be sure that no critical workload is running" -foregroundcolor yellow
# Disk to test
$Disk = Read-Host 'Which path would you like to test? (example - C:\ClusterStorage\Volume1 or \\fileserver\share or S:) Without the trailing \'
# Reset test counter
$counter = 0
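# Use 1 thread / core (summed over all sockets, so multi-socket machines give a single number)
$Thread = "-t"+((Get-WmiObject win32_processor | Measure-Object -Property NumberOfCores -Sum).Sum)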
# Set time in seconds for each run
# 10-120s is fine
$TimeInput = Read-Host 'Duration: How long should each run take in seconds? (example - 60)'
$Time = "-d"+$TimeInput

# Choose how big the benchmark file should be. Make sure it is at least two times the size of the available cache. 
$capacity = Read-Host 'Testfile size: How big should the benchmark file be in GigaBytes? At least two times the cache size (example - 100)'
$CapacityParameter = "-c"+$Capacity+"G"
# Get date for the output file
$date = get-date
# Add the tested disk and the date in the output file
"Command used for the runs .\diskspd.exe -c[testfileSize]G -d[duration] -[randomOrSequential] -w[%write] -t[NumberOfThreads] -o[queue] -b[blocksize] -h -L $Disk\TestDiskSpd\testfile.dat, $date" >> ./output.txt
# Add the headers to the output file
"Test N#, Drive, Operation, Access, Blocks, QueueDepth, Run N#, IOPS, MB/sec, Latency ms, CPU %" >> ./output.txt
# Number of tests
# Multiply the number of loops to change this value
# By default there are : (4 queue depths) x (4 blocks sizes) X (3 for read 100%, 70/30 and write 100%) X (2 for Sequential and Random) X (4 Runs of each)
$NumberOfTests = 384
write-host "TEST RESULTS (also logged in .\output.txt)" -foregroundcolor yellow
# Begin Tests loops

# We will run the tests with 1, 8, 16 and 32 queue depth
(1,8,16,32) | ForEach-Object {
$queueparameter = ("-o"+$_)
$queue = ("QueueDepth "+$_)

# We will run the tests with 4K, 8K, 64K and 512K block
(4,8,64,512) | ForEach-Object {  
$BlockParameter = ("-b"+$_+"K")
$Blocks = ("Blocks "+$_+"K")
# We will do Read tests, 70/30 Read/Write and Write tests
  (0,30,100) | ForEach-Object {
      if ($_ -eq 0){$IO = "Read"}
      if ($_ -eq 30){$IO = "Mixed"}
      if ($_ -eq 100){$IO = "Write"}
      $WriteParameter = "-w"+$_
# We will do random and sequential IO tests
  ("r","si") | ForEach-Object {
      if ($_ -eq "r"){$type = "Random"}
      if ($_ -eq "si"){$type = "Sequential"}
      $AccessParameter = "-"+$_
# Each run will be done 4 times for consistency
  (1..4) | ForEach-Object {
      # The test itself (finally !!)
         $result = .\diskspd.exe $CapacityParameter $Time $AccessParameter $WriteParameter $Thread $queueparameter $BlockParameter -h -L $Disk\TestDiskSpd\testfile.dat
      # Now we will break the very verbose output of DiskSpd in a single line with the most important values
      foreach ($line in $result) {if ($line -like "total:*") { $total=$line; break } }
      foreach ($line in $result) {if ($line -like "avg.*") { $avg=$line; break } }
      $mbps = $total.Split("|")[2].Trim() 
      $iops = $total.Split("|")[3].Trim()
      $latency = $total.Split("|")[4].Trim()
      $cpu = $avg.Split("|")[1].Trim()
      $counter = $counter + 1
      # A progress bar, for fun
      Write-Progress -Activity ".\diskspd.exe $CapacityParameter $Time $AccessParameter $WriteParameter $Thread $queueparameter $BlockParameter -h -L $Disk\TestDiskSpd\testfile.dat" -Status "Test in progress" -PercentComplete ($counter / $NumberOfTests * 100)
      # We output the values to the text file
      "Test $Counter,$Disk,$IO,$type,$Blocks,$queue,Run $_,$iops,$mbps,$latency,$cpu" >> ./output.txt
      # We output a verbose format on screen
      "Test $Counter, $Disk, $IO, $type, $Blocks, $queue, Run $_, $iops iops, $mbps MB/sec, $latency ms, $cpu CPU"