Calculate IOPS

Mar 19, 2013 at 8:57 AM
Hi,

I collected some disk transfers/sec measurements using Windows Perfmon on an Exchange Server 2010 mailbox server. PAL gives me some numbers like average and std. deviation. How do I calculate the number of IOPS from these? Should I add the avg. number to the std. deviation?

Thanks a lot
Mar 19, 2013 at 6:47 PM
Good question. Perfmon will report disk transfers per second (IOPS) in realtime. If you've created a data collector set and are logging values to a .csv/.blg at a set interval (say, 10 seconds), the data will not be 100% accurate. You will only be getting IO per second measurements for one out of every 10 seconds. See this for more info: http://blogs.technet.com/b/cotw/archive/2009/03/18/analyzing-storage-performance.aspx. This data is only good for determining statistical values (mean, median, standard deviation, etc.), not for, say, counting every single IO against your Exchange database.

When PAL consumes your perfmon data and analyzes it, it returns some of those statistical values - namely minimum, maximum, average and standard deviation. There are also some calculations that remove outlying data. Average number of IOPS is a good start if you are planning for a storage system upgrade, but at Clearpath (EMC Partner of the Year) we usually size storage at the 95th percentile when putting a new SAN configuration together for a customer (then we adjust based on data commonality - lots of the same data being served out of cache instead of from disk - and sequentiality). You can use the mean and std. deviation values that PAL reports to estimate percentiles. For roughly normally distributed data, about 95% of samples fall within 2 standard deviations of the mean, so the mean plus 2 standard deviations is a slightly conservative stand-in for the 95th percentile (strictly it is closer to the 97.7th percentile; the 95th sits at about 1.645 standard deviations above the mean). So the math looks like this (warning - I barely passed my statistics courses in college - math and I don't always get along):

Average: 5000 IOPS
Std. Dev: 794

(2 x Std. Dev) + Avg = approx. 95th percentile (conservative)
(2 x 794) + 5000 = 6588 IOPS @ approx. 95th percentile
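
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the estimate above. It assumes the IOPS samples are roughly normally distributed, which real disk workloads often are not, and the function names are made up for illustration (they are not part of PAL or Perfmon). It also shows where the 2-standard-deviation shortcut lands on a normal curve compared with a true one-sided 95th percentile.

```python
from statistics import NormalDist  # standard library, Python 3.8+


def iops_at_sigma(mean_iops, std_dev, k=2.0):
    """Return the mean plus k standard deviations (the shortcut used above)."""
    return mean_iops + k * std_dev


def iops_at_percentile(mean_iops, std_dev, percentile=0.95):
    """Estimate the IOPS value at a percentile, ASSUMING a normal distribution."""
    z = NormalDist().inv_cdf(percentile)  # about 1.645 for the 95th percentile
    return mean_iops + z * std_dev


# Values from the example in this post.
two_sigma = iops_at_sigma(5000, 794)        # 6588.0
p95 = iops_at_percentile(5000, 794, 0.95)   # roughly 6306
coverage = NormalDist().cdf(2.0)            # roughly 0.977

print(f"mean + 2 sigma: {two_sigma:.0f} IOPS")
print(f"normal 95th percentile: {p95:.0f} IOPS")
print(f"share of samples below mean + 2 sigma: {coverage:.1%}")
```

Sizing at mean + 2 standard deviations therefore leaves a bit more headroom than a strict 95th percentile would. If you have the raw samples, computing the percentile directly from them avoids the normality assumption altogether.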

You'll want to pay attention to the percentage of read vs. write IOPS if you are sizing for new storage. Write IOPS typically incur a penalty (writes are more expensive than reads due to multiple RAID writes, parity calculations, etc.). RAID5 has a write penalty of 4. RAID10 has a write penalty of 2. For example, if your 6588 IOPS in the example above were 100% write on RAID10, you would multiply your IOPS by 2 to get 13176. You would then use 13176 IOPS to decide how many disks to buy. You'll get ~180 IOPS per 15k RPM disk, so 13176/180 = 73.2, which rounds up to 74 disks in RAID 10 to meet your IOPS requirement (ignoring things like cache). PAL thresholds include Disk Read Transfers per Sec and Disk Write Transfers per Sec....
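
The sizing math above can be sketched in a few lines of Python. This is only a rough sketch: the function and variable names are hypothetical, the write penalties and the ~180 IOPS per 15k RPM disk are the rule-of-thumb figures quoted in this post, cache is ignored, and the 40% write mix in the second example is an invented illustration rather than a measured value.

```python
import math

# Rule-of-thumb write penalties quoted above (back-end IOs per front-end write).
RAID_WRITE_PENALTY = {"RAID10": 2, "RAID5": 4}


def backend_iops(frontend_iops, write_fraction, raid_level):
    """Translate host-side IOPS into disk-side IOPS for a given RAID level."""
    penalty = RAID_WRITE_PENALTY[raid_level]
    reads = frontend_iops * (1 - write_fraction)   # reads carry no penalty
    writes = frontend_iops * write_fraction
    return reads + writes * penalty


def disks_needed(frontend_iops, write_fraction, raid_level, iops_per_disk=180):
    """Round up to whole disks; ignores cache, hot spares and capacity needs."""
    return math.ceil(
        backend_iops(frontend_iops, write_fraction, raid_level) / iops_per_disk
    )


# The 100% write example from this post: 6588 x 2 = 13176 IOPS -> 74 disks.
all_write_raid10 = disks_needed(6588, 1.0, "RAID10")

# Hypothetical 60/40 read/write mix on RAID5, for comparison only.
mixed_raid5 = disks_needed(6588, 0.4, "RAID5")

print(all_write_raid10, mixed_raid5)
```

Feeding in the read and write percentages from your own Disk Read Transfers per Sec and Disk Write Transfers per Sec counters gives a more realistic figure than the worst-case 100% write assumption.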

If you want to learn a bit more about the math, check out my Storage Basics blog posts at: http://vmtoday.com/category/storage/storage-basics/.

Hope this is helpful.
Coordinator
Mar 20, 2013 at 2:32 AM
Actually, the Avg. Disk sec/Read, Avg. Disk sec/Write, and Avg. Disk sec/Transfer performance counters do count every IO that occurs between each collection. For example, if you collect one of these counter instances every 10 seconds and there were 100 IOs between the first collection and the second collection, then the second collection will be the average of all 100 IOs. This is based on what I was told by Bruce Worthington, who examined the source code of the performance counters a few months ago.
Coordinator
Mar 20, 2013 at 2:40 AM
I read the COTW article again and I think there is a misinterpretation of what it is saying. To clarify, we are suggesting collecting the disk latency counters often simply to catch short bursts of problems. For example, if the counter instances are collected once every 5 minutes, then there are likely a lot of data points to average. The counter would return only the average of all 5 minutes of data, which does include all of the IOs that occurred in those 5 minutes. We won't know what the standard deviation is. So, the counters are 100% accurate from the perspective of the operating system (meaning that the clock can be skewed by virtualization), but the level of detail is limited by how often they are collected.
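
To illustrate the point about collection frequency, here is a small Python sketch using synthetic data. It assumes the latency counter behaves as described above, returning the average over every IO completed in the collection interval. It is not how Perfmon is implemented, and the function name and numbers are invented for illustration.

```python
from collections import defaultdict


def interval_averages(ios, interval_s):
    """Average latency per collection interval.

    ios: iterable of (completion_time_s, latency_ms) pairs, one per IO.
    Every IO contributes to the average of the interval it completed in.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bucket -> [latency sum, IO count]
    for completion_time, latency_ms in ios:
        bucket = int(completion_time // interval_s)
        totals[bucket][0] += latency_ms
        totals[bucket][1] += 1
    return {b: s / n for b, (s, n) in sorted(totals.items())}


# Synthetic workload: 10 IOs per second for 5 minutes at 5 ms each,
# except for a 10-second burst (seconds 100-109) at 50 ms.
ios = []
for second in range(300):
    latency_ms = 50 if 100 <= second < 110 else 5
    ios.extend((second, latency_ms) for _ in range(10))

fine = interval_averages(ios, 10)     # collected every 10 seconds
coarse = interval_averages(ios, 300)  # collected every 5 minutes

print("worst 10-second sample:", max(fine.values()), "ms")   # 50.0 ms
print("single 5-minute sample:", coarse[0], "ms")            # 6.5 ms
```

Both views include all 3000 IOs, so neither is "wrong", but the 5-minute sample averages the burst down to 6.5 ms while the 10-second samples show it at 50 ms.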
Mar 20, 2013 at 3:24 AM
Great info, Clint. I'm going to have to revise some of my stuff.... Will hit the lab as time permits and try to publish some updates to clear up the confusion that I have helped to spread.
Coordinator
Mar 20, 2013 at 4:33 AM
Hi Joshua,

All of what you said has merit, especially on calculated IOPS. You are going with mathematical results while I am going by what a trusted advisor told me. All in all, "trust", but verify.

In any case, given that the latency counters do track all IOs, the frequency at which they are collected matters, which actually plays into what you were saying about gathering often to be accurate. Now, when I said that the disk counters are 100% accurate, I want to stress that the "counter" and how it is gathered is accurate, but tools like the System Monitor and Relog.exe can "skew" the data, making it less accurate through "massaging" the data - this was recently proven when dealing with more than 1000 data points. When you take all of the factors into consideration, we end up with less than 100% accuracy, which goes back to your primary message in the first place.

All in all, I'm sorry if I made this a drawn out explanation. You can certainly trust me, but I encourage verification.