Feature request: Statistical analysis

Sep 18, 2013 at 1:09 PM
Edited Sep 18, 2013 at 1:17 PM
Clint, I hope I'm not asking too much, but in addition to the statistical analysis you already provide (i.e., Min, Average, Max, Standard Deviation, and Outliers), could we also have the Median (50th percentile) and possibly the 90th, 95th, and 99th percentiles?

Knowing these numbers tells me a lot about my storage in particular, and they would be useful for other indicators as well. The average isn't very useful for some performance indicators (such as Logical Disk and Physical Disk), because a few very high values can throw it off badly. The Median (50th percentile) would tell me more about what a particular indicator is doing during a given interval.
Coordinator
Sep 20, 2013 at 9:42 PM
I'm open to it. Can you refresh me on how to calculate Median?
I originally thought that removing the outlying data points was considered an x% percentile, but I was told that was wrong, so I just call them "x% of outliers removed". The CalculatePercentile function in PAL.ps1 does these calculations, so you are welcome to modify the code yourself and try it out. That is a great way to help this project, which is run entirely on volunteer help.
Oct 7, 2013 at 11:25 PM

Sorry for taking so long to respond. I work full time and go to college full time and just got sidetracked.

The reason why I ask to have the Median calculated along with the Mean (or Average) is that it tells me a whole lot about what a particular process is doing within the time interval.

To calculate the Median, you sort all the values from smallest to largest and pick the value in the dead center (see the attached GetMedian.ps1 file).

For example: 1, 1, 2, 2, 3, 3, 3, 3, 4, 448.

With ten values, the half-way point falls between the fifth and sixth values, so the Median is their average: (3+3)/2 = 3. If there were an odd number of values (say, 9), we would simply take the middle value.

Let’s take Avg. Disk sec/Read as an example. If the ten values above were Avg. Disk sec/Read samples in milliseconds, we would end up with a Mean (average) of 47ms (470 / 10) and a Median of 3ms.

As you can see, the Mean of 47ms is skewed by the single high value of 448. A Mean of 47ms doesn't tell me that Avg. Disk sec/Read is at or below 47ms half of the time. The Median of 3ms, however, does tell me that Avg. Disk sec/Read is at 3ms or less at least half of the time.
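For anyone who wants to check the arithmetic, here is a minimal PowerShell sketch that computes both numbers for the ten sample values above. It uses the built-in Measure-Object cmdlet for the Mean and a sort-then-pick-the-middle step for the Median; the variable names are just for illustration.

```powershell
# The ten example samples (Avg. Disk sec/Read, in ms)
$samples = 1, 1, 2, 2, 3, 3, 3, 3, 4, 448

# Mean: Measure-Object adds up the values and divides by the count
$mean = ($samples | Measure-Object -Average).Average

# Median: sort ascending, then take the middle value (odd count)
# or the average of the two middle values (even count)
$sorted = @($samples | Sort-Object)
$n = $sorted.Count
if ($n % 2 -eq 1) {
    $median = $sorted[($n - 1) / 2]
}
else {
    $median = ($sorted[$n / 2 - 1] + $sorted[$n / 2]) / 2
}

"Mean:   $mean ms"    # Mean:   47 ms
"Median: $median ms"  # Median: 3 ms
```

The single 448 sample pulls the Mean up to 47, while the Median stays at 3.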

For a quick primer on Median and Percentiles, see this site:

http://devnambi.com/archive/2011/03/statistical-analysis-102-median-percentile-and-covariance/

Also, the Median is the same as the 50th percentile. The GetMedian function could be adapted to find and return the 90th, 95th, and 99th percentiles.
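Since the Median is just the 50th percentile, one way to generalize it is the linear-interpolation definition (the one Excel's PERCENTILE.INC uses): compute the fractional rank p/100 x (n - 1) in the sorted list and interpolate between the two neighbouring values. This is only a sketch; Get-InterpolatedPercentile is a hypothetical name, not an existing PAL function, and other percentile definitions give slightly different answers on small samples.

```powershell
# Hypothetical helper: percentile by linear interpolation between closest ranks.
# With -Percent 50 it reproduces the usual Median (including the even-count average).
function Get-InterpolatedPercentile {
    param(
        [double[]]$Values,
        [double]$Percent
    )
    if ($Values.Count -eq 0) { throw "No values supplied." }
    if ($Percent -lt 0 -or $Percent -gt 100) { throw "Percent must be between 0 and 100." }

    # Sort a copy so the caller's array is left untouched
    $sorted = @($Values | Sort-Object)
    $n = $sorted.Count

    # Zero-based fractional rank of the requested percentile
    $rank = ($Percent / 100) * ($n - 1)
    $lower = [int][math]::Floor($rank)
    $upper = [int][math]::Ceiling($rank)
    $fraction = $rank - $lower

    # Interpolate between the two neighbouring sorted values
    $sorted[$lower] + $fraction * ($sorted[$upper] - $sorted[$lower])
}
```

With the ten values from the example, -Percent 50 returns 3 (matching the Median) and -Percent 100 returns the maximum, 448.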

I would love to take a shot at updating the code, but I just don’t have a lot of spare time right now as it is. I have two more semesters before I’m done and then I start on my Masters. If I catch a break in between semesters, I might take a shot at it.

Thanks for your time.

Oct 8, 2013 at 1:05 AM

In case the attachment doesn’t go through, here is some simple code for calculating the Median.

function Get-Median ([array]$list) {
    # sort a copy of the list ([array]::Sort works in place and would reorder the caller's array)
    $list = $list.Clone()
    [array]::Sort($list)

    # determine if the number of items is even or odd ($odd receives the remainder)
    $odd = 0
    [math]::DivRem($list.Count, 2, [ref]$odd) | Out-Null

    # if odd, return the middle item; otherwise, average the two middle items
    if ($odd) {
        $list[($list.Count - 1) / 2]
    }
    else {
        ([double]$list[$list.Count / 2] + [double]$list[$list.Count / 2 - 1]) / 2
    }
}

Coordinator
Oct 8, 2013 at 5:36 PM
Hi raytrace,

Thanks for explaining Median and posting the code.
Did you write the code yourself or copy it from somewhere? I need permission from the original author of the code to use it in my project.
Once I have permission, I'll consider putting it in the next release.

Thank you!
Oct 10, 2013 at 3:09 PM

Not a problem.

I picked up the code from a PowerGUI forum post by seaJHawk.

http://powergui.org/thread.jspa?messageID=25805

Once I’m done with school, I wouldn’t mind contributing more code for more in-depth statistical analysis, such as percentiles and maybe a few others.

Percentiles are pretty easy, though. Take your set of data points and sort them from highest to lowest in an array. For the 99th percentile, remove the top 1% of the array and return the highest value that remains. For the 95th percentile, remove the top 5% and return the highest remaining value. When I crunch my performance data, I normally calculate the 90th, 95th, and 99th percentiles along with the median, mean, and standard deviation.
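The trim-the-top approach described above can be sketched in PowerShell as follows. Get-TrimmedPercentile is a hypothetical name, and rounding the number of removed values down is an assumption, since the post doesn't say what to do when the count isn't a whole number. With that rounding, the result should match the common "nearest-rank" definition (the value at position ceil(p/100 x n) in ascending order).

```powershell
# Hypothetical helper: remove the top (100 - Percent)% of values, return the highest one left
function Get-TrimmedPercentile {
    param(
        [double[]]$Values,
        [double]$Percent
    )
    if ($Values.Count -eq 0) { throw "No values supplied." }
    if ($Percent -le 0 -or $Percent -gt 100) { throw "Percent must be greater than 0 and at most 100." }

    # Order from highest to lowest
    $descending = @($Values | Sort-Object -Descending)

    # How many of the top values to discard (rounded down: an assumption)
    $toRemove = [int][math]::Floor($descending.Count * (100 - $Percent) / 100)

    # After dropping $toRemove items from the front, the highest remaining value sits at that index
    $descending[$toRemove]
}
```

For the example data, the 90th percentile comes out to 4, while the 95th and 99th are both 448: ten samples are too few for removing 5% or 1% to drop even one value.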

Also, regarding your comment about “% of Outliers Removed”, I found an excellent YouTube video on calculating outliers.

http://www.youtube.com/watch?v=9aDHbRb4Bf8

Careful, though: he says “mean” when he actually means “median”.
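The video isn't transcribed here, so as an assumption: a common textbook method for flagging outliers is Tukey's 1.5 x IQR rule. Find the first and third quartiles (the medians of the lower and upper halves of the sorted data, which is where "median" comes in), take the interquartile range IQR = Q3 - Q1, and flag anything below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR. Below is a sketch of that rule; the function names are hypothetical, and quartile conventions vary between textbooks and tools.

```powershell
# Hypothetical helper: median of an array that is already sorted ascending
function Get-SortedMedian([double[]]$Sorted) {
    $n = $Sorted.Count
    if ($n % 2 -eq 1) { return $Sorted[($n - 1) / 2] }
    return ($Sorted[$n / 2 - 1] + $Sorted[$n / 2]) / 2
}

# Hypothetical helper: returns the values outside Tukey's 1.5 x IQR fences
function Get-Outlier {
    param([double[]]$Values)

    $sorted = [double[]]@($Values | Sort-Object)
    $n = $sorted.Count
    if ($n -lt 4) { throw "Need at least four values to estimate quartiles." }

    # Lower and upper halves; the middle value is left out when n is odd
    $half = [int][math]::Floor($n / 2)
    $q1 = Get-SortedMedian ($sorted[0..($half - 1)])
    $q3 = Get-SortedMedian ($sorted[($n - $half)..($n - 1)])

    # Interquartile range and the fences built from it
    $iqr = $q3 - $q1
    $lowFence = $q1 - 1.5 * $iqr
    $highFence = $q3 + 1.5 * $iqr

    # Keep only the values that fall outside the fences
    $Values | Where-Object { $_ -lt $lowFence -or $_ -gt $highFence }
}
```

For the example data (1, 1, 2, 2, 3, 3, 3, 3, 4, 448), Q1 = 2 and Q3 = 3, so the fences are 0.5 and 4.5 and only 448 is flagged.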

Coordinator
Oct 10, 2013 at 7:04 PM
Ah, thanks for the references. I'll see what I can do.