Need some help understanding results - Exchange 2010 Disk Issues

Jul 8, 2011 at 4:27 PM
Edited Jul 8, 2011 at 4:32 PM

We are not currently having any reported performance problems, but as part of some routine checking I thought I would run the PAL tool against our Exchange 2010 environment. Some of the results concern me, and I am having trouble finding good explanations online. Everything points to technical documentation on SAN configuration, which I don't have access to. I am hoping I can get some real-world answers about some of our results.

Here is an overview of the environment:

Exchange 2010 running all roles on vSphere. Storage is an HP XP-24000 with nearly 60 disks. The storage for Exchange is a 1.5 TB virtual disk, out of which we have carved three disks and mounted them as mount points on the D: drive: DB1, DB2, and DB4. DB1 is the most active. Databases and logs are on the same drive (mount point). This is also what we suspect is the source of our problem; logs and databases should probably be split up.

 

Here are some of the results I am having trouble interpreting:

Greater than or equal to 64K I/O Sizes - the larger the I/O size, the longer the response times.   What does this mean?

Greater than 20ms logical disk READ response times.  I am guessing this is related to the logs/db on one drive.

Greater than 20ms logical disk WRITE response times.  I am guessing this is related to the logs/db on one drive.

 

As you can see from this subset of the issues, we are having disk issues. I think the .NET exceptions and other issues might be related to the disk problems.

Condition | Counter | Min | Avg | Max | Hourly Trend
Greater than 20ms logical disk READ response times | \\EXCH02\LogicalDisk(D:\DB1)\Avg. Disk sec/Read | 0 | .006 | .039 | 0
Greater than 20ms logical disk READ response times | \\EXCH02\LogicalDisk(D:\DB2)\Avg. Disk sec/Read | 0 | .001 | .023 | 0
Greater than 20ms logical disk WRITE response times | \\EXCH02\LogicalDisk(D:\DB1)\Avg. Disk sec/Write | 0 | .001 | .045 | 0
Greater than 20ms logical disk WRITE response times | \\EXCH02\LogicalDisk(D:\DB2)\Avg. Disk sec/Write | 0 | .001 | .045 | 0
Greater than 20ms logical disk WRITE response times | \\EXCH02\LogicalDisk(D:)\Avg. Disk sec/Write | 0 | .001 | .037 | 0
Greater than 20ms logical disk WRITE response times | \\EXCH02\LogicalDisk(C:)\Avg. Disk sec/Write | 0 | .001 | .039 | 0
Average LDAP Search times - greater than 50ms | \\EXCH02\MSExchange ADAccess Processes(w3wp EWS _Total)\LDAP Search Time | 0 | 4 | 60 | 0
Average LDAP Search times - greater than 50ms | \\EXCH02\MSExchange ADAccess Processes(w3wp EWS 4672)\LDAP Search Time | 0 | 4 | 60 | 0
More than 50% Processor Utilization | \\EXCH02\Processor(0)\% Processor Time | 6 | 13 | 52 | -1
More than 2 ready threads are queued for each processor | \\EXCH02\System\Processor Queue Length | 0 | 2 | 26 | 0
Possible Memory Leak: More than 250MBs between overall Min and overall Max and an increasing trend of more than 10MBs per hour | \\EXCH02\Process(_Total)\Private Bytes | 25,947,619,328 | 26,177,978,538 | 26,532,380,672 | 18,698,026
Greater than 15ms physical disk READ response times | \\EXCH02\PhysicalDisk(3)\Avg. Disk sec/Read | 0 | .001 | .023 | 0
Greater than 25ms physical disk READ response times | \\EXCH02\PhysicalDisk(2)\Avg. Disk sec/Read | 0 | .006 | .039 | 0
Greater than 25ms physical disk WRITE response times | \\EXCH02\PhysicalDisk(1 D:)\Avg. Disk sec/Write | 0 | .001 | .037 | 0
Greater than 25ms physical disk WRITE response times | \\EXCH02\PhysicalDisk(0 C:)\Avg. Disk sec/Write | 0 | .001 | .039 | 0
Greater than 25ms physical disk WRITE response times | \\EXCH02\PhysicalDisk(3)\Avg. Disk sec/Write | 0 | .001 | .045 | 0
Greater than 25ms physical disk WRITE response times | \\EXCH02\PhysicalDisk(2)\Avg. Disk sec/Write | 0 | .001 | .045 | 0
Greater than or equal to 64K I/O Sizes - the larger the I/O size, the longer the response times | \\EXCH02\LogicalDisk(D:\DB1)\Avg. Disk Bytes/Read | 0 | 213,641 | 322,324 | 1,076
Greater than or equal to 64K I/O Sizes - the larger the I/O size, the longer the response times | \\EXCH02\LogicalDisk(D:\DB2)\Avg. Disk Bytes/Read | 0 | 52,645 | 299,008 | 328
Greater than or equal to 64K I/O Sizes - the larger the I/O size, the longer the response times | \\EXCH02\LogicalDisk(D:)\Avg. Disk Bytes/Read | 0 | 2,084 | 114,688 | 138
Greater than or equal to 64K I/O Sizes - the larger the I/O size, the longer the response times | \\EXCH02\LogicalDisk(D:\DB1)\Avg. Disk Bytes/Write | 4,268 | 49,035 | 173,291 | 2,263
Greater than or equal to 64K I/O Sizes - the larger the I/O size, the longer the response times | \\EXCH02\LogicalDisk(D:)\Avg. Disk Bytes/Write | 3,289 | 45,344 | 181,002 | -263
Greater than or equal to 64K I/O Sizes - the larger the I/O size, the longer the response times | \\EXCH02\LogicalDisk(C:)\Avg. Disk Bytes/Write | 4,717 | 14,661 | 270,640 | -212
This process is using more than 1000 data I/O's (network or disk) per second | \\EXCH02\Process(_Total)\IO Data Operations/sec | 143 | 461 | 8,856 | -8
This process is using more than 1000 data I/O's (network or disk) per second | \\EXCH02\Process(EdgeTransport)\IO Data Operations/sec | 2 | 105 | 1,809 | -1
This process is using more than 1000 data I/O's (network or disk) per second | \\EXCH02\Process(SMEX_Master)\IO Data Operations/sec | 0 | 67 | 6,724 | 6
This process is using more than 1000 data I/O's (network or disk) per second | \\EXCH02\Process(_Total)\IO Other Operations/sec | 29 | 672 | 1,806 | -33
More than 10 .NET CLR Exceptions Thrown / sec | \\EXCH02\.NET CLR Exceptions(EdgeTransport)\# of Exceps Thrown / sec | 1 | 7 | 25 | 0
More than 50 .NET CLR Exceptions Thrown / sec | \\EXCH02\.NET CLR Exceptions(msftefd)\# of Exceps Thrown / sec | 0 | 1 | 151 | 0
More than 10 .NET CLR Exceptions Thrown / sec | \\EXCH02\.NET CLR Exceptions(_Global_)\# of Exceps Thrown / sec | 3 | 11 | 28 | -1
More than 10 .NET CLR Exceptions Thrown / sec | \\EXCH02\.NET CLR Exceptions(Microsoft.Exchange.RpcClientAccess.Service)\# of Exceps Thrown / sec | 0 | 0 | 11 | 0
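
For anyone reading the latency rows above: the Avg. Disk sec/Read and Avg. Disk sec/Write counters are reported in seconds, so .039 means 39 ms. The following is a minimal Python sketch, using a few rows copied from the results and the 20 ms figure from the alert titles, that separates a sustained problem (average over the threshold) from an occasional spike (only the maximum over the threshold). The function names and the "sustained"/"spike" labels are illustrative assumptions, not something PAL itself produces.

```python
# A few latency rows copied from the PAL results: (counter, avg_sec, max_sec).
ROWS = [
    (r"LogicalDisk(D:\DB1)\Avg. Disk sec/Read", 0.006, 0.039),
    (r"LogicalDisk(D:\DB2)\Avg. Disk sec/Read", 0.001, 0.023),
    (r"LogicalDisk(D:\DB1)\Avg. Disk sec/Write", 0.001, 0.045),
    (r"LogicalDisk(C:)\Avg. Disk sec/Write", 0.001, 0.039),
]

THRESHOLD_MS = 20  # taken from the "Greater than 20ms" alert titles


def to_ms(seconds):
    """Convert a latency counter value from seconds to milliseconds."""
    return round(seconds * 1000, 3)


def classify(avg_sec, max_sec, threshold_ms=THRESHOLD_MS):
    """Label a row (labels are illustrative, not PAL output):
    'sustained' if the average exceeds the threshold,
    'spike' if only the maximum does, otherwise 'ok'."""
    if to_ms(avg_sec) > threshold_ms:
        return "sustained"
    if to_ms(max_sec) > threshold_ms:
        return "spike"
    return "ok"


def summarize(rows=ROWS):
    """Return (counter, avg_ms, max_ms, label) for each row."""
    return [(name, to_ms(a), to_ms(m), classify(a, m)) for name, a, m in rows]


if __name__ == "__main__":
    for name, avg_ms, max_ms, label in summarize():
        print(f"{name}: avg {avg_ms} ms, max {max_ms} ms -> {label}")
```

With the rows copied here, every average is 6 ms or less and only the maximum values cross 20 ms, so each row comes out as a spike rather than a sustained condition.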
Dec 1, 2011 at 1:15 PM

Hi,

We are seeing the same sort of errors on a series of file servers.  Did you get any resolution to this?

Thanks.

Dec 1, 2011 at 2:18 PM

I didn't. We are not having complaints, so I let it go to tackle more pressing issues.

Coordinator
Dec 14, 2011 at 1:58 AM

Sorry for the delay in my response.

We expect 7200 RPM disk drives to respond in 17 ms or less because most manufacturers guarantee it. When your I/O request packets are averaging above what the manufacturer lists as the worst-case scenario, it could mean that the drive is falling behind. That said, the 17 ms threshold is based on an average I/O size of 64 KB or smaller. When the I/O sizes are large (greater than 64 KB), response times will likely be longer. This is where a human like you must correlate the large I/O sizes with longer response times. I recently wrote an article for MCP Magazine on this subject: http://mcpmag.com/articles/2011/05/12/how-to-speak-san-ish.aspx
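
The correlation described above can be sketched roughly as follows, assuming you have latency and I/O size samples taken at the same interval. This is only an illustration in Python: the 17 ms and 64 KB figures come from the post, while the `triage` function, its labels, and the sample values are hypothetical and not part of PAL.

```python
LATENCY_THRESHOLD_MS = 17    # threshold quoted in the post
LARGE_IO_BYTES = 64 * 1024   # 64 KB boundary quoted in the post


def triage(latency_sec, io_bytes):
    """Pair one latency sample (seconds) with the I/O size (bytes)
    observed in the same interval. Labels are illustrative only."""
    latency_ms = latency_sec * 1000
    if latency_ms <= LATENCY_THRESHOLD_MS:
        return "ok"
    if io_bytes > LARGE_IO_BYTES:
        # Slow, but the I/O was large, so longer response times are likely.
        return "slow, large I/O"
    # Slow even though the I/O was 64 KB or smaller.
    return "slow, small I/O"


# Hypothetical paired samples: (Avg. Disk sec/Read, Avg. Disk Bytes/Read).
SAMPLES = [
    (0.006, 32_768),
    (0.039, 322_324),
    (0.030, 16_384),
]

if __name__ == "__main__":
    for latency_sec, io_bytes in SAMPLES:
        print(f"{latency_sec * 1000:.0f} ms at {io_bytes:,} bytes -> "
              f"{triage(latency_sec, io_bytes)}")
```

A "slow, small I/O" interval is the case where the drive may be falling behind; a "slow, large I/O" interval is where the longer response time may simply reflect the larger request and deserves a closer look before drawing conclusions.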

Dec 14, 2011 at 8:21 PM

Hi ClintH,  Thanks for responding!

I did read your article, and went back again now to compare it to my results. While there are spikes in the numbers for my SAN disks, access to my system drive (the C: drive) is the worst performer, with Avg. Disk sec/Read and Avg. Disk sec/Write around 1.5 (ranging from a minimum of .5 to a maximum of 2.5). Avg. Disk Bytes/Read and Avg. Disk Bytes/Write are under 20,000, with a few spikes to 40,000.

We are working to determine what is driving this.