Hi Cliff and CodePlex community,
We have some customers experiencing application performance issues in production, and we have been doing a deep dive to see what we can do to improve performance. We analyze logs, use SQL Profiler to trace query executions, collect counters with PerfMon, and do DB-layer maintenance such as reindexing the databases and making sure to purge old records, etc. The tool that gives us the most useful picture is PerfMon, and we run PAL to generate reports from the log files.
We see that:
- Physical Disk Read Latency (Avg. Disk sec/Read) generated alerts with Avg: 0.084 and Max: 3.213
- Physical Disk Write Latency (Avg. Disk sec/Write) generated alerts with Avg: 1.458 and Max: 17.895
- Physical Disk Current Disk Queue Length generated alerts with Avg: 22 and Max: 255
- Physical Disk Avg. Disk Queue Length: Avg: 398 and Max: 14,464
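To put those latency numbers in perspective, here is a quick Python sketch converting them to milliseconds and comparing them against the warning/critical thresholds PAL typically applies (~15 ms warning / ~25 ms critical for Disk sec/Read and sec/Write; those thresholds are my assumption and will vary by storage):

```python
# Sanity check of the PerfMon disk-latency numbers above.
# Threshold values are commonly cited guidance, not measured facts.
WARN_S, CRIT_S = 0.015, 0.025  # seconds

counters = {
    "Avg. Disk sec/Read  (avg)": 0.084,
    "Avg. Disk sec/Read  (max)": 3.213,
    "Avg. Disk sec/Write (avg)": 1.458,
    "Avg. Disk sec/Write (max)": 17.895,
}

for name, seconds in counters.items():
    level = "CRITICAL" if seconds >= CRIT_S else "warning" if seconds >= WARN_S else "ok"
    print(f"{name}: {seconds * 1000:8.0f} ms -> {level}")
```

Even the average read latency (84 ms) is several times the usual critical level, so the disk alerts are not borderline.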
Processor % Processor Time (Min / Avg / Max per processor core):
Core 0: 3 / 23 / 100
Core 1: 0 / 19 / 100
Core 2: 1 / 24 / 100
Core 3: 0 / 17 / 100
Core 4: 0 / 19 / 100
Core 5: 0 / 46 / 100
Core 6: 0 / 19 / 100
Core 7: 1 / 22 / 100
There is no memory contention; essentially only Disk and CPU generate alerts.
A little bit of background on the application and server environment:
- It is a PMS application written in VB 6.0, with SQL Server as the DBMS.
- There are also some interface apps written in VB 6 which perform a variety of tasks, such as COM-port communication with call accounting and PBX systems, and web-service calls/polls for XML message exchange over the network, etc. All of these interfaces also poll the database for new messages to process.
- Server environment: Windows Server 2003 Standard Edition SP 2
- RAM: 4.00 GB
- CPU: Intel(R) Xeon(R) CPU E5335 @ 2.00 GHz
- Virtual Memory: 6,141 MB
- The server hosts apps and interfaces for 2 properties. This means up to 6-8 users can be logged on to this server running the main PMS application via Remote Desktop from their thin-client Wyse terminals.
- All the apps and interfaces are installed and run on the server itself.
- DBs (SQL Server 2000 Standard Edition 8.00.2305) are also installed on the server.
- Since there are 2 properties, there are 2 SQL Server instances.
- Each SQL instance houses 7 databases with size ranging from 10MB up to 2GB.
The server pretty much just hosts and runs SQL Server, the apps, and the interfaces. The only other major thing that visibly runs is McAfee AV; everything else is standard server-related services.
At the time of writing this, there are 5 users logged into the server and Task Manager shows:
2.78 GB RAM used
CPU spiking anywhere from 7%-38%
98 processes running with 31638 handles and 1465 threads.
SQL Instance#1 - 849MB memory (not dynamic. Configured at 849)
SQL Instance#2 - 581MB memory (not dynamic. Configured at 581)
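For context, here is a rough back-of-envelope split of the 4 GB (just arithmetic on the figures above, nothing measured):

```python
# Memory budget for this box, using the fixed (non-dynamic) SQL allocations.
total_ram_mb = 4096
sql1_mb = 849   # Instance #1, configured fixed
sql2_mb = 581   # Instance #2, configured fixed

sql_total = sql1_mb + sql2_mb
remainder = total_ram_mb - sql_total

print(f"SQL Server (both instances): {sql_total} MB")
print(f"Left for OS, RDP sessions, apps, AV: {remainder} MB")
```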
Some of the interfaces that are run on the server:
CC dialer interface#1 - I/O Reads: 1,173,447,256 | Write: 5,936
CC dialer interface#2 - I/O Reads: 1,106,077,515 | Write: 4,543
Between the 2 properties, they are running 20 apps and interfaces combined:
- 5 PMS instances
- 6 interfaces for call accounting, PBX, and voice mail that communicate with 3rd-party systems over COM port and poll for messages from the PMS side every 1-5 minutes.
- 9 interfaces that communicate over the internet for reservation delivery via web services, file uploads to a central data warehouse, etc. These also poll for new messages every 1-5 minutes.
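As a crude sketch of the background load this polling alone generates (assuming each poll cycle issues at least one query batch; that is an assumption on my part, since each interface may actually issue several):

```python
# Rough bound on background polling load from the interfaces listed above.
interfaces = 6 + 9                # COM-port interfaces + internet-facing interfaces
fastest_s, slowest_s = 60, 300    # each polls every 1-5 minutes

polls_per_hour_max = interfaces * 3600 // fastest_s
polls_per_hour_min = interfaces * 3600 // slowest_s
print(f"{polls_per_hour_min}-{polls_per_hour_max} polling cycles/hour, around the clock")
```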
There are 8 SQL jobs that run at different times throughout the day to complete tasks such as DB backups, reindexing, DBCC CHECKDB, certain application-related jobs (query-intensive), exporting data to other servers, etc.
Since the apps and the interfaces are all DB-intensive, it is understandable that there would be lots of memory accesses and disk reads/writes. Now, the fact that memory is not being used much while there are lots of disk reads/writes might be an orange flag here. Logically, if we give more memory to SQL Server, then the SQL queries won't have to hit the disk for data as much, right? (Unless the cache is flushed... I know this is a very generic question that's hard to answer without technical details and variables.)
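To illustrate why I suspect memory, here is the crude bound I'm working from (pure arithmetic on the database counts and sizes given above; the real question is the size of the hot working set, which I haven't measured):

```python
# Each of the 2 instances hosts 7 databases sized 10 MB - 2 GB, so the total
# data size is somewhere in this (very wide) range, vs. the fixed buffer pools.
n_dbs = 2 * 7
total_min_mb = n_dbs * 10
total_max_mb = n_dbs * 2048
configured_mb = 849 + 581

print(f"Total data size: {total_min_mb}-{total_max_mb} MB")
print(f"Fixed SQL Server memory: {configured_mb} MB")
```

If the hot data is anywhere near the top of that range, the buffer pools cannot hold it and physical reads are inevitable.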
Back to the main point...
So we collect PerfMon logs, generate PAL reports, and see high disk reads/writes.
What do we do after that? We know the apps are DB-intensive, so there will be disk reads/writes. When they exceed the standard thresholds, do we look at it from the standpoint of "OK, could the disk be performing at an inefficient level?", or do we look at it from the standpoint of "How can we make the application less disk-I/O-intensive?"
We have the same application distributed to thousands of other clients on different platforms, and "performance" is not the common denominator: the majority of our clients do not experience application slowness. That's why we are concentrating on this specific incident.
Again, the question is: we see high disk reads/writes. What is the next step after that?
Any advice, suggestions, recommendations, or shared experience would be greatly appreciated. I might have left out lots of information that would help you advise me; if so, please let me know what else I need to provide.
I haven't provided much detail on the thin-client Wyse terminals used at the properties, but we also have customers in the same chain that use the same terminals and do not experience slowness issues.
Finding a solution to this would help us a great deal and solve an issue we have been battling for 2 years.